Gaussian Naive Bayes Classification: Principles and a Hand-Written Implementation
Gaussian Naive Bayes (GNB) is a machine learning classification technique based on probabilistic methods and the Gaussian distribution. It assumes that each parameter (also called a feature or predictor) has an independent ability to predict the output variable. The combined prediction of all parameters is the final prediction: the model returns the probability of the dependent variable belonging to each group, and the final classification is assigned to the group (class) with the higher probability.
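Written out, that "naive" combination rule is Bayes' theorem under the assumption that the features are conditionally independent given the class:

P(y | x₁, …, xₙ) ∝ P(y) · P(x₁ | y) · P(x₂ | y) · ⋯ · P(xₙ | y)

where P(y) is the prior probability of class y, and in the Gaussian variant each P(xᵢ | y) is evaluated with a normal density fitted to feature xᵢ within class y.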
What is the Gaussian distribution? The Gaussian distribution, also known as the normal distribution, is a statistical model that describes the distribution of continuous random variables in nature. It is defined by its bell-shaped curve, and its two most important characteristics are the mean (μ) and the standard deviation (σ). The mean is the average value of the distribution, and the standard deviation measures the "width" of the distribution around the mean.
It is important to know that a normally distributed variable X is continuously distributed over -∞ < X < +∞ (a continuous variable), and that the total area under the model's curve is 1.
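As a quick numeric illustration of that last point (this check is an addition using scipy, not part of the original article), we can integrate a normal density over the whole real line and confirm the area comes out to 1:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Integrate the N(0, 1) density from -inf to +inf; the result is ~1.0
area, _ = quad(norm(loc=0, scale=1).pdf, -np.inf, np.inf)
print(area)  # 1.0 up to numerical precision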
Gaussian Naive Bayes for Multi-Class Classification

Import the necessary libraries:
from random import random
from random import randint
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statistics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
from mlxtend.plotting import plot_decision_regions
Now create a dataset in which the predictor variables are normally distributed.
#Creating values for FeNO with 3 classes:
FeNO_0 = np.random.normal(20, 19, 200)
FeNO_1 = np.random.normal(40, 20, 200)
FeNO_2 = np.random.normal(60, 20, 200)

#Creating values for FEV1 with 3 classes:
FEV1_0 = np.random.normal(4.65, 1, 200)
FEV1_1 = np.random.normal(3.75, 1.2, 200)
FEV1_2 = np.random.normal(2.85, 1.2, 200)

#Creating values for Broncho Dilation with 3 classes:
BD_0 = np.random.normal(150, 49, 200)
BD_1 = np.random.normal(201, 50, 200)
BD_2 = np.random.normal(251, 50, 200)

#Creating the label variable with three classes: (2)disease (1)possible disease (0)no disease:
not_asthma = np.zeros((200,), dtype=int)
poss_asthma = np.ones((200,), dtype=int)
asthma = np.full((200,), 2, dtype=int)

#Concatenate classes into one variable:
FeNO = np.concatenate([FeNO_0, FeNO_1, FeNO_2])
FEV1 = np.concatenate([FEV1_0, FEV1_1, FEV1_2])
BD = np.concatenate([BD_0, BD_1, BD_2])
dx = np.concatenate([not_asthma, poss_asthma, asthma])

#Create DataFrame:
df = pd.DataFrame()

#Add variables to DataFrame:
df['FeNO'] = FeNO.tolist()
df['FEV1'] = FEV1.tolist()
df['BD'] = BD.tolist()
df['dx'] = dx.tolist()

#Check database:
df
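Before plotting, a quick sanity check confirms the shape and the class balance (this check is a small addition to the original flow):

print(df.shape)                  # (600, 4)
print(df['dx'].value_counts())   # 200 observations per class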
Our df has 600 rows and 4 columns. We can now inspect the distributions of the variables visually:
fig, axs = plt.subplots(2, 3, figsize=(14, 7))
sns.kdeplot(df['FEV1'], shade=True, color="b", ax=axs[0, 0])
sns.kdeplot(df['FeNO'], shade=True, color="b", ax=axs[0, 1])
sns.kdeplot(df['BD'], shade=True, color="b", ax=axs[0, 2])
sns.distplot(a=df["FEV1"], hist=True, kde=True, rug=False, ax=axs[1, 0])
sns.distplot(a=df["FeNO"], hist=True, kde=True, rug=False, ax=axs[1, 1])
sns.distplot(a=df["BD"], hist=True, kde=True, rug=False, ax=axs[1, 2])
plt.show()
On visual inspection, the data appear close to a Gaussian distribution. We can double-check with Q-Q plots:
from statsmodels.graphics.gofplots import qqplot
from matplotlib import pyplot

#Q-Q plots:
fig, axs = pyplot.subplots(1, 3, figsize=(15, 5))
qqplot(df['FEV1'], line='s', ax=axs[0])
qqplot(df['FeNO'], line='s', ax=axs[1])
qqplot(df['BD'], line='s', ax=axs[2])
pyplot.show()
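The visual checks can also be complemented with a formal normality test. This is an addition to the original article; a minimal sketch using scipy's Shapiro-Wilk test:

from scipy.stats import shapiro

for col in ('FEV1', 'FeNO', 'BD'):
    stat, p = shapiro(df[col])
    # p > 0.05 means we cannot reject normality at the 5% level
    print(col, 'W =', round(stat, 3), 'p =', round(p, 3))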
Although not perfectly normal, the distributions are close enough. Next, explore the dataset and the correlations between the variables:
#Exploring dataset:
sns.pairplot(df, kind="scatter", hue="dx")
plt.show()
We can also use box plots to examine the distributions of the three groups and see which features separate the classes best:
# Plotting both distributions on the same figure
fig, axs = plt.subplots(2, 3, figsize=(14, 7))
fig = sns.kdeplot(df['FEV1'], hue=df['dx'], shade=True, color="r", ax=axs[0, 0])
fig = sns.kdeplot(df['FeNO'], hue=df['dx'], shade=True, color="r", ax=axs[0, 1])
fig = sns.kdeplot(df['BD'], hue=df['dx'], shade=True, color="r", ax=axs[0, 2])
sns.boxplot(x=df["dx"], y=df["FEV1"], palette='magma', ax=axs[1, 0])
sns.boxplot(x=df["dx"], y=df["FeNO"], palette='magma', ax=axs[1, 1])
sns.boxplot(x=df["dx"], y=df["BD"], palette='magma', ax=axs[1, 2])
plt.show()

Implementing Naive Bayes by Hand
Writing the code by hand is not about reinventing the wheel; it is a way to understand the algorithm better. Before doing Bayesian classification, we first need to understand the normal distribution.
The mathematical formula of the normal distribution defines the probability of observing a given value within a population:

f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²))
We can create a function to compute this probability:
def normal_dist(x, mean, sd):
    # Density of x under a normal distribution with the given mean and standard deviation.
    # Note the parentheses around (sd * np.sqrt(2 * np.pi)): writing 1/sd*np.sqrt(...)
    # would multiply by the square root instead of dividing by it.
    prob_density = (1 / (sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / sd) ** 2)
    return prob_density
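To make sure the function is correct, we can compare it against a reference implementation (the scipy check is an addition, not part of the original article); the two values should match:

from scipy.stats import norm

print(normal_dist(2.75, 3.70, 1.13))          # hand-written density
print(norm.pdf(2.75, loc=3.70, scale=1.13))   # scipy reference value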
Knowing the normal distribution formula, we can compute the probability of a sample under each of the three groups (classes). First, we need the mean and standard deviation of every predictor feature within every group:
#Mean and standard deviation of each feature, for each group:
for g in (0, 1, 2):
    group = df[df['dx'] == g]
    for col in ('FEV1', 'FeNO', 'BD'):
        print(f'Mean {col} group {g}: ', statistics.mean(group[col]))
        print(f'SD {col} group {g}: ', statistics.stdev(group[col]))
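The same table can be produced in a single line with pandas (an equivalent alternative, not in the original):

#Mean and standard deviation of every feature, per class:
print(df.groupby('dx')[['FEV1', 'FeNO', 'BD']].agg(['mean', 'std']))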
Now, test with a random sample: FEV1 = 2.75, FeNO = 27, BD = 125.
#Probability for:
#FEV1 = 2.75
#FeNO = 27
#BD = 125

#We have the same number of observations per class, so the prior probability is ~0.33:
Prob_geral = round(0.333, 3)

#Prob FEV1:
Prob_FEV1_0 = round(normal_dist(2.75, 4.70, 1.08), 10)
print('Prob FEV1 0: ', Prob_FEV1_0)
Prob_FEV1_1 = round(normal_dist(2.75, 3.70, 1.13), 10)
print('Prob FEV1 1: ', Prob_FEV1_1)
Prob_FEV1_2 = round(normal_dist(2.75, 3.01, 1.22), 10)
print('Prob FEV1 2: ', Prob_FEV1_2)

#Prob FeNO:
Prob_FeNO_0 = round(normal_dist(27, 19.71, 19.29), 10)
print('Prob FeNO 0: ', Prob_FeNO_0)
Prob_FeNO_1 = round(normal_dist(27, 42.34, 19.85), 10)
print('Prob FeNO 1: ', Prob_FeNO_1)
Prob_FeNO_2 = round(normal_dist(27, 61.78, 21.39), 10)
print('Prob FeNO 2: ', Prob_FeNO_2)

#Prob BD:
Prob_BD_0 = round(normal_dist(125, 152.59, 50.33), 10)
print('Prob BD 0: ', Prob_BD_0)
Prob_BD_1 = round(normal_dist(125, 199.14, 50.81), 10)
print('Prob BD 1: ', Prob_BD_1)
Prob_BD_2 = round(normal_dist(125, 256.13, 47.04), 10)
print('Prob BD 2: ', Prob_BD_2)

#Compute probability: prior times the three per-feature densities, for each group:
Prob_group_0 = Prob_geral * Prob_FEV1_0 * Prob_FeNO_0 * Prob_BD_0
print('Prob group 0: ', Prob_group_0)
Prob_group_1 = Prob_geral * Prob_FEV1_1 * Prob_FeNO_1 * Prob_BD_1
print('Prob group 1: ', Prob_group_1)
Prob_group_2 = Prob_geral * Prob_FEV1_2 * Prob_FeNO_2 * Prob_BD_2
print('Prob group 2: ', Prob_group_2)
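Note that these three scores are unnormalized joint probabilities, not posteriors. If we want actual probabilities that sum to 1, we can divide each score by their sum; a minimal sketch (the normalization step is an addition to the original):

total = Prob_group_0 + Prob_group_1 + Prob_group_2
print('Posterior group 0: ', Prob_group_0 / total)
print('Posterior group 1: ', Prob_group_1 / total)
print('Posterior group 2: ', Prob_group_2 / total)  # the three posteriors sum to 1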
As we can see, this sample has the highest probability of belonging to group 2. This is the full manual workflow of Naive Bayes; in practice, this mature algorithm has a far more efficient implementation available in Scikit-Learn.
A Classifier Example with Scikit-Learn

Scikit-Learn's GaussianNB gives us a much more efficient approach, so next we run a complete classification example with it. First, create the X and y variables and perform the train/test split:
#Creating X and y:
X = df.drop('dx', axis=1)
y = df['dx']

#Data split into train and test:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
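Because this split is random, the scores below will vary slightly between runs. For reproducible results, one can pass a seed and stratify by class (optional arguments, an addition to the original):

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)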
Before feeding the data to the model, we also standardize it with StandardScaler:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training data only
X_test = sc.transform(X_test)        # reuse the training statistics to avoid data leakage
Now build and evaluate the model:
#Build the model:
classifier = GaussianNB()
classifier.fit(X_train, y_train)

#Evaluate the model:
print("training set score: %f" % classifier.score(X_train, y_train))
print("test set score: %f" % classifier.score(X_test, y_test))
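It is worth noting the link back to the manual computation: after fitting, GaussianNB stores exactly the kind of per-class statistics we computed by hand. Inspecting them is straightforward (attribute names per scikit-learn; in versions before 1.0 the variances were exposed as sigma_ rather than var_):

print(classifier.theta_)        # per-class means of each feature (on the scaled data)
print(classifier.var_)          # per-class variances of each feature
print(classifier.class_prior_)  # fitted class priors, ~0.33 each here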
Next, use a confusion matrix to visualize the results:
# Predicting the test set results
y_pred = classifier.predict(X_test)

#Confusion Matrix:
cm = confusion_matrix(y_test, y_pred)
print(cm)
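Printing the raw matrix works, but a heatmap is easier to read. A small sketch with seaborn (the plot itself is an addition; the original only prints the matrix):

sns.heatmap(cm, annot=True, fmt='d', cmap='magma')
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.show()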
From the confusion matrix we can see that our model is best at predicting class 0, but has a high error rate for classes 1 and 2. To examine this problem, we build decision-boundary plots from pairs of variables:
df.to_csv('data.csv', index=False)
data = pd.read_csv('data.csv')

def gaussian_nb_a(data):
    x = data[['BD', 'FeNO']].values
    y = data['dx'].astype(int).values
    Gauss_nb = GaussianNB()
    Gauss_nb.fit(x, y)
    print(Gauss_nb.score(x, y))
    #Plot decision region:
    plot_decision_regions(x, y, clf=Gauss_nb, legend=1)
    #Adding axes annotations:
    plt.xlabel('BD')
    plt.ylabel('FeNO')
    plt.title('Gaussian Naive Bayes')
    plt.show()

def gaussian_nb_b(data):
    x = data[['BD', 'FEV1']].values
    y = data['dx'].astype(int).values
    Gauss_nb = GaussianNB()
    Gauss_nb.fit(x, y)
    print(Gauss_nb.score(x, y))
    #Plot decision region:
    plot_decision_regions(x, y, clf=Gauss_nb, legend=1)
    #Adding axes annotations:
    plt.xlabel('BD')
    plt.ylabel('FEV1')
    plt.title('Gaussian Naive Bayes')
    plt.show()

def gaussian_nb_c(data):
    x = data[['FEV1', 'FeNO']].values
    y = data['dx'].astype(int).values
    Gauss_nb = GaussianNB()
    Gauss_nb.fit(x, y)
    print(Gauss_nb.score(x, y))
    #Plot decision region:
    plot_decision_regions(x, y, clf=Gauss_nb, legend=1)
    #Adding axes annotations:
    plt.xlabel('FEV1')
    plt.ylabel('FeNO')
    plt.title('Gaussian Naive Bayes')
    plt.show()

gaussian_nb_a(data)
gaussian_nb_b(data)
gaussian_nb_c(data)
The decision boundaries show us why the classification fails: many points fall outside the decision region of their own class. With real data we would need to analyze the specific causes, but since this is synthetic test data, no further analysis is needed.
Author: Carla Martins
