Software Developer and Writer: January 2021

This content is powered by Balige Publishing (collaboration with Rismon Hasiholan Sianipar).

In this tutorial, you will learn how to use Pandas, NumPy, Scikit-Learn, and other libraries to perform simple classification using perceptron and Adaline (adaptive linear neuron). The dataset used is the Iris dataset from the UCI Machine Learning Repository.

Tutorial Steps To Implement K-Nearest Neighbor (KNN) Using Scikit-Learn
Step 1: In this step, you will use the KNeighborsClassifier class from sklearn.neighbors, along with the familiar fit() method, to train the model on all three classes in the dataset:

#KNN_Scikit_ex.py
from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_classifier(X, y, classifier, test_idx=None, resolution=0.01):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    
    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    
    plt.contourf(xx1, xx2, Z, alpha=0.5, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=colors[idx],
                    marker=markers[idx], label=cl,
                    edgecolor='black')        
    # highlight test samples
    if test_idx:
        # circle the test samples (hollow markers)
        X_test, y_test = X[test_idx, :], y[test_idx]
        plt.scatter(X_test[:, 0], X_test[:, 1],
                    facecolors='none', edgecolor='black', alpha=1.0,
                    linewidth=1, marker='o',
                    s=100, label='test set')
        
#Load data into matrix X and vector y
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
print('Class labels:', np.unique(y))
#print(X)

# plot data       
plt.scatter(X[:50, 0], X[:50, 1],
            color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1],
            color='blue', marker='x', label='versicolor')
plt.scatter(X[100:150, 0], X[100:150, 1],
            color='green', marker='x', label='virginica')
plt.xlabel('petal length [cm]')
plt.ylabel('petal width [cm]')
plt.legend(loc='upper left')
plt.show()

#Splits the dataset into separate training and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

#Trains KNN model
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
knn.fit(X_train, y_train)
            
X_combined = np.vstack((X_train, X_test))
y_combined = np.hstack((y_train, y_test))

#Makes prediction
y_pred = knn.predict(X_test)
print('Misclassified samples: %d' % (y_test != y_pred).sum())

#Calculates classification accuracy 
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

plot_classifier(X_combined, y_combined, classifier=knn)
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.legend(loc='upper left')
plt.show()

By specifying five neighbors in the KNN model for this dataset, you obtain a relatively smooth decision boundary.

The right choice of k is crucial to find a good balance between overfitting and underfitting. You also have to make sure that you choose a distance metric that is appropriate for the features in the dataset.
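
One common way to pick k, which the script above does not do, is k-fold cross-validation. The following is only a minimal sketch under that assumption: the helper name mean_cv_accuracy, the candidate list of odd k values, and the choice of 5 folds are illustrative and not part of the original tutorial.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same two petal features as in the tutorial script
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

def mean_cv_accuracy(k, folds=5):
    """Hypothetical helper: mean cross-validated accuracy of KNN with k neighbors."""
    knn = KNeighborsClassifier(n_neighbors=k, p=2, metric='minkowski')
    return cross_val_score(knn, X, y, cv=folds).mean()

# Try a few odd values of k (an assumed candidate list) and keep the best one
scores = {k: mean_cv_accuracy(k) for k in range(1, 16, 2)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print('k=%2d  mean accuracy=%.3f' % (k, score))
print('Best k:', best_k)
```

Very small values of k tend to follow noise in the training data (overfitting), while very large values blur the class boundaries (underfitting), so comparing several values in this way is a reasonable starting point.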

Often, a simple Euclidean distance measure is used for real-valued samples, for example, the flowers in the Iris dataset, whose features are measured in centimeters. However, if you use a Euclidean distance measure, it is also important to standardize the data so that each feature contributes equally to the distance. The Minkowski distance used in the previous code is a generalization of the Euclidean and Manhattan distances: it becomes the Euclidean distance when p=2 and the Manhattan distance when p=1.
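
To make the role of the p parameter concrete, here is a small NumPy sketch of the Minkowski distance. The function name minkowski_distance and the two sample points are assumptions made for illustration; scikit-learn computes the distance internally.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Illustrative Minkowski distance between two points a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

# Two made-up points that differ by 3 in the first feature and 4 in the second
a = [1.0, 2.0]
b = [4.0, 6.0]

manhattan = minkowski_distance(a, b, p=1)  # |3| + |4| = 7
euclidean = minkowski_distance(a, b, p=2)  # sqrt(3**2 + 4**2) = 5

print('Manhattan (p=1):', manhattan)
print('Euclidean (p=2):', euclidean)
```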

Step 2: Run the script to see the decision regions, as shown in the figure below.

Step 3: Open the gui_scikit.ui form that you created before. Modify the listAlgorithm widget so that it has a sixth item, Nearest Neighbor, as shown in the figure below.

Step 4: Add a new Spin Box widget to the form and set its objectName property to sbNeighbor. Set its minimum, maximum, singleStep, and value properties to 2, 10, 1, and 5, respectively.

The modified form now looks as shown in the figure below.

Step 5: Add this import statement to Scikit_Classifier.py:

from sklearn.neighbors import KNeighborsClassifier

Step 6: Add this code to the top of the algo_NN() function to disable the sbNeighbor widget by default and read its current value:

self.sbNeighbor.setEnabled(False)
neighbor = self.sbNeighbor.value()

Step 7: Add this code to the algo_NN() function so that when the user chooses Nearest Neighbor from the listAlgorithm widget, it performs KNN classification:

if strList == 'Nearest Neighbor':
    self.sbIter.setEnabled(False)  
    self.dsbRate.setEnabled(False)  
    self.sbDepth.setEnabled(False) 
    self.sbNeighbor.setEnabled(True)
            
    #Trains KNN model
    knn = KNeighborsClassifier(n_neighbors=neighbor, p=2, \
        metric='minkowski')
    knn.fit(X_train, y_train)
            
    X_combined = np.vstack((X_train, X_test))
    y_combined = np.hstack((y_train, y_test))
            
    strTitle = 'KNN Classifier with ' + str(ratio*100) + \
        '% Data Ratio '
    strTitle += 'and ' + str(neighbor) + ' Neighbors' 
    # test samples occupy the tail of the combined arrays
    self.display_decision(X=X_combined, y=y_combined, \
        classifier=knn, axisWidget=self.widgetDecision.canvas, \
        title=strTitle, test_idx=range(len(y_train), len(y_combined))) 
                
    #display accuracy graph
    self.graph_knn(self.widgetEpoch.canvas, self.accuracy_knn)

Step 8: Define accuracy_knn() to calculate the accuracy of the KNN model:

def accuracy_knn(self,ratio, neighbor):        
    #Splits the dataset into separate training and test datasets
    X_train, X_test, y_train, y_test = train_test_split(self.X, \
        self.y, test_size=ratio, random_state=1, stratify=self.y)
        
    #Trains KNN model
    knn = KNeighborsClassifier(n_neighbors=neighbor, p=2, \
        metric='minkowski')           
    knn.fit(X_train, y_train)
                            
    #Makes prediction
    y_pred = knn.predict(X_test)
        
    #Calculates classification accuracy 
    acc = round(100*accuracy_score(y_test, y_pred),1)
    return acc

Step 9: Define the graph_knn() function to draw a bar chart of accuracy by data ratio and number of neighbors:

def graph_knn(self,axisWidget,func): 
    ratio = self.dsbRatio.value()
    neighbor = self.sbNeighbor.value()
        
    if (ratio+0.4) < 1 :
        rangeDR = [ratio,ratio+0.1,ratio+0.2,ratio+0.3,ratio+0.4]
    else :
       rangeDR = [ratio-0.4,ratio-0.3,ratio-0.2,ratio-0.1,ratio]     

    labels = [str(round(rangeDR[0],2)), str(round(rangeDR[1],2)), \
              str(round(rangeDR[2],2)), str(round(rangeDR[3],2)), \
              str(round(rangeDR[4],2))]
               
    Neighbor1 = []
    for i in rangeDR:
        acc = func(i,neighbor)
        Neighbor1.append(acc)   

    Neighbor2 = []
    for i in rangeDR:
        acc = func(i, neighbor+2)
        Neighbor2.append(acc)  
            
    Neighbor3 = []
    for i in rangeDR:
        acc = func(i, neighbor+3)
        Neighbor3.append(acc)       
            
    x = np.arange(len(labels))  # the label locations
    width = 0.3  # the width of the bars
        
    strLabel1 = 'Neighbor=' + str(round(neighbor, 2))
    strLabel2 = 'Neighbor=' + str(round(neighbor+2, 2))
    strLabel3 = 'Neighbor=' + str(round(neighbor+3, 2))
    axisWidget.axis1.clear()
    rects1 = axisWidget.axis1.bar(x - width/2, Neighbor1, \
        width, label=strLabel1)
    rects2 = axisWidget.axis1.bar(x + width/2, Neighbor2, \
        width, label=strLabel2)
    rects3 = axisWidget.axis1.bar(x + 3*width/2, Neighbor3, \
        width, label=strLabel3)

    # Add some text for labels, title and custom x-axis tick labels, etc.
    axisWidget.axis1.set_ylabel('Accuracy(%)')
    axisWidget.axis1.set_xlabel('Data Ratio (DR)')
    axisWidget.axis1.set_title(\
        'Accuracy by data ratio (DR) and Number of Neighbors')
    axisWidget.axis1.set_xticks(x)
    axisWidget.axis1.set_xticklabels(labels)
    axisWidget.axis1.legend()
        
    self.autolabel(rects1,axisWidget.axis1)
    self.autolabel(rects2,axisWidget.axis1)
    self.autolabel(rects3,axisWidget.axis1)
    axisWidget.draw()
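
The way graph_knn() chooses its five data ratios can be checked outside the GUI. The sketch below isolates that logic in a hypothetical helper, ratio_window, which is not part of Scikit_Classifier.py. Unlike the original code, it rounds the values themselves rather than only the tick labels, which is an assumption made here to keep the printed output tidy.

```python
def ratio_window(ratio, step=0.1, count=5):
    """Return `count` data ratios spaced by `step`.

    The window starts at `ratio` when it fits below 1.0;
    otherwise it ends at `ratio`, as in graph_knn().
    """
    span = step * (count - 1)
    if ratio + span < 1:
        start = ratio
    else:
        start = ratio - span
    # Rounding hides floating-point noise such as 0.6000000000000001
    return [round(start + i * step, 2) for i in range(count)]

print(ratio_window(0.3))  # window grows upward from 0.3
print(ratio_window(0.7))  # 0.7 + 0.4 is not below 1, so the window ends at 0.7
```

Both calls cover the ratios 0.3 through 0.7, which shows why the x-axis labels of the accuracy chart stay inside the valid range for test_size.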

Step 10: Connect the valueChanged() signal of the sbNeighbor widget to the algo_NN() function, and put this line inside the __init__() method:

self.sbNeighbor.valueChanged.connect(self.algo_NN)

Step 11: Run Scikit_Classifier.py, choose Nearest Neighbor from the list widget, and set the data ratio to 0.3 and the number of neighbors to 5 to see the result, as shown in the figure below.

Then choose a data ratio of 0.5 and 3 neighbors. The result is shown in the figure below.
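
If you want to check the accuracy values for these two settings without opening the GUI, a standalone version of the accuracy_knn() logic can be used. This is only a sketch: the function name knn_accuracy is hypothetical, and it loads the Iris data itself instead of reading self.X and self.y.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(ratio, neighbor):
    """Hypothetical GUI-free counterpart of accuracy_knn()."""
    iris = datasets.load_iris()
    X = iris.data[:, [2, 3]]  # petal length and petal width
    y = iris.target

    # Same split settings as in the tutorial code
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=ratio, random_state=1, stratify=y)

    knn = KNeighborsClassifier(n_neighbors=neighbor, p=2, metric='minkowski')
    knn.fit(X_train, y_train)

    # Accuracy in percent, rounded to one decimal place
    return round(100 * accuracy_score(y_test, knn.predict(X_test)), 1)

for ratio, neighbor in [(0.3, 5), (0.5, 3)]:
    print('ratio=%.1f  neighbors=%d  accuracy=%.1f%%'
          % (ratio, neighbor, knn_accuracy(ratio, neighbor)))
```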

Below is the final version of Scikit_Classifier.py:

#Scikit_Classifier.py
from PyQt5.QtWidgets import *
from PyQt5.uic import loadUi
from matplotlib.backends.backend_qt5agg import (NavigationToolbar2QT as NavigationToolbar)
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import pandas as pd 

class DemoGUIScikit(QMainWindow):   
    def __init__(self):       
        QMainWindow.__init__(self)
        loadUi("gui_scikit.ui",self)

        self.setWindowTitle("GUI Demo of Classifier Using Scikit-Learn")
        self.addToolBar(NavigationToolbar(self.widgetData.canvas, self))
        self.gbNNParam.setEnabled(False)
        self.listAlgorithm.setEnabled(False)
        self.pbLoad.clicked.connect(self.load_data)
        self.sbIter.valueChanged.connect(self.algo_NN)
        self.dsbRate.valueChanged.connect(self.algo_NN)
        self.dsbRatio.valueChanged.connect(self.algo_NN)    
        self.listAlgorithm.setEnabled(False)
        self.listAlgorithm.clicked.connect(self.algo_NN)
        self.listAlgorithm.setCurrentRow(0)
        self.sbDepth.valueChanged.connect(self.algo_NN)
        self.sbNeighbor.valueChanged.connect(self.algo_NN)
        
    def load_data(self):
        #Load data into matrix X and vector y
        iris = datasets.load_iris()
        self.X = iris.data[:, [2, 3]]
        self.y = iris.target   
        self.display_data(self.X, self.widgetData.canvas)
        
        self.gbNNParam.setEnabled(True)
        self.pbLoad.setEnabled(False)  
        self.listAlgorithm.setEnabled(True)               

    def display_data(self,X,axisWidget):             
        # plot data
        axisWidget.axis1.clear()
        axisWidget.axis1.scatter(X[:50, 0], X[:50, 1],
            color='red', marker='o', label='setosa')
        
        axisWidget.axis1.scatter(X[50:100, 0], X[50:100, 1],
            color='blue', marker='x', label='versicolor')
        
        axisWidget.axis1.scatter(X[100:150, 0], X[100:150, 1],
            color='green', marker='x', label='virginica')
        
        axisWidget.axis1.set_xlabel('Petal length [cm]')
        axisWidget.axis1.set_ylabel('Petal width [cm]')
        axisWidget.axis1.legend(loc='upper left')
        
        title = 'Petal length and Petal width [cm]'
        axisWidget.axis1.set_title(title)
        axisWidget.draw()
                
        #displays data on table widget
        self.display_table()

        #Displays decision regions       
        self.algo_NN()        

    def display_table(self):
        data = datasets.load_iris()
        df = pd.DataFrame(np.column_stack((data.data, data.target)), columns = data.feature_names+['target'])
        df['label'] = df.target.replace(dict(enumerate(data.target_names)))

        # show data on table widget
        self.write_df_to_qtable(df,self.tableData)
        self.tableData.setHorizontalHeaderLabels(list(df.columns))
        
        styleH = "::section {""background-color: cyan; }"
        self.tableData.horizontalHeader().setStyleSheet(styleH)

        styleV = "::section {""background-color: red; }"
        self.tableData.verticalHeader().setStyleSheet(styleV)  

    # Takes a df and writes it to a qtable provided. df headers become qtable headers
    @staticmethod
    def write_df_to_qtable(df,table):
        table.setRowCount(df.shape[0])
        table.setColumnCount(df.shape[1])       

        # getting data from df is computationally costly so convert it to array first
        df_array = df.values
        for row in range(df.shape[0]):
            for col in range(df.shape[1]):
                table.setItem(row, col, QTableWidgetItem(str(df_array[row,col])))


    def display_decision(self, X, y, classifier, axisWidget, title, test_idx=None, resolution=0.01):
        # setup marker generator and color map
        markers = ('s', 'x', 'o', '^', 'v')
        colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
        cmap = ListedColormap(colors[:len(np.unique(y))])
    
        # plot the decision surface
        x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    
        Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
        Z = Z.reshape(xx1.shape)
        axisWidget.axis1.clear()
        axisWidget.axis1.contourf(xx1, xx2, Z, alpha=0.5, cmap=cmap)
        axisWidget.axis1.set_xlim(xx1.min(), xx1.max())
        axisWidget.axis1.set_ylim(xx2.min(), xx2.max())

        for idx, cl in enumerate(np.unique(y)):
            axisWidget.axis1.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=colors[idx],
                    marker=markers[idx], label=cl,
                    edgecolor='black')
        
        # highlight test samples
        if test_idx:
            # circle the test samples (hollow markers)
            X_test, y_test = X[test_idx, :], y[test_idx]
            axisWidget.axis1.scatter(X_test[:, 0], X_test[:, 1],
                    facecolors='none', edgecolor='black', alpha=1.0,
                    linewidth=1, marker='o',
                    s=100, label='test set')
        
        axisWidget.axis1.set_xlabel('petal length [standardized]')
        axisWidget.axis1.set_ylabel('petal width [standardized]')
        axisWidget.axis1.legend(loc='upper left')
        axisWidget.axis1.set_title(title)
        axisWidget.draw()

    def algo_NN(self): 
        self.sbIter.setEnabled(True)  
        self.dsbRate.setEnabled(True)  
        self.sbDepth.setEnabled(False) 
        self.sbNeighbor.setEnabled(False)
        
        iterNum = self.sbIter.value()
        ratio = self.dsbRatio.value()
        self.dsbRate.setDecimals(5)
        learningRate = self.dsbRate.value()
        depth = self.sbDepth.value()
        neighbor = self.sbNeighbor.value()
        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #standardizes the features using the StandardScaler 
        sc = StandardScaler()
        sc.fit(X_train)
        X_train_std = sc.transform(X_train)
        X_test_std = sc.transform(X_test)

        item = self.listAlgorithm.currentItem()
        strList = item.text()
        
        if strList == 'Perceptron':
            #Trains perceptron
            ppn = Perceptron(max_iter=iterNum, eta0=learningRate, random_state=1)
            ppn.fit(X_train_std, y_train)
        
            X_combined_std = np.vstack((X_train_std, X_test_std))
            y_combined = np.hstack((y_train, y_test))
        
            strTitle = 'Perceptron Classifier with ' + str(ratio*100) + '% Data Ratio '
            strTitle += ' and Learning Rate ' +str(learningRate)
        
            self.display_decision(X=X_combined_std, y=y_combined, classifier=ppn, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(105, 150))
        
            #display graph
            self.graph(self.widgetEpoch.canvas, self.accuracy_perceptron)

        if strList == 'Logistic Regression':
            #Trains logistic regression model
            lgr = make_pipeline(StandardScaler(), SGDClassifier('log',max_iter=iterNum,eta0=learningRate, tol=1e-3))
            #lgr = LogisticRegression(C=100.0, random_state=1)
            lgr.fit(X_train_std, y_train)
            
            X_combined_std = np.vstack((X_train_std, X_test_std))
            y_combined = np.hstack((y_train, y_test))
            
            strTitle = 'Logistic Regression Classifier with ' + str(ratio*100) + '% Data Ratio '
        
            self.display_decision(X=X_combined_std, y=y_combined, classifier=lgr, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(105, 150))
                
            #display graph
            self.graph(self.widgetEpoch.canvas, self.accuracy_logistic)

        if strList == 'Support Vector Machine (SVM)':
            #Trains SVM model
            svm = make_pipeline(StandardScaler(), SGDClassifier('hinge',max_iter=iterNum,eta0=learningRate, tol=1e-3))
            svm.fit(X_train_std, y_train)
            
            X_combined_std = np.vstack((X_train_std, X_test_std))
            y_combined = np.hstack((y_train, y_test))
            
            strTitle = 'SVM Classifier with ' + str(ratio*100) + '% Data Ratio '
        
            self.display_decision(X=X_combined_std, y=y_combined, classifier=svm, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(105, 150))
                
            #display graph
            self.graph(self.widgetEpoch.canvas, self.accuracy_svm)
        
        if strList == 'Decision Tree':
            self.sbIter.setEnabled(False)  
            self.dsbRate.setEnabled(False)  
            self.sbDepth.setEnabled(True) 
            
            #Trains Decision Tree model
            tree = DecisionTreeClassifier(criterion='gini', max_depth=depth,random_state=1)            
            tree.fit(X_train, y_train)
            
            X_combined = np.vstack((X_train, X_test))
            y_combined = np.hstack((y_train, y_test))
            
            strTitle = 'Decision Tree Classifier with ' + str(ratio*100) + '% Data Ratio '
            strTitle += ' and Max Depth ' +str(depth)
            self.display_decision(X=X_combined, y=y_combined, classifier=tree, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(105, 150)) 
                
            #display accuracy graph
            self.graph_dt(self.widgetEpoch.canvas, self.accuracy_dt)

        if strList == 'Random Forest':
            self.sbIter.setEnabled(False)  
            self.dsbRate.setEnabled(False)  
            self.sbDepth.setEnabled(True) 
            
            #Trains Random Forest model
            forest = RandomForestClassifier(criterion='gini', n_estimators=25,max_depth=depth,random_state=1)          
            forest.fit(X_train, y_train)
            
            X_combined = np.vstack((X_train, X_test))
            y_combined = np.hstack((y_train, y_test))
            
            strTitle = 'Random Forest Classifier with ' + str(ratio*100) + '% Data Ratio '
            strTitle += ' and Max Depth ' +str(depth)
            self.display_decision(X=X_combined, y=y_combined, classifier=forest, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(105, 150)) 
                
            #display accuracy graph
            self.graph_dt(self.widgetEpoch.canvas, self.accuracy_rf)

        if strList == 'Nearest Neighbor':
            self.sbIter.setEnabled(False)  
            self.dsbRate.setEnabled(False)  
            self.sbDepth.setEnabled(False) 
            self.sbNeighbor.setEnabled(True)
            
            #Trains KNN model
            knn = KNeighborsClassifier(n_neighbors=neighbor, p=2, metric='minkowski')
            knn.fit(X_train, y_train)
            
            X_combined = np.vstack((X_train, X_test))
            y_combined = np.hstack((y_train, y_test))
            
            strTitle = 'KNN Classifier with ' + str(ratio*100) + '% Data Ratio '
            strTitle += 'and ' + str(neighbor) + ' Neighbors' 
            # test samples occupy the tail of the combined arrays
            self.display_decision(X=X_combined, y=y_combined, classifier=knn, \
                              axisWidget=self.widgetDecision.canvas, \
                              title=strTitle, test_idx=range(len(y_train), len(y_combined))) 
                
            #display accuracy graph
            self.graph_knn(self.widgetEpoch.canvas, self.accuracy_knn)
            
    def accuracy_perceptron(self,ratio,learningRate):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #standardizes the features using the StandardScaler 
        sc = StandardScaler()
        sc.fit(X_train)
        X_train_std = sc.transform(X_train)
        X_test_std = sc.transform(X_test)
        
        #Trains perceptron
        ppn = Perceptron(max_iter=100, eta0=learningRate, random_state=1)
        ppn.fit(X_train_std, y_train)
                
        #Makes prediction
        y_pred = ppn.predict(X_test_std)

        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc

    def accuracy_logistic(self,ratio,learningRate):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #standardizes the features using the StandardScaler 
        sc = StandardScaler()
        sc.fit(X_train)
        X_train_std = sc.transform(X_train)
        X_test_std = sc.transform(X_test)
        
        #Trains logistic regression model
        lgr = make_pipeline(StandardScaler(), SGDClassifier('log',max_iter=1000,eta0=learningRate, tol=1e-3))
        lgr.fit(X_train_std, y_train)
                
        #Makes prediction
        y_pred = lgr.predict(X_test_std)

        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc          

    def accuracy_svm(self,ratio,learningRate):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #standardizes the features using the StandardScaler 
        sc = StandardScaler()
        sc.fit(X_train)
        X_train_std = sc.transform(X_train)
        X_test_std = sc.transform(X_test)
        
        #Trains SVM model
        svm = make_pipeline(StandardScaler(), SGDClassifier('hinge',max_iter=1000,eta0=learningRate, tol=1e-3))
        svm.fit(X_train_std, y_train)
                
        #Makes prediction
        y_pred = svm.predict(X_test_std)

        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc   

    def accuracy_dt(self,ratio,depth):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #Trains Decision Tree model
        tree = DecisionTreeClassifier(criterion='gini', max_depth=depth,random_state=1)            
        tree.fit(X_train, y_train)
                            
        #Makes prediction
        y_pred = tree.predict(X_test)
        
        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc   

    def accuracy_rf(self,ratio,depth):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #Trains Random Forest model
        forest = RandomForestClassifier(criterion='gini', n_estimators=25,max_depth=depth,random_state=1)           
        forest.fit(X_train, y_train)
                            
        #Makes prediction
        y_pred = forest.predict(X_test)
        
        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc 

    def accuracy_knn(self,ratio,neighbor):        
        #Splits the dataset into separate training and test datasets
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=ratio, random_state=1, stratify=self.y)
        
        #Trains KNN model
        knn = KNeighborsClassifier(n_neighbors=neighbor, p=2, metric='minkowski')           
        knn.fit(X_train, y_train)
                            
        #Makes prediction
        y_pred = knn.predict(X_test)
        
        #Calculates classification accuracy 
        acc = round(100*accuracy_score(y_test, y_pred),1)
        return acc 
            
    def graph(self,axisWidget,func): 
        ratio = self.dsbRatio.value()
        learningRate = self.dsbRate.value()
        
        if (ratio+0.4) < 1 :
            rangeDR = [ratio,ratio+0.1,ratio+0.2,ratio+0.3,ratio+0.4]
        else :
           rangeDR = [ratio-0.4,ratio-0.3,ratio-0.2,ratio-0.1,ratio]     

        labels = [str(round(rangeDR[0],2)), str(round(rangeDR[1],2)), \
                  str(round(rangeDR[2],2)), str(round(rangeDR[3],2)), \
                  str(round(rangeDR[4],2))]
               
        LR01 = []
        for i in rangeDR:
            acc = func(i,learningRate)
            LR01.append(acc)   

        LR001 = []
        for i in rangeDR:
            acc = func(i,learningRate+0.1)
            LR001.append(acc)  
            
        LR0001 = []
        for i in rangeDR:
            acc = func(i,learningRate+0.25)
            LR0001.append(acc)       
            
        x = np.arange(len(labels))  # the label locations
        width = 0.3  # the width of the bars
        
        strLabel1 = 'LR=' + str(round(learningRate, 2))
        strLabel2 = 'LR=' + str(round(learningRate+0.1, 2))
        strLabel3 = 'LR=' + str(round(learningRate+0.25, 2))
        axisWidget.axis1.clear()
        rects1 = axisWidget.axis1.bar(x - width/2, LR01, width, label=strLabel1)
        rects2 = axisWidget.axis1.bar(x + width/2, LR001, width, label=strLabel2)
        rects3 = axisWidget.axis1.bar(x + 3*width/2, LR0001, width, label=strLabel3)

        # Add some text for labels, title and custom x-axis tick labels, etc.
        axisWidget.axis1.set_ylabel('Accuracy(%)')
        axisWidget.axis1.set_xlabel('Data Ratio (DR)')
        axisWidget.axis1.set_title('Accuracy by data ratio (DR) and learning rate (LR)')
        axisWidget.axis1.set_xticks(x)
        axisWidget.axis1.set_xticklabels(labels)
        axisWidget.axis1.legend()
        #axisWidget.axis1.set_facecolor('xkcd:banana')
        
        self.autolabel(rects1,axisWidget.axis1)
        self.autolabel(rects2,axisWidget.axis1)
        self.autolabel(rects3,axisWidget.axis1)
        axisWidget.draw()

    def autolabel(self,rects,axisWidget):
        """Attach a text label above each bar in *rects*, displaying its height."""
        for rect in rects:
            height = rect.get_height()
            axisWidget.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')


    def graph_dt(self,axisWidget,func): 
        ratio = self.dsbRatio.value()
        depth = self.sbDepth.value()
        
        if (ratio+0.4) < 1 :
            rangeDR = [ratio,ratio+0.1,ratio+0.2,ratio+0.3,ratio+0.4]
        else :
           rangeDR = [ratio-0.4,ratio-0.3,ratio-0.2,ratio-0.1,ratio]     

        labels = [str(round(rangeDR[0],2)), str(round(rangeDR[1],2)), \
                  str(round(rangeDR[2],2)), str(round(rangeDR[3],2)), \
                  str(round(rangeDR[4],2))]
               
        Depth1 = []
        for i in rangeDR:
            acc = func(i,depth)
            Depth1.append(acc)   

        Depth2 = []
        for i in rangeDR:
            acc = func(i,depth+2)
            Depth2.append(acc)  
            
        Depth3 = []
        for i in rangeDR:
            acc = func(i,depth+4)
            Depth3.append(acc)       
            
        x = np.arange(len(labels))  # the label locations
        width = 0.3  # the width of the bars
        
        strLabel1 = 'Depth=' + str(round(depth, 2))
        strLabel2 = 'Depth=' + str(round(depth+2, 2))
        strLabel3 = 'Depth=' + str(round(depth+4, 2))
        axisWidget.axis1.clear()
        rects1 = axisWidget.axis1.bar(x - width/2, Depth1, width, label=strLabel1)
        rects2 = axisWidget.axis1.bar(x + width/2, Depth2, width, label=strLabel2)
        rects3 = axisWidget.axis1.bar(x + 3*width/2, Depth3, width, label=strLabel3)

        # Add some text for labels, title and custom x-axis tick labels, etc.
        axisWidget.axis1.set_ylabel('Accuracy(%)')
        axisWidget.axis1.set_xlabel('Data Ratio (DR)')
        axisWidget.axis1.set_title('Accuracy by data ratio (DR) and Depth')
        axisWidget.axis1.set_xticks(x)
        axisWidget.axis1.set_xticklabels(labels)
        axisWidget.axis1.legend()
        #axisWidget.axis1.set_facecolor('xkcd:banana')
        
        self.autolabel(rects1,axisWidget.axis1)
        self.autolabel(rects2,axisWidget.axis1)
        self.autolabel(rects3,axisWidget.axis1)
        axisWidget.draw()

    def graph_knn(self,axisWidget,func): 
        ratio = self.dsbRatio.value()
        neighbor = self.sbNeighbor.value()
        
        if (ratio+0.4) < 1 :
            rangeDR = [ratio,ratio+0.1,ratio+0.2,ratio+0.3,ratio+0.4]
        else :
           rangeDR = [ratio-0.4,ratio-0.3,ratio-0.2,ratio-0.1,ratio]     

        labels = [str(round(rangeDR[0],2)), str(round(rangeDR[1],2)), \
                  str(round(rangeDR[2],2)), str(round(rangeDR[3],2)), \
                  str(round(rangeDR[4],2))]
               
        Neighbor1 = []
        for i in rangeDR:
            acc = func(i,neighbor)
            Neighbor1.append(acc)   

        Neighbor2 = []
        for i in rangeDR:
            acc = func(i,neighbor+2)
            Neighbor2.append(acc)  
            
        Neighbor3 = []
        for i in rangeDR:
            acc = func(i,neighbor+3)
            Neighbor3.append(acc)       
            
        x = np.arange(len(labels))  # the label locations
        width = 0.3  # the width of the bars
        
        strLabel1 = 'Neighbor=' + str(round(neighbor, 2))
        strLabel2 = 'Neighbor=' + str(round(neighbor+2, 2))
        strLabel3 = 'Neighbor=' + str(round(neighbor+3, 2))
        axisWidget.axis1.clear()
        rects1 = axisWidget.axis1.bar(x - width/2, Neighbor1, width, label=strLabel1)
        rects2 = axisWidget.axis1.bar(x + width/2, Neighbor2, width, label=strLabel2)
        rects3 = axisWidget.axis1.bar(x + 3*width/2, Neighbor3, width, label=strLabel3)

        # Add some text for labels, title and custom x-axis tick labels, etc.
        axisWidget.axis1.set_ylabel('Accuracy(%)')
        axisWidget.axis1.set_xlabel('Data Ratio (DR)')
        axisWidget.axis1.set_title('Accuracy by data ratio (DR) and Number of Neighbors')
        axisWidget.axis1.set_xticks(x)
        axisWidget.axis1.set_xticklabels(labels)
        axisWidget.axis1.legend()
        
        self.autolabel(rects1,axisWidget.axis1)
        self.autolabel(rects2,axisWidget.axis1)
        self.autolabel(rects3,axisWidget.axis1)
        axisWidget.draw()
              
if __name__ == '__main__':
    import sys
    app = QApplication(sys.argv)
    ex = DemoGUIScikit()
    ex.show()
    sys.exit(app.exec_())

Tuesday, January 26, 2021

© Copyright (2017),VIVIAN SIAHAAN,All Rights Reserved.
