A Support Vector Machine (SVM) is a supervised machine learning algorithm that can solve both classification and regression problems. An SVM separates classes with a hyperplane and also sets two margin lines parallel to the hyperplane, which help to separate the classes. Many possible hyperplanes may separate the classes, but SVM selects the hyperplane that maximizes the marginal distance.
The margin lines are parallel to the hyperplane and pass through the nearest points of each class. The distance between a margin line and the hyperplane is called the marginal distance. A large margin helps to train a generalized model: if the marginal distance is small, the model is more likely to misclassify new data points.
A support vector is a data point that lies on a margin line.
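As a small illustration (using a toy dataset made up for this sketch, not the csv file used later), the support vectors of a fitted linear SVM can be read from scikit-learn's support_vectors_ attribute, and the marginal distance equals 1 / ||w||:

```python
import numpy as np
from sklearn import svm

# Toy linearly separable data: two small clusters
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1000)  # large C approximates a hard margin
clf.fit(X, y)

# The points lying on the margin lines
print(clf.support_vectors_)

# Distance from the hyperplane to each margin line is 1 / ||w||
w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)
print(margin)
```

Only the points returned by support_vectors_ determine the hyperplane; moving any other training point (without crossing the margin) leaves the model unchanged.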
If the data points are linearly separable, a hyperplane is easy to find. If the data points are not linearly separable, an SVM kernel transforms the lower-dimensional data into a higher dimension, which helps to separate nonlinear data points.
Support Vector Machine Kernels
- Linear
- Polynomial
- Radial Basis Function (RBF)
- Sigmoid
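To see why kernels matter, here is a hedged sketch (using scikit-learn's make_circles as a stand-in nonlinear dataset, not data from this article) comparing a linear kernel against RBF on two concentric rings that no straight hyperplane can separate in 2D:

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Two concentric rings: not separable by a straight line in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=4)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=4)

# A linear kernel struggles here, while RBF implicitly maps the points
# into a higher-dimensional space where they become separable
linear_acc = svm.SVC(kernel='linear').fit(xtrain, ytrain).score(xtest, ytest)
rbf_acc = svm.SVC(kernel='rbf').fit(xtrain, ytrain).score(xtest, ytest)
print(linear_acc, rbf_acc)
```

On data like this the RBF kernel typically reaches near-perfect test accuracy while the linear kernel stays close to chance.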
Install Required Modules To Implement Support Vector Machine
pip install numpy
pip install pandas
pip install scikit-learn
pip install matplotlib
Classification Using Support Vector Machine
In this example we predict a class label from feature values: x1 and x2 are the features and Class is the final class label.
Import required Python modules
Here we use numpy for array handling, pandas for reading data from a csv file, and sklearn for data preprocessing, model training, and model evaluation. matplotlib is used to plot graphs that help to understand the data through visualization.
# this module is used to read data from a csv file
import pandas as pd
# this module is used for array handling
import numpy as np
# this module is used to split data into train and test sets
from sklearn.model_selection import train_test_split
# this module is used to plot graphs
import matplotlib.pyplot as plt
# this module is used to train the model
from sklearn import svm
# this module is used for model evaluation
from sklearn.metrics import jaccard_score
Read Data And Print
Here we use the read_csv method of the pandas module to read data from a csv file. After reading, the data is printed using the head function, which shows the top five rows of the data frame.
data = pd.read_csv("data.csv")
data.head()
Plot Data Into Scatter Plot
Here we plot the data frame as a scatter plot. We choose a different colour for each class so it is easy to see whether the data are linearly separable by a hyperplane.
xx = data[data['Class'] == 4][:100].plot(kind='scatter', x='x1', y='x2', color='red', label='4')
data[data['Class'] == 2][:100].plot(kind='scatter', x='x1', y='x2', color='blue', label='2', ax=xx)
plt.show()
From the graph above we observe that the data are easily separable by a hyperplane.
Data Preprocessing And Train Test Split
From the data frame we first separate the features from the class label, and then split the data into train and test sets. Here we use 80% of the data for training and 20% for testing. Holding out a test set helps to detect overfitting and is very helpful for model evaluation. The sklearn module provides the train_test_split method for this.
x = data[['x1', 'x2']]
y = data[['Class']].values.ravel()
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=4)
To train a support vector machine classification model we use the SVC class available in the sklearn.svm module. It takes a kernel argument that is used to train the model. Here we use the linear kernel.
model = svm.SVC(kernel='linear')
model.fit(xtrain, ytrain)
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
Plot Hyperplane Line In Graph
xx = data[data['Class'] == 4][:100].plot(kind='scatter', x='x1', y='x2', color='red', label='4')
data[data['Class'] == 2][:100].plot(kind='scatter', x='x1', y='x2', color='blue', label='2', ax=xx)
# hyperplane w0*x1 + w1*x2 + b = 0, rearranged as x2 = -(w0/w1)*x1 - b/w1
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(0, 8)
yy = a * xx - model.intercept_[0] / w[1]
plt.plot(xx, yy)
plt.show()
Model Evaluation And Prediction
For model evaluation we use the Jaccard score. Many other evaluation metrics are available, such as f1-score and log loss.
ypred = model.predict(xtest)
ypred[0:5]
print("Accuracy score : ", jaccard_score(ytest, ypred, pos_label=2))
print("predicted class value for (5,1) : ", model.predict([[5, 1]]))
Accuracy score :  0.8979591836734694
predicted class value for (5,1) :  [2]
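Since the text mentions f1-score as another available metric, here is a hedged sketch of how it compares with the Jaccard score, using small hypothetical label arrays (not the actual test set above) with the same class labels 2 and 4:

```python
from sklearn.metrics import f1_score, jaccard_score

# Hypothetical true and predicted labels, mirroring the class labels above
ytest_demo = [2, 2, 4, 4, 2, 4, 2, 4]
ypred_demo = [2, 2, 4, 2, 2, 4, 4, 4]

# f1-score: harmonic mean of precision and recall for the positive class
f1 = f1_score(ytest_demo, ypred_demo, pos_label=2)
# Jaccard score: TP / (TP + FP + FN), i.e. intersection over union
jac = jaccard_score(ytest_demo, ypred_demo, pos_label=2)
print(f1, jac)   # 0.75 0.6
```

Here the model gets 3 true positives, 1 false positive, and 1 false negative for class 2, so precision = recall = 3/4 giving f1 = 0.75, while Jaccard = 3 / (3 + 1 + 1) = 0.6; the two metrics rank models similarly but the Jaccard score is always the lower of the two for imperfect predictions.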