Simple linear regression is used to predict continuous values. It predicts the value of a dependent variable from a single independent variable. In this example, we predict the value of CO2 emissions (CO2EMISSIONS) from the engine size (ENGINESIZE).
Applications of Simple Linear Regression
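In symbols, the model is a straight line fitted by least squares. The notation below (theta_0 for the intercept, theta_1 for the slope, n for the number of rows) is an assumption chosen for illustration and does not appear in the original example:

```latex
\hat{y} = \theta_0 + \theta_1 x,
\qquad
(\theta_0, \theta_1) = \arg\min_{\theta_0,\,\theta_1} \sum_{i=1}^{n} \bigl( y_i - \theta_0 - \theta_1 x_i \bigr)^2
```

Here x stands for the engine size, y for the observed CO2 emission, and \hat{y} for the predicted one.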
- Predict Maximum Temperature based on the Minimum Temperature
- Predict Sales based on previous year sales data
To Implement Simple Linear Regression, Install the Required Packages
pip install matplotlib
pip install pandas
pip install numpy
pip install scikit-learn
Import Required Packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn import linear_model
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline
Read Data From CSV File
To read data from a CSV file we use the pandas module. Its read_csv function reads the file and returns a DataFrame object.
# read data from the CSV file
data = pd.read_csv("co2.csv")
# show the first rows of the data
data.head()
In data pre-processing, select the columns needed to train the model. Pre-processing plays a major role in model accuracy, because this is the stage where noisy data and outliers are removed from the data.
# select the columns from the dataset that are used in the rest of the study
sdata = data[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB', 'CO2EMISSIONS']]
sdata.head()
Plot Graph For Selected Data
Plot the selected data as a scatter plot and observe the relationship between engine size and CO2 emissions. The plot shows that the relationship between engine size and CO2 emissions is approximately linear, so simple linear regression is an appropriate choice. If the relationship were not linear, non-linear regression would be the better fit for the data set.
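The visual impression of linearity can be backed up with a number. A minimal sketch, assuming the Pearson correlation coefficient is an acceptable check here: values close to 1 or -1 suggest a strong linear relationship. The function name pearson_r and the sample values are hypothetical and are not rows from co2.csv.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (NOT taken from co2.csv)
engine_size = [1.5, 2.0, 2.4, 3.0, 3.5, 4.0]
co2 = [180, 200, 215, 240, 260, 280]

r = pearson_r(engine_size, co2)
print("Pearson r:", round(r, 3))
```

With the DataFrame selected earlier, pandas provides the same calculation as sdata['ENGINESIZE'].corr(sdata['CO2EMISSIONS']).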
Train And Test Data Split
Split the data into a training set and a test set.
# the random mask assigns roughly 80% of the rows to training and 20% to testing
mask = np.random.rand(len(sdata)) < 0.8
# take about 80% of the data as the training set
traindata = sdata[mask]
# take the remaining rows as the test set, which is used for model evaluation;
# this is also called out-of-sample testing
testdata = sdata[~mask]
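The random mask gives only approximately 80% of the rows to the training set, and the split changes on every run. A minimal sketch of a reproducible alternative follows, assuming a fixed seed is acceptable; the helper split_indices and its parameters train_fraction and seed are hypothetical names introduced for illustration.

```python
import random

def split_indices(n_rows, train_fraction=0.8, seed=42):
    """Shuffle row positions reproducibly and cut them into train/test lists."""
    positions = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state
    random.Random(seed).shuffle(positions)
    cut = int(n_rows * train_fraction)
    return positions[:cut], positions[cut:]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

The two lists could then be used as sdata.iloc[train_idx] and sdata.iloc[test_idx]. scikit-learn also offers train_test_split in sklearn.model_selection for the same purpose.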
Train Model and Predict Value For Test Data
In this step, train the model on the training data and print its slope and intercept. The slope shows how much the dependent variable changes when the independent variable changes by one unit. To train the model, create an object of the LinearRegression class from the linear_model module, then call its fit method with the training data.
# create an object of the LinearRegression class from sklearn's linear_model module
model = linear_model.LinearRegression()
# train the model on the training data
model.fit(traindata[['ENGINESIZE']], traindata[['CO2EMISSIONS']])
print("Slope : ", model.coef_)
print("Intercept :", model.intercept_)
# predict CO2 emissions for the 20% test set; used later for model evaluation
predict = model.predict(testdata[['ENGINESIZE']])
Slope : [[38.55894843]]
Intercept : [126.99107297]
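In the run shown above, a slope of about 38.56 means that each one-unit increase in engine size raises the predicted CO2 emission by about 38.56. To illustrate where such numbers come from, here is a minimal sketch of the closed-form least-squares solution for one feature, written in plain Python. The function fit_line and the sample points are hypothetical; this is not a description of how scikit-learn computes the fit internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 40x + 120 (not taken from co2.csv)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [160.0, 200.0, 240.0, 280.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 40.0 120.0
```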
Plot Regression Line
The regression line helps to predict new values.
# plot the actual training data
plt.scatter(traindata.ENGINESIZE, traindata.CO2EMISSIONS, color='blue')
# plot the fitted line: the model's predictions for the same engine sizes
# (coef_ has shape (1, 1) and intercept_ has shape (1,), so index into them to get scalars)
plt.plot(traindata.ENGINESIZE, model.coef_[0][0] * traindata.ENGINESIZE + model.intercept_[0], 'r-')
plt.xlabel("Engine Size")
plt.ylabel("Co2 Emissions")
Model Evaluation And Prediction
For model evaluation, use the mean absolute error: the average of the absolute differences between the actual and predicted CO2 emissions. The mean squared error is computed in the same way, but averages the squared differences. The R2 score is at most 1 and can be negative for a model that fits worse than always predicting the mean; the closer it is to 1, the better the model fits. To predict a CO2 emission value, use the predict method of the LinearRegression class.
print("Mean absolute error ", np.mean(np.absolute(predict.ravel() - testdata.CO2EMISSIONS)))
print("Mean Squared error ", np.mean((predict.ravel() - testdata.CO2EMISSIONS) ** 2))
# r2_score expects the true values first, then the predictions
print("R2 score ", r2_score(testdata.CO2EMISSIONS, predict.ravel()))
# predict the CO2 emission for an engine size of 3.2
print("Predicted value of Co2Emission : ", model.predict([[3.2]]))
Sample output (the exact values depend on the random train/test split):
Mean absolute error 22.22677162974374
Mean Squared error 892.5298502928439
R2 score 0.6681287827925666
Predicted value of Co2Emission : 250.37970793531605
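To make the three metrics concrete, the sketch below computes them in plain Python on a tiny made-up example. The function names mae, mse and r2 and the sample numbers are hypothetical. It also shows two properties worth keeping in mind: the R2 score changes when the actual and predicted values are swapped, and it drops below zero for a poor model.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R2 score: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values for illustration only (not taken from co2.csv)
actual = [200.0, 240.0, 280.0, 320.0]
predicted = [210.0, 230.0, 290.0, 310.0]

print("MAE:", mae(actual, predicted))
print("MSE:", mse(actual, predicted))
print("R2 (true first):", r2(actual, predicted))
print("R2 (arguments swapped):", r2(predicted, actual))
# A model that always predicts 100 is far worse than predicting the mean
print("R2 (poor model):", r2(actual, [100.0] * 4))
```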