Implementation of Simple Linear Regression In Python

Simple linear regression is used to predict continuous values. It estimates the value of a dependent variable from a single independent variable. In this example, we predict CO2 emissions from engine size.
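To make the idea concrete, here is a minimal sketch of fitting a straight line y = intercept + slope * x with the ordinary least-squares formulas, in plain Python. The helper name fit_line and the four data points are made up for illustration; they are not taken from the CO2 data set.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 40 * x + 110
engine_size = [1.0, 2.0, 3.0, 4.0]
co2 = [150.0, 190.0, 230.0, 270.0]

slope, intercept = fit_line(engine_size, co2)
print("Slope:", slope)
print("Intercept:", intercept)
```

Because the toy points lie exactly on a line, the fit recovers that line. Real data is scattered around the line instead.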

Applications of Simple Linear Regression

  • Predict Maximum Temperature based on the Minimum Temperature
  • Predict Sales based on previous year sales data

To Implement Simple Linear Regression, Install the Required Packages

pip install matplotlib
pip install pandas
pip install numpy
pip install scikit-learn

Data Set

Download dataset

Import Required Packages


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn import linear_model
%matplotlib inline 

Read Data From CSV File

To read data from a CSV file, we use the pandas module. Its read_csv method reads the file and returns a DataFrame object.


#read data from the csv file
data=pd.read_csv("co2.csv")
#show the first five rows
data.head()

Data Pre-processing

In data pre-processing, select the columns required to train the model. Pre-processing plays a major role in model accuracy because it is also the step where noisy data and outliers are removed from the data set.


#select the columns from the dataset that are used in the rest of the study
sdata=data[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
sdata.head()
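The code above only selects columns. As a rough sketch of the noise and outlier removal mentioned in the text, the example below drops rows with missing values and then filters CO2 values with the common 1.5 * IQR rule. The IQR rule, the variable names, and the tiny data frame are assumptions chosen for illustration, not the method used for the article's results.

```python
import pandas as pd

# Made-up rows: one missing engine size and one implausible CO2 value
raw = pd.DataFrame({
    "ENGINESIZE": [2.0, 2.4, 1.5, 3.5, 3.5, None, 3.0],
    "CO2EMISSIONS": [196, 221, 136, 255, 244, 230, 2000],
})

# Drop rows that contain missing values
clean = raw.dropna()

# Keep only rows whose CO2 value lies within 1.5 * IQR of the quartiles
q1 = clean["CO2EMISSIONS"].quantile(0.25)
q3 = clean["CO2EMISSIONS"].quantile(0.75)
iqr = q3 - q1
in_range = clean["CO2EMISSIONS"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range]

print("Rows before:", len(raw), "rows after:", len(clean))
```

Whether a value is a genuine outlier depends on the data, so inspect what gets removed before training on the result.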

Plot Graph For Selected Data

Plot the selected data as a scatter plot and observe the relationship between engine size and CO2 emissions. In the resulting graph the relationship looks approximately linear, so simple linear regression is an appropriate choice. If the relationship is not linear, use non-linear regression for that data set.
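The article does not show the plotting code for this step, so here is a minimal sketch of how such a scatter plot could be drawn. The data frame sdata_demo and its values are made up; in the notebook you would pass the sdata columns instead and can drop the backend line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the selected columns of the data set
sdata_demo = pd.DataFrame({
    "ENGINESIZE": [1.5, 2.0, 2.4, 3.5, 4.7],
    "CO2EMISSIONS": [136, 196, 221, 255, 320],
})

fig, ax = plt.subplots()
# Each dot is one car: engine size on the x axis, CO2 emissions on the y axis
points = ax.scatter(sdata_demo.ENGINESIZE, sdata_demo.CO2EMISSIONS, color="blue")
ax.set_xlabel("Engine Size")
ax.set_ylabel("Co2 Emissions")
```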


Train And Test Data Split

Split the data into training and test sets using a mask. The mask is a Boolean array: True marks a row as training data and False marks it as test data. Evaluating on rows the model did not see during training gives a more reliable estimate of its accuracy. You can also use the train_test_split function from scikit-learn for the split.


#the mask assigns roughly 80% of the rows to training and 20% to testing
mask=np.random.rand(len(sdata)) < 0.8
#take about 80% of the data as the training set
traindata=sdata[mask]
#take the remaining rows as the test set for model evaluation
#this is also called out-of-sample testing
testdata=sdata[~mask]
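As an alternative to the random mask, the split can be done with scikit-learn's train_test_split, as the text mentions. This is a small sketch on a made-up ten-row data frame; the name demo, its values, and the random_state of 42 are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the selected data
demo = pd.DataFrame({
    "ENGINESIZE": [1.5, 2.0, 2.4, 3.0, 3.5, 3.7, 4.0, 4.6, 5.0, 5.7],
    "CO2EMISSIONS": [136, 196, 221, 230, 244, 255, 267, 300, 320, 359],
})

# test_size=0.2 holds out 20% of the rows; random_state makes the split repeatable
train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)

print("Train rows:", len(train_demo))
print("Test rows:", len(test_demo))
```

Unlike the mask, which yields approximately 80% training rows and changes on every run, this gives an exact 80/20 split that is reproducible for a fixed random_state.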

Train Model and Predict Value For Test Data

In this step, train the model on the training data and print its slope and intercept. The slope shows how much the dependent variable changes when the independent variable increases by one unit. To train the model, create an object of the LinearRegression class from the linear_model module, then call its fit method with the training data.


#create an object of the LinearRegression class from sklearn's linear_model module
model=linear_model.LinearRegression()
#train the model on the training data
model.fit(traindata[['ENGINESIZE']],traindata[['CO2EMISSIONS']])
print("Slope : ",model.coef_)
print("Intercept :",model.intercept_)
#predict CO2 emissions for the test set (used later for model evaluation)
predict=model.predict(testdata[['ENGINESIZE']])
Slope :  [[38.55894843]]
Intercept : [126.99107297]
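To see what the slope and intercept mean in practice, the short sketch below applies the fitted line by hand, using the two numbers printed above. The engine size of 3.2 is just an example input.

```python
# Values copied from the output printed above
slope = 38.55894843
intercept = 126.99107297

# Prediction by hand: intercept + slope * engine size
engine_size = 3.2
predicted_co2 = slope * engine_size + intercept
print("Predicted Co2Emission:", round(predicted_co2, 2))

# Each extra unit of engine size adds one slope's worth of CO2 emissions
one_unit_more = slope * (engine_size + 1) + intercept
print("Increase for one more unit:", round(one_unit_more - predicted_co2, 2))
```

This is the same calculation that model.predict performs for a single input.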

Plot Regression Line

The regression line is used to predict new values. In the graph below, the blue dots are the actual data values and the red line shows the predicted values.


#plot the output
#scatter plot of the actual training data
plt.scatter(traindata.ENGINESIZE,traindata.CO2EMISSIONS,color='blue')
#regression line: the model's predictions for the same engine sizes
plt.plot(traindata.ENGINESIZE,model.coef_[0][0]*traindata.ENGINESIZE+model.intercept_[0],'r-')
plt.xlabel("Engine Size")
plt.ylabel("Co2 Emissions")

Model Evaluation And Prediction

To evaluate the model, use the mean absolute error, which is the average absolute difference between the actual and predicted CO2 emissions. The mean squared error is computed the same way, but averages the squared differences. The R2 score is at most 1 and can be negative for a poor model; the closer it is to 1, the better the model fits the data. To predict CO2 emissions for a new engine size, use the predict method of the LinearRegression class.


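print("Mean absolute error ",np.mean(np.absolute(predict.ravel() - testdata.CO2EMISSIONS)))
print("Mean squared error ",np.mean((predict.ravel() - testdata.CO2EMISSIONS)**2))
#r2_score expects the actual values first and the predictions second
#the R2 value shown below came from a call with the arguments swapped, so a rerun prints a different number
print("R2 score ",r2_score(testdata.CO2EMISSIONS,predict.ravel()))
#pass a DataFrame with the training column name to predict for a new engine size
print("Predicted value of Co2Emission : ",model.predict(pd.DataFrame({'ENGINESIZE':[3.2]}))[0][0])
Mean absolute error  22.22677162974374
Mean squared error  892.5298502928439
R2 score  0.6681287827925666
Predicted value of Co2Emission :  250.37970793531605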
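The three metrics can also be computed by hand, which makes their definitions explicit. The sketch below uses four made-up actual and predicted values, not the article's test set, and follows the standard definition R2 = 1 - SS_res / SS_tot.

```python
import numpy as np

# Made-up actual and predicted CO2 values for illustration
actual = np.array([200.0, 250.0, 300.0, 350.0])
predicted = np.array([210.0, 240.0, 310.0, 340.0])

errors = predicted - actual

# Mean absolute error: average size of the errors
mae = np.mean(np.abs(errors))

# Mean squared error: average of the squared errors
mse = np.mean(errors ** 2)

# R2: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("Mean absolute error", mae)
print("Mean squared error", mse)
print("R2 score", r2)
```

Note that R2 is not symmetric in its two inputs: the total sum of squares is taken around the mean of the actual values, which is why the order of arguments to r2_score matters.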
Zala Digvijaysinh

MCA student at Dharamsinh Desai University

