Practical Implementation Of Multiple Linear Regression In Python

Multiple linear regression is used to predict continuous values. It estimates the value of a dependent variable from two or more independent variables. In this example, we predict CO2 emissions from engine size and combined fuel consumption.

To Implement Multiple Linear Regression, Install the Required Packages

pip install matplotlib
pip install pandas
pip install numpy
pip install scikit-learn

Data Set

Download Data Set

Import the Required Packages


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn import linear_model
#Jupyter/IPython magic that renders plots inline; omit it when running as a plain Python script
%matplotlib inline

Read Data From CSV File

To read the data set from a CSV file, use the pandas function read_csv, which reads the file and returns a DataFrame object. After reading the data, display it with head(), which shows the first five rows of the data set.

#read the data from the CSV file into a DataFrame
data=pd.read_csv("co2.csv")
#show the first five rows
data.head()
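
As a minimal, self-contained sketch of what read_csv and head() return, the snippet below parses a few rows from an in-memory string instead of co2.csv. The column names match the ones used later in this article, but the numeric values are made up for illustration and are not taken from the real data set.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real values live in co2.csv.
csv_text = """ENGINESIZE,FUELCONSUMPTION_COMB,CO2EMISSIONS
2.0,8.5,196
2.4,9.6,221
1.5,5.9,136
3.5,11.1,255
3.5,10.6,244
3.7,11.8,271
"""

# read_csv accepts a file path or any file-like object and returns a DataFrame.
sample = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; head(n) returns the first n.
first_rows = sample.head()

print(type(sample).__name__)
print(first_rows)
print(sample.dtypes)
```

Checking dtypes right after loading is a quick way to confirm that the numeric columns were parsed as numbers and not as text.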

Data Preprocessing

In data preprocessing, select the columns of the data set that will be used to train the model. After selecting the columns, split the data into training and test sets using a mask. The mask is a Boolean array: True marks a row as training data and False marks it as test data. Because the mask is drawn at random, the split is approximately, not exactly, 80% / 20%.


#select the columns of the data set that are used in the rest of the study
sdata=data[['ENGINESIZE','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
sdata.head()
#the mask divides the data into roughly 80% training and 20% test rows
mask=np.random.rand(len(sdata)) < 0.8
#take the rows where the mask is True (about 80%) as the training set
traindata=sdata[mask]
#take the remaining rows (about 20%) as the test set used for model evaluation
#this is also called out-of-sample testing
testdata=sdata[~mask]
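
The mask-based split can be tried in isolation. The sketch below uses a hypothetical row count and a seeded NumPy generator (an addition, not part of the code above) so that the same split is produced on every run.

```python
import numpy as np

n_rows = 1000  # hypothetical data set size
rng = np.random.default_rng(seed=0)  # fixed seed so the split is repeatable

# One uniform draw in [0, 1) per row; True (about 80% of the time) marks a training row.
mask = rng.random(n_rows) < 0.8

rows = np.arange(n_rows)
train_rows = rows[mask]   # rows where the mask is True
test_rows = rows[~mask]   # ~ inverts the mask, selecting the remaining rows

print(len(train_rows), len(test_rows))
print(len(train_rows) / n_rows)
```

Every row lands in exactly one of the two sets, and the training share is close to 0.8 without being guaranteed to hit it exactly.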

Plot Graph For Selected Data

In this step, plot the selected data as a scatter plot to see the relationship between the features and the target.


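fig = plt.figure()
#create a 3D axes for the scatter plot
fig3d = fig.add_subplot(111, projection='3d')
fig3d.scatter(traindata.ENGINESIZE, traindata.FUELCONSUMPTION_COMB, traindata.CO2EMISSIONS, s=5)
fig3d.set_xlabel('ENGINESIZE')
fig3d.set_ylabel('FUELCONSUMPTION_COMB')
fig3d.set_zlabel('CO2EMISSIONS')
plt.show()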

Train Model And Predict Value For Test Data

In this step, train the model on the training data and print its coefficients (slopes) and intercept. Each coefficient shows how much the dependent variable changes when the corresponding independent variable increases by one unit while the other is held constant.


#create an object of the LinearRegression class from sklearn's linear_model module
model=linear_model.LinearRegression()
#train (fit) the model on the training data
model.fit(traindata[['ENGINESIZE','FUELCONSUMPTION_COMB']],traindata[['CO2EMISSIONS']])
#predict CO2 emissions for the 20% test set (used for evaluation in the next step)
predict=model.predict(testdata[['ENGINESIZE','FUELCONSUMPTION_COMB']])
print("Slope : ",model.coef_)
print("Intercept :",model.intercept_)

Output:

Slope :  [[19.8350992   9.59636064]]
Intercept : [78.53531245]
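
To make the meaning of the coefficients concrete, here is a minimal sketch that fits the same kind of model by ordinary least squares with NumPy, the method LinearRegression uses. The data is synthetic and noise-free, generated from made-up parameters (intercept 80, coefficients 20 and 10), so the fit should recover those values; the predict helper is hypothetical and only illustrates the formula.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic, noise-free data generated from known (made-up) parameters.
true_intercept = 80.0
true_coefs = np.array([20.0, 10.0])
X = rng.uniform(1.0, 8.0, size=(200, 2))
y = true_intercept + X @ true_coefs

# Prepend a column of ones so the intercept is estimated with the coefficients.
design = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, coefs = solution[0], solution[1:]


def predict(features):
    """Prediction = intercept + sum of (coefficient * feature value)."""
    return intercept + np.dot(coefs, features)


# Raising the first feature by one unit, with the second held fixed,
# changes the prediction by the first coefficient.
change = predict([3.0, 9.0]) - predict([2.0, 9.0])

print(intercept, coefs)
print(change)
```

The printed change equals the first coefficient (up to floating-point rounding), which is exactly the "one unit" interpretation described above.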

Model Evaluation And Prediction

In this step, evaluate the model's accuracy on the test data using the mean squared error and the R2 score, then predict the CO2 emission for a new data point.
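
For reference, both metrics can be computed by hand. The sketch below uses plain Python and four made-up value pairs; the function names are hypothetical and are not part of sklearn.

```python
def mse_by_hand(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_by_hand(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

mse_value = mse_by_hand(actual, predicted)
r2_value = r2_by_hand(actual, predicted)
print("MSE:", mse_value)
print("R2:", r2_value)
```

Here the squared errors are 1, 0, 1 and 0, so the mean squared error is 2 / 4 = 0.5. The actual values have mean 6 and a total sum of squares of 20, so R2 = 1 - 2 / 20 = 0.9. An R2 of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean.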


#mean squared error on the test set
print("Mean Squared error ",np.mean((predict.ravel() - testdata.CO2EMISSIONS)**2))
#score() returns the R2 (coefficient of determination), so it matches r2_score below
print("variance score ",model.score(testdata[['ENGINESIZE','FUELCONSUMPTION_COMB']],testdata[['CO2EMISSIONS']]))
print('R2 score',r2_score(testdata.CO2EMISSIONS,predict.ravel()))
#now predict the CO2 emission for a new data point (ENGINESIZE=2.6, FUELCONSUMPTION_COMB=7.9)
newdata=pd.DataFrame([[2.6,7.9]],columns=['ENGINESIZE','FUELCONSUMPTION_COMB'])
print("predicted value of co2emission : ",model.predict(newdata)[0][0])

Output:

Mean Squared error  605.1750778061678
variance score  0.8551877439412364
R2 score 0.8551877439412364
predicted value of co2emission :  205.91781939935103
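
As a quick check, the prediction for the new data point can be reproduced by hand from the intercept and coefficients printed earlier. The variable names below are chosen for illustration; the numbers are copied from the output in this article.

```python
# Values printed by the fitted model in this article
# (they will differ on another run because the split is random).
intercept = 78.53531245
coef_engine_size = 19.8350992
coef_fuel_consumption = 9.59636064

# New data point: ENGINESIZE = 2.6, FUELCONSUMPTION_COMB = 7.9
engine_size = 2.6
fuel_consumption = 7.9

# Prediction = intercept + coefficient * feature, summed over both features.
predicted_co2 = (intercept
                 + coef_engine_size * engine_size
                 + coef_fuel_consumption * fuel_consumption)

print(round(predicted_co2, 4))
```

The result agrees with the value returned by model.predict, up to the rounding of the displayed coefficients.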

Zala Digvijaysinh

MCA student at Dharamsinh Desai University

