Simple linear regression predicts a continuous value: it estimates the value of a dependent variable from a single independent variable. In this example, we predict CO2 emissions (`CO2EMISSIONS`) from engine size (`ENGINESIZE`).

### Applications of Simple Linear Regression

- Predict Maximum Temperature based on the Minimum Temperature
- Predict Sales based on previous year sales data

### Install the Required Packages

Install the packages with pip. The scikit-learn package provides the `sklearn` module used below:

pip install matplotlib pandas numpy scikit-learn

### Data Set

### Import Required Packages

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn import linear_model
%matplotlib inline
```

### Read Data From CSV File

To read the data from the CSV file we use the pandas module. Its `read_csv` function reads a CSV file and returns a `DataFrame` object.

```
#read data from the CSV file
data=pd.read_csv("co2.csv")
#show the first rows of the data
data.head()
```

### Data Pre-processing

In data pre-processing, select the columns needed to train the model. Pre-processing plays a major role in model accuracy: removing noisy data and outliers at this stage helps to increase the accuracy of the model.

```
#select the columns from the dataset that are used in the rest of the study
sdata=data[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
sdata.head()
```
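The code above only selects columns. As a rough illustration of the noise and outlier removal mentioned earlier, the sketch below drops rows with missing values and then filters one column with the interquartile-range (IQR) rule. The IQR rule, the helper name `drop_missing_and_outliers`, the factor `k=1.5`, and the synthetic `demo` frame are all assumptions made for this example; they are not taken from the original data set or workflow.

```python
import numpy as np
import pandas as pd

def drop_missing_and_outliers(df, column, k=1.5):
    """Drop rows with missing values, then drop rows whose `column`
    value lies outside [Q1 - k*IQR, Q3 + k*IQR]. Hypothetical helper."""
    clean = df.dropna()
    q1 = clean[column].quantile(0.25)
    q3 = clean[column].quantile(0.75)
    iqr = q3 - q1
    keep = clean[column].between(q1 - k * iqr, q3 + k * iqr)
    return clean[keep]

# Synthetic stand-in for the real data (values are made up for illustration)
demo = pd.DataFrame({
    "ENGINESIZE": [1.6, 2.0, 2.4, 3.0, 3.5, np.nan, 2.0],
    "CO2EMISSIONS": [180, 200, 220, 250, 270, 230, 2000],
})

cleaned = drop_missing_and_outliers(demo, "CO2EMISSIONS")
print(cleaned)
```

Whether a value is a genuine outlier or a valid extreme depends on the data, so it is worth inspecting the dropped rows before discarding them.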

### Plot Graph For Selected Data

Plot the selected data in a scatter plot and observe the relationship between engine size and CO2 emissions. If the points follow a roughly straight line, the relationship is linear and simple linear regression is an appropriate choice. If the relationship is not linear, use non-linear regression for that data set instead.
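A minimal sketch of such a scatter plot is shown here. The helper name `plot_engine_vs_co2` and the synthetic `demo` frame are assumptions for illustration; with the real data you would pass `sdata` instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_engine_vs_co2(frame):
    """Scatter plot of engine size (x) against CO2 emissions (y)."""
    fig, ax = plt.subplots()
    ax.scatter(frame["ENGINESIZE"], frame["CO2EMISSIONS"], color="blue")
    ax.set_xlabel("Engine Size")
    ax.set_ylabel("Co2 Emissions")
    return ax

# Synthetic stand-in for sdata (values are made up for illustration)
demo = pd.DataFrame({
    "ENGINESIZE": [1.6, 2.0, 2.4, 3.0, 3.5],
    "CO2EMISSIONS": [180, 200, 220, 250, 270],
})

ax = plot_engine_vs_co2(demo)
```

In a notebook with `%matplotlib inline` the figure appears below the cell; in a plain script, call `plt.show()` afterwards.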

### Train And Test Data Split

Split the data into a training set and a test set.

```
#random boolean mask that splits the data into roughly 80% train and 20% test
mask=np.random.rand(len(sdata)) < 0.8
#take about 80% of the data as the training set
traindata=sdata[mask]
#take the remaining 20% as the test set, used for model evaluation.
#This is also called out-of-sample testing.
testdata=sdata[~mask]
```
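The random mask gives an approximate 80/20 split that changes on every run, because no random seed is set. If a reproducible split with exact proportions is preferred, one alternative (not used in the rest of this walkthrough) is scikit-learn's `train_test_split`. The sketch below runs on a synthetic `demo` frame, and the seed value `42` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for sdata: 50 rows of made-up values
demo = pd.DataFrame({
    "ENGINESIZE": np.linspace(1.0, 6.0, 50),
    "CO2EMISSIONS": np.linspace(150.0, 400.0, 50),
})

# test_size=0.2 keeps 20% of the rows for testing;
# random_state fixes the shuffle so the split is the same on every run
train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)

print(len(train_demo), len(test_demo))  # 40 10
```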

### Train Model and Predict Value For Test Data

In this step, train the model on the training data and print its slope and intercept. The slope shows how much the dependent variable changes when the independent variable increases by one unit. To train the model, create an object of the `LinearRegression` class from the `linear_model` module, then call its `fit` method with the training data.

```
#create an object of the LinearRegression class from sklearn's linear_model module
model=linear_model.LinearRegression()
#fit (train) the model on the training data
model.fit(traindata[['ENGINESIZE']],traindata[['CO2EMISSIONS']])
print("Slope : ",model.coef_)
print("Intercept :",model.intercept_)
#predict CO2 emissions for the 20% test set, used later for model evaluation
predict=model.predict(testdata[['ENGINESIZE']])
```

Sample output (the split is random, so the exact values differ between runs):

- Slope : `[[38.55894843]]`
- Intercept : `[126.99107297]`
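To make the slope and intercept less of a black box, the sketch below computes them directly with the ordinary least squares formulas for a single feature: the slope is the sum of products of the x and y deviations from their means, divided by the sum of squared x deviations, and the intercept is `y_mean - slope * x_mean`. The function name `fit_line` and the toy data are assumptions for illustration; on the same training data this should agree with `LinearRegression` up to floating-point error.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data that lies exactly on the line y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(x, y)
print(slope, intercept)  # 2.0 1.0
```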

### Plot Regression Line

The regression line shows the values the model predicts, and it can be used to predict the value for a new input.

```
#plot the output in a graph
#scatter plot of the actual training data
plt.scatter(traindata.ENGINESIZE,traindata.CO2EMISSIONS,color='blue')
#regression line: values predicted by the trained model for the same data
plt.plot(traindata.ENGINESIZE,model.coef_[0][0]*traindata.ENGINESIZE+model.intercept_[0],'r-')
plt.xlabel("Engine Size")
#the trailing semicolon hides the text output of the last line in a notebook
plt.ylabel("Co2 Emissions");
```

### Model Evaluation And Prediction

To evaluate the model, use the mean absolute error: the average of the absolute differences between the actual and predicted CO2 emissions. The mean squared error is computed in the same way, but averages the squared differences between actual and predicted values. The R2 score is at most 1, and the closer it is to 1 the better the model fits; it can be negative when the model does worse than always predicting the mean. To predict CO2 emissions for a new engine size, use the `predict` method of the `LinearRegression` class.

```
print("Mean absolute error ",np.mean(np.absolute(predict.ravel() - testdata.CO2EMISSIONS)))
print("Mean Squared error ",np.mean((predict.ravel() - testdata.CO2EMISSIONS)**2))
#r2_score expects the actual values first, then the predicted values
print("R2 score ",r2_score(testdata.CO2EMISSIONS,predict.ravel()))
#predict the CO2 emission for an engine size of 3.2
print("Predicted value of Co2Emission : ",model.predict([[3.2]])[0][0])
```

Sample output from one run (the split is random, so the exact values differ between runs):

- Mean absolute error 22.22677162974374
- Mean Squared error 892.5298502928439
- R2 score 0.6681287827925666
- Predicted value of Co2Emission : 250.37970793531605
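The same metrics are also available as ready-made functions in `sklearn.metrics`. The sketch below uses small made-up arrays rather than the real test set. All three functions take the actual values first and the predicted values second; for `r2_score` the order changes the result, as the last line illustrates.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only
actual = np.array([200.0, 220.0, 250.0, 270.0])
predicted = np.array([210.0, 215.0, 240.0, 275.0])

mae = mean_absolute_error(actual, predicted)  # mean of |actual - predicted|
mse = mean_squared_error(actual, predicted)   # mean of (actual - predicted)^2
r2 = r2_score(actual, predicted)              # 1 - SS_res / SS_tot

print("Mean absolute error", mae)  # 7.5
print("Mean squared error", mse)   # 62.5
print("R2 score", r2)

# Swapping the arguments measures the spread around the mean of the
# predictions instead of the actual values, so the score changes
r2_swapped = r2_score(predicted, actual)
print("R2 score with swapped arguments", r2_swapped)
```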