Simple linear regression can be used to model a linear relationship between one response variable (the output variable) and one feature representing an explanatory variable (the input variable).
For instance, suppose you want to know the price of a pizza. You might simply look at a menu. Instead, we will use simple linear regression to predict the price of a pizza from an attribute of the pizza that we can observe, that is, an explanatory variable. Let's model the relationship between the size of a pizza and its price. First, we will write a program with scikit-learn that can predict the price of a pizza given its size.
Let's assume you have recorded the diameters and prices of pizzas that you have previously eaten in your pizza journal. These observations comprise our training data:
Each instance is a pizza.

| Training instance | Diameter in inches | Price in dollars |
|---|---|---|
| 1 | 6 | 7 |
| 2 | 8 | 9 |
| 3 | 10 | 13 |
| 4 | 14 | 17.5 |
| 5 | 18 | 18 |
import numpy as np
import matplotlib.pyplot as plt
# X represents the features of our training data, the diameters of the pizzas
# A scikit-learn convention is to name the matrix of feature vectors X.
# Uppercase letters indicate matrices, and lowercase letters indicate vectors.
X = np.array([[6], [8], [10], [14], [18]])  # already shaped (5, 1); no reshape needed
y = [7, 9, 13, 17.5, 18]  # y is a vector representing the prices of the pizzas.
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in Dollars')
# Draw markers only; passing both a format string and marker/color keywords is redundant.
plt.plot(X, y, 'o', markerfacecolor='red', markeredgecolor='black', markersize=20)
plt.axis([0,25,0,25])
plt.grid(True)
plt.show()
from sklearn.linear_model import LinearRegression
model = LinearRegression()  # Create an instance of the estimator
model.fit(X, y)  # Fit the model on the training data
# Predict the price of a pizza with a diameter that has never been seen before
test_pizza = np.array([[12]])
predicted_price = model.predict(test_pizza)[0]
print('A 12" pizza should cost: $%.2f' % predicted_price)
A 12" pizza should cost: $13.68

Evaluating the Model
| Test instance | Diameter in inches | Observed price in dollars | Predicted price in dollars |
|---|---|---|---|
| 1 | 8 | 11 | 9.7759 |
| 2 | 9 | 8.5 | 10.7522 |
| 3 | 11 | 15 | 12.7048 |
| 4 | 16 | 18 | 17.5863 |
| 5 | 12 | 11 | 13.6811 |
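The predicted prices in the table come from the fitted line. As a minimal sketch, the snippet below reads the learned slope and intercept from the model and applies them to the test diameters by hand, then checks the result against `predict`. The variable names (`X_train`, `slope`, `manual_predictions`, and so on) are chosen here for illustration and are not part of the original program.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data from the pizza journal and the test diameters from the table
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = np.array([7, 9, 13, 17.5, 18])
X_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)

model = LinearRegression()
model.fit(X_train, y_train)

# For a single feature, the fitted model is: price = intercept + slope * diameter
slope = model.coef_[0]
intercept = model.intercept_

# Apply the line by hand and compare with the estimator's own predictions
manual_predictions = intercept + slope * X_test.ravel()
library_predictions = model.predict(X_test)

for diameter, price in zip(X_test.ravel(), manual_predictions):
    print(f'{diameter} in: predicted ${price:.2f}')
```

Both arrays should agree up to floating-point error, since `predict` evaluates the same line.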
The score method of LinearRegression returns the model's R-squared value on the data passed to it, here the test set:
import numpy as np
from sklearn.linear_model import LinearRegression
x_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = [7, 9, 13, 17.5, 18]
x_test = np.array([8,9,11,16,12]).reshape(-1, 1)
y_test = [11, 8.5, 15, 18, 11]
model = LinearRegression()
model.fit(x_train, y_train)
r_squared = model.score(x_test, y_test)
print(r_squared)
0.6620052929422553
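To see where this number comes from, R-squared can also be computed directly as one minus the ratio of the residual sum of squares to the total sum of squares on the test set. The following is a sketch under that definition; the names `ss_res`, `ss_tot`, and `r_squared_manual` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y_train = np.array([7, 9, 13, 17.5, 18])
x_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)
y_test = np.array([11, 8.5, 15, 18, 11])

model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Residual sum of squares: squared error of the model's predictions
ss_res = np.sum((y_test - y_pred) ** 2)
# Total sum of squares: squared deviation of observed prices from their mean
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)

r_squared_manual = 1 - ss_res / ss_tot
print(r_squared_manual)
print(model.score(x_test, y_test))  # should match the manual value
```

A value of about 0.66 means the model accounts for roughly two thirds of the variance in the test-set prices.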
