I am working through my first non-linear regression in Python, and there are a couple of things I am obviously not getting quite right.
Here is the sample data:
X      y
8.6    87
6.2    61
6.4    75
4      72
8.4    85
7.4    73
8.2    83
5      63
2      21
4      70
8.6    87
6.2    70
6.4    64
4      64
8.4    85
7.4    73
8.2    83
5      61
2      21
4      50
Here is my code:
#import libraries
import pandas as pd
from sklearn import linear_model
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
#variables
r = 100
#import dataframe
df = pd.read_csv('Book1.csv')
#Assign X & y
X = df.iloc[:, 4:5]
y = df.iloc[:, 2]
#import PolynomialFeatures and create X_poly
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X)
#fit regressor
reg = linear_model.LinearRegression()
reg.fit(X_poly, y)
#get R2 score
score = round(reg.score(X_poly, y), 4)
#get coefficients
coef = reg.coef_
intercept = reg.intercept_
#plot
pred = reg.predict(X_poly)
plt.scatter(X, y, color='blue', s=1)
plt.plot(X, pred, color='red')
plt.show()
When I run this code, I get a chart that looks like this:
The first thing I noticed is that the X values are on the vertical axis rather than the horizontal axis, where I expected (and usually see) them.
The next thing I noticed is that there are several red lines, when I was expecting a single curve representing the quadratic (second-degree polynomial) fitted to the data.
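One thing that may be relevant to the multiple red lines: `plt.plot` joins points in the order they are passed, so unsorted or repeated X values make the line double back on itself. Below is a minimal sketch of plotting the fitted values in ascending X order. It is an illustration under assumptions, not the original pipeline: the sample data is typed in by hand, `np.polyfit` stands in for the `PolynomialFeatures` + `LinearRegression` fit for brevity, and the output file name is made up.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt

# Sample data from the question, typed in by hand
x = np.array([8.6, 6.2, 6.4, 4, 8.4, 7.4, 8.2, 5, 2, 4,
              8.6, 6.2, 6.4, 4, 8.4, 7.4, 8.2, 5, 2, 4])
y = np.array([87, 61, 75, 72, 85, 73, 83, 63, 21, 70,
              87, 70, 64, 64, 85, 73, 83, 61, 21, 50])

# Degree-2 least-squares fit (stand-in for the sklearn fit)
coeffs = np.polyfit(x, y, 2)

# Sort by x so the line is drawn left to right as one curve
order = np.argsort(x)
x_sorted = x[order]
pred_sorted = np.polyval(coeffs, x_sorted)

fig, ax = plt.subplots()
ax.scatter(x, y, color="blue", s=10)
ax.plot(x_sorted, pred_sorted, color="red")
ax.set_xlabel("X")
ax.set_ylabel("y")
fig.savefig("quadratic_fit.png")  # hypothetical file name
```

The same idea should carry over to the sklearn version by sorting the rows by X before calling `predict` and `plot`.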
Finally, the coefficients are not what I expect. As a check, I ran a regression on the same data in Excel and confirmed the result by substituting numbers for X.
The Excel fit is y = -1.0305x^2 + 19.156x - 5.9868, with an R-squared value of 0.8221.
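For anyone who wants to reproduce the comparison without `Book1.csv`, here is a minimal sketch that fits a quadratic to the sample data typed in directly. The array names and `include_bias=False` are choices made for this illustration and are not part of the original code; no output is quoted here, so the printed numbers should be compared against the Excel values by running it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Sample data from the question, typed in by hand
x = np.array([8.6, 6.2, 6.4, 4, 8.4, 7.4, 8.2, 5, 2, 4,
              8.6, 6.2, 6.4, 4, 8.4, 7.4, 8.2, 5, 2, 4])
y = np.array([87, 61, 75, 72, 85, 73, 83, 63, 21, 70,
              87, 70, 64, 64, 85, 73, 83, 61, 21, 50])

# sklearn expects a 2-D feature matrix: one row per sample
X = x.reshape(-1, 1)

# include_bias=False leaves the constant term to LinearRegression's intercept,
# so the feature columns are [x, x^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

reg = LinearRegression().fit(X_poly, y)
b, a = reg.coef_        # b multiplies x, a multiplies x^2
c = reg.intercept_
r2 = reg.score(X_poly, y)

print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}, R^2 = {r2:.4f}")
```

Selecting X and y explicitly like this also sidesteps any doubt about which columns `df.iloc[:, 4:5]` and `df.iloc[:, 2]` pick out of the CSV.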
In Python, my model gives a coef_ of [0, -0.0383131, 0.00126994], an intercept of 2.4339, and an R-squared score of 0.8352.
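A substitution check on both sets of coefficients can be sketched as follows. It assumes the `coef_` entries follow the `PolynomialFeatures(2)` column order [1, v, v^2]; the function names are made up for this illustration.

```python
def excel_fit(x):
    """Quadratic reported by Excel: maps X to y."""
    return -1.0305 * x**2 + 19.156 * x - 5.9868


def python_fit(v):
    """Quadratic built from the sklearn coef_ and intercept_ reported above."""
    return 0.00126994 * v**2 - 0.0383131 * v + 2.4339


# Feed the Excel polynomial values from the X column
print(excel_fit(8.6), excel_fit(2))

# Feed the sklearn polynomial values from the y column
print(python_fit(87), python_fit(21))
```

Feeding the sklearn polynomial the y values 87 and 21 returns roughly 8.7 and 2.2, which are close to the X values on those rows (8.6 and 2). That would be consistent with the two `iloc` selections picking up the columns in swapped roles, although this cannot be confirmed without seeing the layout of `Book1.csv`.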
While learning this, I have mostly adapted bits of code I have seen and watched YouTube videos. I have also searched Stack Exchange but could not find answers to these questions, so I am asking for help, even though the answers are probably obvious to someone who knows what they are doing.
I would really appreciate someone taking the time to explain some of the basics that I am obviously missing.
Thanks