I am trying to run a regression model with two different functions, OLS from statsmodels.api and LinearRegression from sklearn.linear_model, but the outputs seem to be quite different from each other.
Here is my code:
import numpy as np  # needed for np.log and np.power inside the patsy formula
import pandas as pd
import statsmodels.api as sm
import matplotlib
import scipy.stats as stats
import matplotlib.pyplot as plt
from patsy import dmatrices
from sklearn import linear_model

data = pd.read_excel("2001_SCF_Pivot.xlsx")
# Build the response and design matrices (patsy adds an Intercept column to x by default)
y, x = dmatrices("np.log(RETQLIQ) ~ W_P_ADJ+np.power(W_P_ADJ,2)+np.power(W_P_ADJ,3)+INCOME+np.power(INCOME,2)+WHITE+AGE+EDUC+FEMALE+SINGLE", data=data)

# Fit and predict with scikit-learn
sk_model = linear_model.LinearRegression()
sk_fit = sk_model.fit(x, y)
sklearn_prediction = sk_fit.predict(x)

# Fit and predict with statsmodels
model_fit = sm.OLS(y, x)
results = model_fit.fit()
sm_prediction = results.predict(x)
When I scatter the data and overlay both sets of predictions, I expect the two to lie on top of each other, since both functions fit the same least-squares model. Instead, the predictions look quite different, as you can see in the attached image. Why do I get different results, and what is the right way to do this? Thanks a lot in advance!
You can find the related graph here: https://i.stack.imgur.com/WKJqQ.jpg