I was doing a linear regression using statsmodels in Python, and when I plotted the result it appeared erroneous. I checked a different data set, using the code from this question.

But even when I use the following code (taken from the above linked question), the line of best fit is still not displayed correctly. I am unsure what the problem is.

Code:

import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

X = np.random.rand(100)
Y = X + np.random.rand(100)*0.1

results = sm.OLS(Y,sm.add_constant(X)).fit()

print(results.summary())

plt.scatter(X,Y)

X_plot = np.linspace(0,1,100)
plt.plot(X_plot, X_plot*results.params[0] + results.params[1])

plt.show()

My output:

[Image: Bad Linear Regression]

Why isn't the line of best fit correct?

d0rmLife

1 Answer

add_constant prepends the constant by default, which means that the constant is the first parameter and the slopes are the following parameters.

The predicted values are also available as fittedvalues or by calling predict without arguments.

For the explicit calculation, the indices into params need to be swapped so that params[1] is the slope and params[0] is the intercept, i.e.

predicted = X_plot*results.params[1] + results.params[0]

Josef