3

I tried to plot linear regression with 95% confidence interval for the slope and found results are different when comparing both methods.

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
res = sm.OLS(y, x).fit()

st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se  = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

plt.figure(1)
plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(x, fittedvalues, 'r-', label='OLS')
plt.plot(x, predict_ci_low, 'b--')
plt.plot(x, predict_ci_upp, 'b--')
plt.plot(x, predict_mean_ci_low, 'g--')
plt.plot(x, predict_mean_ci_upp, 'g--')
plt.legend()

plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()

diff

By the way, what's the difference between predict_mean_ci_low and predict_ci_low? I can't find the explanation of it in the manual. The part of statsmodels is copied from this question.

Edit:

According to Josef, Warren Weckesser and this post, I need to add a constant to the OLS version.

By default, OLS in statsmodels does not include the constant term (i.e. the intercept) in the linear equation. (The constant term corresponds to a column of ones in the design matrix.)

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se  = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

plt.figure(1)

plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(X, fittedvalues, 'r-', label='OLS')
plt.plot(X, predict_ci_low, 'b--')
plt.plot(X, predict_ci_upp, 'b--')
plt.plot(X, predict_mean_ci_low, 'g--')
plt.plot(X, predict_mean_ci_upp, 'g--')
plt.legend()

plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()

However, the plot looks strange now. Some odd lines and extra legends show up. pic2

Edit2:

Differences between prediction interval and confidence interval, please see this web.

Edit3:

The exog contains all variables I want to include in the model, including a constant (a column of ones). So, we need to use X[:,1] without the column of ones for plot.

ax.plot(X[:,1], fittedvalues, 'r-', label='OLS')
ax.plot(X[:,1], predict_ci_low, 'b--')
ax.plot(X[:,1], predict_ci_upp, 'b--')
ax.plot(X[:,1], predict_mean_ci_low, 'g--')
ax.plot(X[:,1], predict_mean_ci_upp, 'g--')

exclude

zxdawn
  • 825
  • 1
  • 9
  • 19
  • 1
    you need to add a constant to the OLS version, your x does not contain a constant and the fitted line has to go through the origin. In the confidence interval there is no uncertainty about the constant. – Josef Aug 19 '18 at 12:22
  • @Josef Thanks! I add `X = sm.add_constant(x); res = sm.OLS(y, X).fit()` and it works same now. What's the meaning or effect of the constant? Why is it called 'constant'? – zxdawn Aug 19 '18 at 12:32
  • It's the intercept `a` in the regression line `y = a + b x`. It's called `constant` because it is the same value for all observations. In the formula interface it is called `intercept` by patsy. – Josef Aug 19 '18 at 13:22
  • @Josef But, `x` is `independent variables`. why should I set them to `constant` at the same time? – zxdawn Aug 19 '18 at 13:28
  • See https://stackoverflow.com/questions/51738734/r-in-stats-linregress-compared-to-r-squared-in-statsmodels for a related question. – Warren Weckesser Aug 19 '18 at 19:24
  • @WarrenWeckesser Thanks! I understand it now. Could tell me what's the difference between `predict_ci_low` and `predict_mean_ci_low`? – zxdawn Aug 20 '18 at 01:39
  • @Josef Now, I understand the meaning of constant. But, after adding the `constant`, I got some odd lines and extra legends. How to deal with it? – zxdawn Aug 20 '18 at 01:55

0 Answers0