I tried to plot linear regression with 95% confidence interval for the slope and found results are different when comparing both methods.
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table
x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
res = sm.OLS(y, x).fit()
st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T
plt.figure(1)
plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(x, fittedvalues, 'r-', label='OLS')
plt.plot(x, predict_ci_low, 'b--')
plt.plot(x, predict_ci_upp, 'b--')
plt.plot(x, predict_mean_ci_low, 'g--')
plt.plot(x, predict_mean_ci_upp, 'g--')
plt.legend()
plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()
By the way, what's the difference between predict_mean_ci_low and predict_ci_low? I can't find the explanation of it in the manual. The part of statsmodels is copied from this question.
Edit:
According to Josef, Warren Weckesser and this post, I need to add a constant to the OLS version.
By default, OLS in statsmodels does not include the constant term (i.e. the intercept) in the linear equation. (The constant term corresponds to a column of ones in the design matrix.)
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table
x = np.linspace(0,10)
y = 3*np.random.randn(50) + x
X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
st, data, ss2 = summary_table(res, alpha=0.05)
fittedvalues = data[:,2]
predict_mean_se = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T
plt.figure(1)
plt.subplot(211)
plt.plot(x, y, 'o', label="data")
plt.plot(X, fittedvalues, 'r-', label='OLS')
plt.plot(X, predict_ci_low, 'b--')
plt.plot(X, predict_ci_upp, 'b--')
plt.plot(X, predict_mean_ci_low, 'g--')
plt.plot(X, predict_mean_ci_upp, 'g--')
plt.legend()
plt.subplot(212)
sns.regplot(x=x, y=y,label="sns")
plt.legend()
plt.show()
However, the plot looks strange now. Some odd lines and extra legends show up.
Edit2:
Differences between prediction interval and confidence interval, please see this web.
Edit3:
The exog contains all variables I want to include in the model, including a constant (a column of ones). So, we need to use X[:,1] without the column of ones for plot.
ax.plot(X[:,1], fittedvalues, 'r-', label='OLS')
ax.plot(X[:,1], predict_ci_low, 'b--')
ax.plot(X[:,1], predict_ci_upp, 'b--')
ax.plot(X[:,1], predict_mean_ci_low, 'g--')
ax.plot(X[:,1], predict_mean_ci_upp, 'g--')