The code for calculating VIF in statsmodels is below:

    k_vars = exog.shape[1]
    x_i = exog[:, exog_idx]
    mask = np.arange(k_vars) != exog_idx
    x_noti = exog[:, mask]
    r_squared_i = OLS(x_i, x_noti).fit().rsquared   ## NO INTERCEPT
    vif = 1. / (1. - r_squared_i)

The auxiliary regression is fitted without an intercept. According to "Introductory Econometrics (6ed)" by Wooldridge, an intercept should be included: "... R-squared from regressing Xj on all other independent variables (and including an intercept)."

Is statsmodels wrong? Is there another package I can cross-check against? Thanks.

iwbabn
  • There is nothing "wrong". This assumes that `exog` is from a regression model that includes a constant, if the user wanted one. See several related issues on statsmodels github. IIRC, you can compare the results to SAS and Stata. – Josef Feb 10 '17 at 02:12
  • Yes thank you. Answer updated. – iwbabn Feb 12 '17 at 20:01
  • Related: [scikit-learn & statsmodels - which R-squared is correct?](https://stackoverflow.com/questions/54614157/scikit-learn-statsmodels-which-r-squared-is-correct) – desertnaut Sep 02 '20 at 14:44

2 Answers

When using statsmodels, always remember to add a constant (which is necessary in this case); quoting from the docs:

> An intercept is not included by default and should be added by the user. See `statsmodels.tools.add_constant`.

Reference from MATLAB: https://www.mathworks.com/help/econ/examples/time-series-regression-ii-collinearity-and-estimator-variance.html
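
A minimal sketch of what that looks like in practice, assuming the public helpers `variance_inflation_factor` and `add_constant` from statsmodels. The data is synthetic and the variable names (`x1`, `x2`, `X`, `Xc`) are made up for illustration; the regressors are given non-zero means on purpose so the effect of the missing constant is visible:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

# Synthetic, illustrative data: two mildly correlated regressors
# with non-zero means (hypothetical, not from the question).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(loc=10.0, scale=1.0, size=n)
x2 = 0.5 * x1 + rng.normal(loc=5.0, scale=1.0, size=n)
X = np.column_stack([x1, x2])

# No constant column: each auxiliary regression is through the origin.
vif_no_const = [variance_inflation_factor(X, i) for i in range(X.shape[1])]

# Constant column added first (add_constant prepends a column of ones
# by default), then the VIF is computed for the real regressors only,
# i.e. column 0 (the constant itself) is skipped.
Xc = add_constant(X)
vif_const = [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])]

print("without constant:", vif_no_const)
print("with constant:   ", vif_const)
```

With the constant included, the numbers should line up with the textbook definition quoted in the question; without it, the VIFs come out much larger for data like this.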

desertnaut
iwbabn

In the traditional matrix formulation of linear regression, the X matrix has a column vector of 1s in the first position. Without it, we get a regression through the origin, i.e. one without an intercept term. I came across this question while looking for VIFs in statsmodels.
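
To make the difference concrete, here is a rough NumPy-only sketch that computes the VIF by hand both ways. The helper `r_squared` and the synthetic data are my own illustration, not statsmodels code. It assumes the through-the-origin fit is scored with the uncentered R-squared (which, as far as I know, is what statsmodels reports when the model has no constant) and the intercept fit with the usual centered one:

```python
import numpy as np

def r_squared(y, X, centered):
    """R-squared of the least-squares fit of y on the columns of X.

    centered=True uses the total sum of squares around the mean of y;
    centered=False uses the raw sum of squares of y (uncentered).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    total = y - y.mean() if centered else y
    return 1.0 - (resid @ resid) / (total @ total)

# Two independent regressors with non-zero means (illustrative data).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(5.0, 1.0, n)
x2 = rng.normal(5.0, 1.0, n)

# Regression through the origin: x1 on x2 only.
r2_origin = r_squared(x1, x2[:, None], centered=False)

# Regression with an intercept: x1 on [1, x2].
X_with_ones = np.column_stack([np.ones(n), x2])
r2_intercept = r_squared(x1, X_with_ones, centered=True)

vif_origin = 1.0 / (1.0 - r2_origin)
vif_intercept = 1.0 / (1.0 - r2_intercept)
print(vif_origin, vif_intercept)
```

Even though `x1` and `x2` are independent here, the through-the-origin version reports a large VIF, because `x2` ends up standing in for the missing column of 1s. With the intercept included the VIF is close to 1, as you would expect for uncorrelated regressors.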

Taranta