    statsmodels.stats.outliers_influence.variance_inflation_factor(exog, exog_idx)

Parameters: 
 exog (ndarray) – design matrix with all explanatory variables, as for example used in regression
 exog_idx (int) – index of the exogenous variable in the columns of exog

I am having difficulty understanding the parameters. For example, I have a dataset with 20 variables and one class variable (21 variables in total):

Var1 var2 Var3 Var4 class variable

So, should exog contain all of these variables including the class variable, or all of them excluding the class variable?

And what should exog_idx be?

Rashida Hasan
  • Possible duplicate of [Variance Inflation Factor in Python](https://stackoverflow.com/questions/42658379/variance-inflation-factor-in-python) – Szymon Maszke Apr 22 '19 at 22:54

1 Answer


I also find the statsmodels documentation unhelpful at times: it often gives no examples, which makes it hard to understand.

I was looking for answers and examples too. There are some suggested solutions elsewhere, but let me try to explain.

exog -> the independent variables (features) you are using to predict your target, so the class variable is not part of it

exog_idx -> the column index, within exog, of the variable whose VIF you want to compute

The way to do this is with a list comprehension. Assuming you have a pandas DataFrame (df) that holds only the features:

vif = pd.DataFrame([variance_inflation_factor(df.values, i) for i in range(df.shape[1])], index=df.columns, columns=['VIF_value'])

This will create a DataFrame called vif, with one VIF value for each feature you have.

Tolga