
My dataset X has 45 columns, collected from a bioclimate database. This is the code I used for the multicollinearity test:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif_data = pd.DataFrame()
vif_data["feature"] = X.columns

x columns

# Calculate the VIF for each feature
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(len(X.columns))]
-------------------------------------------------------------
/usr/local/lib/python3.7/dist-packages/statsmodels/stats/outliers_influence.py:193: 
RuntimeWarning: divide by zero encountered in double_scalars
vif = 1. / (1. - r_squared_i)

vif_data

What I've tried:

  • I've converted all variables to float and to int, but I still get infinite VIF values for every variable after running the multicollinearity test.

  • I couldn't find any reference material on how to tackle this problem, especially in Python. Please help me out; I am using this for species distribution modelling.

perth
  • VIF is inf if variables are perfectly collinear, that is, if the r-squared of the regression of one explanatory variable on all other explanatory variables is 1. Do you have fewer observations than columns? – Josef Sep 12 '22 at 12:21
  • Also, note that the statsmodels variance inflation factor assumes a constant column or demeaned variables, as in the design matrix of a regression model – Josef Sep 12 '22 at 12:22
  • Yes, there are 48 columns and 28 rows, and all of them are based on presence locations. – perth Sep 14 '22 at 08:52
