
I am new to R and learning ML with caret. I was working on the UCI bank marketing response data, but I use the iris data here for reproducibility.

The issue is that I get an error when running vif from the car package on classification models.

library(tidyverse)
library(caret)
library(car)

iris

# to make it binary classification
iris_train <- iris %>% filter(Species %in% c("setosa","versicolor"))
iris_train$Species <- factor(iris_train$Species)

Creating the model


model_iris3 <- train(Species ~ .,
                     data = iris_train,
                     method = "gbm",
                     verbose = FALSE
                     # tuneLength = 5,
                     # metric = "Spec",
                     # trControl = fitCtrl
                     )

Error when running vif:

# vif
car::vif(model_iris3)

Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('train', 'train.formula')"

I learned about passing finalModel to vif from this SO post: Variance inflation VIF for glm caret model in R

But I still get an error:

car::vif(model_iris3$finalModel)

Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "gbm"

I get the same error with adaboost, earth, etc.

I'd appreciate any help or suggestions for solving this issue.

(UPDATE)

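This finally worked (see the complete solution in the answers if you still get an error):

car::vif doesn't support these classification models (gbm, adaboost, earth, ...), so convert the dependent variable to numeric, fit a linear regression on it, and then run vif. For a linear regression the VIFs depend only on the predictors, so the numeric coding of the response does not change them.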


# verbose = FALSE removed: it is a gbm option that lm() does not use
model_iris4 <- train(as.numeric(Species) ~ .,
                     data = iris_train,
                     method = "lm")

car::vif(model_iris4$finalModel)

######## output ##########

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    4.803414     2.594389    36.246326    25.421395 
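Why the response does not matter here: for a linear model, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors, so the outcome never enters the calculation. Below is a minimal base-R sketch of that definition on the same two-species subset of iris. It rebuilds `iris_train` with `subset()` instead of dplyr, and `vif_manual` is just an illustrative name.

```r
# Same binary subset as in the question, built with base R only
iris_train <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# VIF_j = 1 / (1 - R_j^2): regress each predictor on the other predictors.
# Species (the response) is never used.
vif_manual <- sapply(predictors, function(p) {
  fit_j <- lm(reformulate(setdiff(predictors, p), response = p), data = iris_train)
  1 / (1 - summary(fit_j)$r.squared)
})

round(vif_manual, 6)
```

If the reasoning holds, these values should agree with the car::vif output for the lm model.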
ViSa

2 Answers


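This finally worked:

car::vif doesn't support these classification models (gbm, adaboost, earth, ...), so convert the dependent variable to numeric, fit a linear regression on it, and then run vif. For a linear regression the VIFs depend only on the predictors, so the numeric coding of the response does not change them.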

# verbose = FALSE removed: it is a gbm option that lm() does not use
model_iris4 <- train(as.numeric(Species) ~ .,
                     data = iris_train,
                     method = "lm")

car::vif(model_iris4$finalModel)

######## output ##########

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    4.803414     2.594389    36.246326    25.421395 
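A quick way to check that converting Species to a number is harmless: for lm() fits, the VIFs can be read off the coefficient covariance matrix (drop the intercept, convert to correlations, invert, take the diagonal), and that matrix depends on the response only through a scale factor that cancels out. The sketch below uses a hypothetical helper `vif_from_vcov()` and an arbitrary simulated response to compare two fits on the same predictors.

```r
iris_train <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical helper: VIFs from an lm fit's coefficient covariance matrix.
# Assumes the first coefficient is the intercept and each term has one coefficient.
vif_from_vcov <- function(fit) {
  v <- vcov(fit)[-1, -1, drop = FALSE]
  diag(solve(cov2cor(v)))
}

# Response 1: Species coded as 1/2, as in the answer
iris_train$species_num <- as.numeric(iris_train$Species)
# Response 2: arbitrary noise, unrelated to the predictors
set.seed(1)
iris_train$noise <- rnorm(nrow(iris_train))

fit_species <- lm(reformulate(predictors, response = "species_num"), data = iris_train)
fit_noise   <- lm(reformulate(predictors, response = "noise"), data = iris_train)

vif_from_vcov(fit_species)
vif_from_vcov(fit_noise)  # should match the line above: only the predictors matter
```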

If your model contains dummy variables, there is a good chance you will still get an error.

For example, after following the steps above I got a new error on my original UCI bank marketing dataset:

Error in vif.default(model_vif_check$finalModel) : there are aliased coefficients in the model
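The aliased-coefficients error usually comes from the dummy-variable trap: if a full set of dummies for one categorical variable is included alongside the intercept, one dummy is an exact linear combination of the others, lm() returns NA for its coefficient, and car::vif stops. A small reproduction on made-up data (the column names only mimic the bank dataset; the values are hypothetical):

```r
# Hypothetical toy data: every client is contacted through exactly one channel,
# so contact.cellular + contact.telephone == 1 in every row.
toy <- data.frame(
  y = rep(c(0, 1, 0, 0, 1, 1, 0, 0, 1, 0), 5),
  age = 20:69,
  contact.cellular = rep(c(1, 0, 1, 1, 0), 10)
)
toy$contact.telephone <- 1 - toy$contact.cellular

fit <- lm(y ~ age + contact.cellular + contact.telephone, data = toy)

# contact.telephone equals the intercept column minus contact.cellular,
# so lm() cannot estimate it separately and reports NA for it.
coef(fit)
names(coef(fit))[is.na(coef(fit))]
```

Dropping one dummy per categorical variable (its reference level) removes the dependency.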

To solve this error, try the following steps.

First, run alias() on a linear model in which the response variable is numeric:

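# Fit a plain lm() with a numeric response and ask which coefficients are aliased
alias_res <- alias(
  lm(as.numeric(y) ~ duration + nr.employed + euribor3m + pdays + emp.var.rate +
       poutcome.success + month.mar + cons.conf.idx + contact.telephone +
       contact.cellular + previous + age + cons.price.idx + month.jun + job.retired,
     data = train)
)

alias_res
# Names of the linearly dependent (aliased) predictors
ld.vars <- attributes(alias_res$Complete)$dimnames[[1]]
ld.vars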

This returns the aliased (linearly dependent) predictor that was causing the error. Remove that predictor from the model and fit the model again (in my case it was "contact.cellular"):

model_vif_check_aliased <- train(as.numeric(y) ~ duration + nr.employed + euribor3m + pdays +
                                   emp.var.rate + poutcome.success + month.mar + cons.conf.idx +
                                   contact.telephone + previous + age + cons.price.idx +
                                   month.jun + job.retired,
                                 data = train,
                                 method = "lm")
model_vif_check_aliased
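If several columns are aliased, the drop-and-refit step can be automated. This is a minimal base-R sketch, assuming the predictors are numeric 0/1 dummy columns (so each coefficient name equals its term name, as with contact.cellular). `drop_aliased()` is a hypothetical helper and the data are made up. Which dummy gets flagged depends on the order of the terms in the formula; alias() reports the later one.

```r
# Hypothetical helper: drop the terms that alias() reports and refit.
# Assumes numeric predictors, so coefficient names match term labels.
drop_aliased <- function(fit) {
  aliased <- rownames(alias(fit)$Complete)  # NULL when nothing is aliased
  if (length(aliased) == 0) return(fit)
  update(fit, as.formula(paste(". ~ . -", paste(aliased, collapse = " - "))))
}

# Hypothetical toy data with the dummy-variable trap
toy <- data.frame(
  y = rep(c(0, 1, 0, 0, 1, 1, 0, 0, 1, 0), 5),
  age = 20:69,
  contact.cellular = rep(c(1, 0, 1, 1, 0), 10)
)
toy$contact.telephone <- 1 - toy$contact.cellular

fit  <- lm(y ~ age + contact.cellular + contact.telephone, data = toy)
fit2 <- drop_aliased(fit)
coef(fit2)  # no NA coefficients left, so vif can run on this fit
```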

Now run vif:

vif_values <- car::vif(model_vif_check_aliased$finalModel)
vif_values

         duration       nr.employed         euribor3m             pdays
         1.016706         75.587546         80.930134         10.216410
     emp.var.rate  poutcome.success         month.mar     cons.conf.idx
        64.542469          9.190354          1.077018          3.972748
contact.telephone          previous               age    cons.price.idx
         2.091533          1.850089          1.185461         28.614339
        month.jun       job.retired
         3.936681          1.198350
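To pick out the problem predictors from that output, you can filter the named vector against a cutoff. Cutoffs of 5 or 10 are common rules of thumb rather than fixed rules; the sketch assumes 10 and simply re-enters the values printed above.

```r
# VIF values as printed above for the bank marketing model
vif_values <- c(
  duration = 1.016706, nr.employed = 75.587546, euribor3m = 80.930134,
  pdays = 10.216410, emp.var.rate = 64.542469, poutcome.success = 9.190354,
  month.mar = 1.077018, cons.conf.idx = 3.972748, contact.telephone = 2.091533,
  previous = 1.850089, age = 1.185461, cons.price.idx = 28.614339,
  month.jun = 3.936681, job.retired = 1.198350
)

threshold <- 10  # assumed cutoff; 5 is also commonly used

# Predictors above the cutoff, largest VIF first
high_vif <- sort(vif_values[vif_values > threshold], decreasing = TRUE)
high_vif
```

With this cutoff, euribor3m, nr.employed, emp.var.rate, cons.price.idx and pdays are flagged.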

ViSa
  • Great Job! You can change your "best answer" flag and select this answer instead of mine since this is the correct best answer. :-) – BrianLang Oct 30 '20 at 08:30
  • Thanks @BrianLang :) , according to SO I can't accept my own answer for 2 days. – ViSa Oct 30 '20 at 12:22

car::vif has to be implemented separately for each type of model. It works in the linked question because car::vif supports glm models, but it does not support your chosen model type, gbm.
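The error messages in the question come from S3 method dispatch: car::vif ends up calling vcov(), and UseMethod() fails when no method exists for the object's class ("gbm", "train", ...). A toy generic shows the same behaviour; `my_vif()` and the fake "gbm" object are hypothetical and exist only for illustration.

```r
# Hypothetical generic: dispatches on the class of its first argument
my_vif <- function(mod, ...) UseMethod("my_vif")

# Method only for linear models (intercept assumed to be the first coefficient)
my_vif.lm <- function(mod, ...) {
  v <- vcov(mod)[-1, -1, drop = FALSE]
  diag(solve(cov2cor(v)))
}

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
my_vif(fit)  # works: a method exists for class "lm"

# An object whose class has no method, standing in for a gbm fit
fake_gbm <- structure(list(), class = "gbm")
msg <- tryCatch(my_vif(fake_gbm), error = conditionMessage)
msg
```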

BrianLang
  • thanks @BrianLang, I searched the internet some more and now I think `car::vif`, or `vif` in general, doesn't work with `classification` models. For `classification` I will have to convert the dependent variable to numeric, fit a linear regression on it, and then run `vif`. Reference: https://www.researchgate.net/post/How_to_test_multicollinearity_in_binary_logistic_logistic_regression – ViSa Oct 29 '20 at 14:34
  • Exactly. [Crossvalidated has more information as well.](https://stats.stackexchange.com/questions/294678/multicollinearity-between-two-categorical-variables) – BrianLang Oct 29 '20 at 14:40
  • yes, I will update the solution code in the post. Thanks again for helping :) – ViSa Oct 29 '20 at 14:44
  • If you find the right way to do it, you can always answer your own question! That way when people look in the future they will have your newfound knowledge! – BrianLang Oct 29 '20 at 14:46
  • Sure, I will add the code in the Answer section as well !! – ViSa Oct 29 '20 at 14:49