Variable importance for support vector machine and naive Bayes classifiers in R

Question

I’m building predictive classifiers in R on a cancer dataset, using random forest, support vector machine (SVM) and naive Bayes (NB) classifiers. I’m unable to calculate variable importance for the SVM and NB models.

I end up receiving the following error.

Error in UseMethod("varImp") : no applicable method for 'varImp' applied to an object of class "c('svm.formula', 'svm')"

I would greatly appreciate it if anyone could help me.

Welcome to StackOverflow. Please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create a MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit and improve your question accordingly. I.e., provide the code to reproduce the error e.g. by using built in example data sets. — lukeA, Apr 25 '16 at 17:09

score 9 · Answer 1 · edited Apr 13 '17 at 12:44

Given

library(e1071)
model <- svm(Species ~ ., data = iris)
class(model)
# [1] "svm.formula" "svm"     

library(caret)
varImp(model)
# Error in UseMethod("varImp") : 
#   no applicable method for 'varImp' applied to an object of class "c('svm.formula', 'svm')"

methods(varImp)
#  [1] varImp.bagEarth      varImp.bagFDA        varImp.C5.0*         varImp.classbagg*   
#  [5] varImp.cubist*       varImp.dsa*          varImp.earth*        varImp.fda*         
#  [9] varImp.gafs*         varImp.gam*          varImp.gbm*          varImp.glm*         
# [13] varImp.glmnet*       varImp.JRip*         varImp.lm*           varImp.multinom*    
# [17] varImp.mvr*          varImp.nnet*         varImp.pamrtrained*  varImp.PART*        
# [21] varImp.plsda         varImp.randomForest* varImp.RandomForest* varImp.regbagg*     
# [25] varImp.rfe*          varImp.rpart*        varImp.RRF*          varImp.safs*        
# [29] varImp.sbf*          varImp.train*

methods(varImp) lists no varImp.svm method, hence the error. You might want to have a look at this post on Cross Validated, too.
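One possible workaround, sketched here under the assumption that a model-agnostic measure is acceptable: caret's filterVarImp() scores each predictor against the outcome on its own (ROC-based for a factor outcome), so it does not need a fitted svm object at all. The result describes the data, not the SVM itself, so treat it as a rough substitute.

```r
library(caret)

# Filter-based importance: each predictor is evaluated separately against
# the class labels, so no varImp method for "svm" objects is required.
# iris stands in for the real data, as in the example above.
imp <- filterVarImp(x = iris[, 1:4], y = iris$Species)

# One row per predictor, one column per class level
imp
```

Because the predictors are scored one at a time, interactions that the SVM may exploit are not reflected in these numbers.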

score 6 · Answer 2 · answered Sep 14 '17 at 15:20

If you use R, variable importance can be calculated with the Importance function in the rminer package. This is my sample code:

library(rminer)
M <- fit(y~., data=train, model="svm", kpar=list(sigma=0.10), C=2)
svm.imp <- Importance(M, data=train)

For details, refer to the package manual: https://cran.r-project.org/web/packages/rminer/rminer.pdf

Andrei Catana · Answer 3 · 2020-04-29T06:29:09.880

I have created a loop that iteratively removes one predictor at a time and records, in a data frame, several performance measures derived from the confusion matrix. This is not meant to be a one-size-fits-all solution (I don't have the time for that), but it should not be difficult to adapt.

Make sure that the predicted variable is the last column in the data frame.

I mainly needed the specificity values of the models. By removing one predictor at a time, I can evaluate the importance of each predictor: if the model fitted without predictor i has the lowest specificity, then predictor i is the most important. You need to decide which metric you will base importance on.

You can also add another for loop inside to switch between kernels (linear, polynomial, radial), but you might then have to account for other parameters, e.g. gamma. Replace label_fake with your target variable and df_final with your data frame.

SVM version:

library(e1071) # provides svm() and tune()
set.seed(1)
varimp_df <- NULL # df with results
ptm1 <- proc.time() # Start the clock!
for(i in 1:(ncol(df_final)-1)) { # the last var is the dep var, hence the -1
  smp_size <- floor(0.70 * nrow(df_final)) # 70/30 split
  train_ind <- sample(seq_len(nrow(df_final)), size = smp_size)
  training <- df_final[train_ind, -c(i)] # receives all the df less 1 var
  testing <- df_final[-train_ind, -c(i)]

  tune.out.linear <- tune(svm, label_fake ~ .,
                          data = training,
                          kernel = "linear",
                          ranges = list(cost =10^seq(1, 3, by = 0.5))) # you can choose any range you see fit

  svm.linear <- svm(label_fake ~ .,
                    kernel = "linear",
                    data = training,
                    cost = tune.out.linear[["best.parameters"]][["cost"]])

  train.pred.linear <- predict(svm.linear, testing)
  testing_y <- as.factor(testing$label_fake)
  conf.matrix.svm.linear <- caret::confusionMatrix(train.pred.linear, testing_y)
  varimp_df <- rbind(varimp_df,data.frame(
                     var_no=i,
                     variable=colnames(df_final)[i], # name of the removed predictor
                     cost_param=tune.out.linear[["best.parameters"]][["cost"]],
                     accuracy=conf.matrix.svm.linear[["overall"]][["Accuracy"]],
                     kappa=conf.matrix.svm.linear[["overall"]][["Kappa"]],
                     sensitivity=conf.matrix.svm.linear[["byClass"]][["Sensitivity"]],
                     specificity=conf.matrix.svm.linear[["byClass"]][["Specificity"]]))
}
runtime1 <- (proc.time() - ptm1)[["elapsed"]] # elapsed seconds for the whole loop
runtime1 # divide by 60 and you get minutes, /3600 you get hours
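The loop above draws a new train/test split on every iteration, so part of the difference between rows comes from the split and not from the removed predictor. A minimal sketch of the same drop-one idea with one fixed split and a full-model baseline follows. It assumes iris as stand-in data and accuracy as the metric; the names dat, target, svm_accuracy and drop_one are hypothetical and not from the answer.

```r
library(e1071)

set.seed(1)
# Stand-in data (assumption): iris replaces df_final, Species replaces label_fake
dat <- iris
target <- "Species"
predictors <- setdiff(names(dat), target)

# One fixed 70/30 split shared by every model
train_ind <- sample(seq_len(nrow(dat)), size = floor(0.70 * nrow(dat)))
training <- dat[train_ind, ]
testing <- dat[-train_ind, ]

# Test-set accuracy of a linear SVM fitted on the given predictors
svm_accuracy <- function(vars) {
  fit <- svm(reformulate(vars, response = target),
             data = training, kernel = "linear")
  mean(predict(fit, testing) == testing[[target]])
}

# Accuracy with all predictors, then with each predictor left out in turn
baseline <- svm_accuracy(predictors)
drop_one <- data.frame(
  variable = predictors,
  accuracy = unname(vapply(predictors,
                           function(v) svm_accuracy(setdiff(predictors, v)),
                           numeric(1)))
)
drop_one$accuracy_drop <- baseline - drop_one$accuracy

# Largest drop first: removing these predictors hurts the model most
drop_one[order(-drop_one$accuracy_drop), ]
```

Any other metric from caret::confusionMatrix, such as specificity, could be returned by svm_accuracy in place of accuracy.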

Naive Bayes version:

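library(caret) # train() and confusionMatrix(); method 'nb' also requires the klaR package
ctrl <- trainControl(method = "cv", number = 5) # assumption: ctrl was not defined in the original answer
varimp_nb_df <- NULL
ptm1 <- proc.time() # Start the clock!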
for(i in 1:(ncol(df_final)-1)) {
  smp_size <- floor(0.70 * nrow(df_final))
  train_ind <- sample(seq_len(nrow(df_final)), size = smp_size)
  training <- df_final[train_ind, -c(i)]
  testing <- df_final[-train_ind, -c(i)]

  x = training[, names(training) != "label_fake"]
  y = training$label_fake

  model_nb_var = train(x,y,'nb', trControl=ctrl)

  predict_nb_var <- predict(model_nb_var, newdata = testing )

  confusion_matrix_nb_1 <- caret::confusionMatrix(predict_nb_var, testing$label_fake)  

  varimp_nb_df <- rbind(varimp_nb_df, data.frame(
    var_no=i,
    variable=colnames(df_final)[i], # name of the removed predictor
    accuracy=confusion_matrix_nb_1[["overall"]][["Accuracy"]],
    kappa=confusion_matrix_nb_1[["overall"]][["Kappa"]],
    sensitivity=confusion_matrix_nb_1[["byClass"]][["Sensitivity"]],
    specificity=confusion_matrix_nb_1[["byClass"]][["Specificity"]]))
}
runtime1 <- (proc.time() - ptm1)[["elapsed"]] # elapsed seconds for the whole loop
runtime1 # divide by 60 and you get minutes, /3600 you get hours

Have fun!
