
I did a multiclass (three-class) classification using an SVM with a linear kernel.

For this task, I used the mlr package. The SVM is from the kernlab package.

library(mlr)
library(kernlab)

print(filtered_task)

Supervised task: dtm
Type: classif
Target: target_lable
Observations: 1462
Features:
   numerics     factors     ordered functionals 
        291           0           0           0 
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE
Has coordinates: FALSE
Classes: 3
negative  neutral positive 
     917      309      236 
Positive class: NA

lrn = makeLearner("classif.ksvm", par.vals = list(kernel = "vanilladot"))
mod = mlr::train(lrn, train_task)

Now I want to know which features have the highest weights for each class. Any idea how to get there?
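With a linear kernel one option is to read the weight vectors directly off the underlying kernlab model. The sketch below assumes `mod` is the `WrappedModel` trained above; note that `ksvm` fits multiclass `C-svc` one-against-one, so with three classes you get three weight vectors, one per *pair* of classes rather than one per class:

```r
# Sketch: recover linear-SVM weight vectors from the underlying ksvm model.
# kernlab trains multiclass SVMs one-vs-one, so with 3 classes there are
# 3 binary sub-models, each separating one pair of classes.
ksvm_mod = getLearnerModel(mod)   # unwrap the kernlab ksvm object from mlr

# coef(ksvm_mod) and ksvm_mod@xmatrix are lists with one element per binary
# sub-model: the (alpha * y) coefficients and the matrix of support vectors.
# For a linear kernel, w = sum_i alpha_i * y_i * x_i.
weights = lapply(seq_along(coef(ksvm_mod)), function(k) {
  colSums(coef(ksvm_mod)[[k]] * ksvm_mod@xmatrix[[k]])
})

# Features with the largest |weight| in each pairwise sub-model
lapply(weights, function(w) sort(abs(w), decreasing = TRUE)[1:10])
```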

Moreover, it would be nice to get the feature weights for each class for the cross-validation result.

rdesc = makeResampleDesc("CV",
                         iters = 10,
                         stratify = TRUE)
set.seed(3)
r = resample(lrn, filtered_task, rdesc)
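The same extraction works per fold: `resample()` keeps each fold's fitted model when called with `models = TRUE`. A sketch, again assuming kernlab's one-vs-one linear sub-models:

```r
# Sketch: keep the per-fold models so the weight vectors can be read
# from every cross-validation fold.
set.seed(3)
r = resample(lrn, filtered_task, rdesc, models = TRUE)

# r$models is a list of WrappedModels, one per fold
fold_weights = lapply(r$models, function(m) {
  km = getLearnerModel(m)                   # the fold's kernlab ksvm object
  lapply(seq_along(coef(km)), function(k)   # one weight vector per class pair
    colSums(coef(km)[[k]] * km@xmatrix[[k]]))
})
```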

I know that it is possible to calculate the feature importance as below; because of the Monte Carlo iterations, the result is comparable to the cross-validation one.

imp = generateFeatureImportanceData(task = train_task, 
                                    method = "permutation.importance", 
                                    learner = lrn,
                                    nmc = 10)

However, with this method I can't get the feature importance for each class, only the overall importance.

library(reshape2)
library(dplyr)
library(ggplot2)

imp_data = melt(imp$res[, 2:ncol(imp$res)]) 

imp_data = imp_data %>% 
  arrange(-value)

imp_data[1:10,] %>% 
  ggplot(aes(x = reorder(variable, value), y = value)) + 
  geom_bar(stat = "identity",  fill = "darkred") + 
  labs(x = "Features", y = "Permutation Importance") +
  coord_flip() +
  theme_minimal()
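One workaround I have considered is to approximate per-class importance by recoding the task into one-vs-rest binary tasks and running the permutation importance on each. A sketch, assuming the target column is `target_lable` as in the task printout above:

```r
# Sketch (assumption): per-class importance via one-vs-rest binary tasks.
# For each class, relabel all other observations as "rest", build a binary
# task, and compute the permutation importance with the same learner.
classes = c("negative", "neutral", "positive")

per_class_importance = lapply(classes, function(cl) {
  d = getTaskData(filtered_task)
  d$target_lable = factor(ifelse(d$target_lable == cl, cl, "rest"))
  bin_task = makeClassifTask(id = paste0("ovr_", cl), data = d,
                             target = "target_lable", positive = cl)
  generateFeatureImportanceData(task = bin_task,
                                method = "permutation.importance",
                                learner = lrn,
                                nmc = 10)
})
names(per_class_importance) = classes
```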


  • Please add packages on top of your code not just in the question text. Makes it easier to see them. – NelsonGon Jan 07 '19 at 13:51
  •
    Maybe you want to look at partial dependency plots (or similar) to see what feature affects which class. Otherwise you could build 1-vs-other binary classification svms and have a look at the feature importance of each. – jakob-r Jan 07 '19 at 15:21
  • Thanks. What do you mean by "1-vs-other binary classification"? Should I keep one class (i.e. "positive") and relabel the two others (i.e. neutral and negative = "new class")? Are the results for a given class equivalent to the 3-class situation? – Banjo Jan 07 '19 at 15:36
  •
    You want the feature importance for one class. But the feature importance can only tell you how important this feature is to separate observations in general. So yes. For each class (A,B,C) you could label A positive, B+C as negative. Then you calculate feature importance. Afterwards B as positive and A+C as negative and finally C vs. A+B. Then you have importance values for each class vs the others. – jakob-r Jan 14 '19 at 09:40
  • Hello, did you find a solution? I am also trying to obtain these per-class feature importances but cannot find an answer online... With randomForestSRC I use the vimp() function, with multiclassPairs the output of filter_genes_TSP(), with pamr the pamr.listgenes() function, etc., but for SVM I cannot find a function that retrieves or calculates feature importance by class... – MarionEtp Aug 22 '23 at 11:49

0 Answers