
I am using the mlr package in R to compare two learners, a random forest and a lasso classifier, on a binary classification task. I would like to extract the feature importances for the best-performing classifier (the random forest in this case), similar to what caret::varImp() gives. I came across getBMRFeatSelResults(), getFeatureImportance() and generateFeatureImportanceData(), but none of them seems to do the trick. Below is my code for carrying out the benchmark experiment with nested resampling. Ideally, I would like the mean decrease in Gini. Thank you.

library(easypackages)
libraries("mlr", "purrr", "glmnet", "parallelMap", "parallel")

data = read.table("data_past.txt", header = TRUE)

set.seed(123)

task = makeClassifTask(id = "past_history", data = data, target = "DIAG", positive = "BD")

# Hyperparameter spaces for the two learners
ps_rf = makeParamSet(makeIntegerParam("mtry", lower = 4, upper = 16),
                     makeDiscreteParam("ntree", values = 1000))
ps_lasso = makeParamSet(makeNumericParam("s", lower = .01, upper = 1),
                        makeDiscreteParam("alpha", values = 1))

ctrl_rf = makeTuneControlRandom(maxit = 10L)
ctrl_lasso = makeTuneControlRandom(maxit = 100L)

# Inner resampling used for tuning
inner = makeResampleDesc("RepCV", folds = 10, reps = 3, stratify = TRUE)

lrn_rf = makeLearner("classif.randomForest", predict.type = "prob", fix.factors.prediction = TRUE)
lrn_rf = makeTuneWrapper(lrn_rf, resampling = inner, par.set = ps_rf, control = ctrl_rf,
                         measures = auc, show.info = FALSE)

lrn_lasso = makeLearner("classif.glmnet", predict.type = "prob", fix.factors.prediction = TRUE)
lrn_lasso = makeTuneWrapper(lrn_lasso, resampling = inner, par.set = ps_lasso, control = ctrl_lasso,
                            measures = auc, show.info = FALSE)

# Outer resampling for the benchmark
outer = makeResampleDesc("CV", iters = 10, stratify = TRUE)

lrns = list(lrn_rf, lrn_lasso)

parallelStartMulticore(36)
res = benchmark(lrns, task, outer, measures = list(auc, ppv, npv, fpr, tpr, mmce),
                show.info = FALSE, models = TRUE)
saveRDS(res, file = "res.rds")
parallelStop()

models <- getBMRModels(res, drop = TRUE)
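
Roughly, this is the kind of extraction I have in mind (an untested sketch; I am not sure the list indexing or the unwrapping of the tuning wrapper is right, and I assume the tuned learner keeps the default "classif.randomForest.tuned" id): loop over the per-outer-fold random forest models, pull out the raw randomForest fits, and average their mean decrease in Gini.

library(randomForest)

rf_models = getBMRModels(res, learner.ids = "classif.randomForest.tuned", drop = TRUE)
imp_list = lapply(rf_models, function(m) {
  rf = getLearnerModel(m, more.unwrap = TRUE)        # unwrap tune wrapper -> raw randomForest
  randomForest::importance(rf)[, "MeanDecreaseGini"]
})
imp_mean = Reduce(`+`, imp_list) / length(imp_list)  # average Gini decrease across outer folds
sort(imp_mean, decreasing = TRUE)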
FcmC
  • Your question isn't clear and lacks focus. Please rephrase it to focus on a single problem. If necessary, separate into two or more distinct questions / posts – alexwhitworth Dec 12 '19 at 15:32

1 Answer


Since you're talking about CV,

extract the features' importance for the best classifier

is not clear: there is no single "best model" in a CV, and importance is usually not measured within the CV anyway.

CV aims to estimate and compare predictive performance, not to calculate or interpret feature importance.

Here is an answer to a similar question that might help.
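
If an interpretable importance ranking is what you are after, the more usual route is to fit the (tuned) learner once on the full task and compute importance on that single model, keeping the nested CV purely for performance estimation. A minimal sketch, reusing the task and lrn_rf objects from your question (untested):

final_mod = train(lrn_rf, task)                      # the inner CV inside lrn_rf still tunes mtry
rf = getLearnerModel(final_mod, more.unwrap = TRUE)  # raw randomForest fit
sort(randomForest::importance(rf)[, "MeanDecreaseGini"], decreasing = TRUE)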

I came across getBMRFeatSelResults(), getFeatureImportance(), generateFeatureImportanceData() but none seems to do the trick.

When making such statements, it would help to explain in detail why these functions do not do what you want, rather than just stating that they don't :)
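
For reference, getFeatureImportance() expects a model trained with a learner that has the "featimp" property; with a plain (unwrapped) classif.randomForest it should return the randomForest importance values directly. A minimal sketch, using the task from your question:

lrn = makeLearner("classif.randomForest", ntree = 1000)
mod = train(lrn, task)
getFeatureImportance(mod)$res   # data frame of per-feature importance values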

pat-s
  • Thank you very much for your reply! I reposted my question with more focus, as suggested. – FcmC Dec 19 '19 at 10:47