
I am doing nested cross-validation using the packages mlr and mlrMBO. The inner CV is used for parametrization (i.e. to find the optimal hyperparameters). Since I want to compare the performance of different learners, I conduct a benchmark experiment using mlr's benchmark function. My question is the following: is it possible to permute on the parametrized model/learner? When I call generateFeatureImportanceData on the learner I use in the benchmark experiment, the model is estimated again, ignoring the parametrization learned by sequential model-based optimization. Here is some code on the iris dataset to illustrate my question (no preprocessing, only for illustration).

    library(dplyr)
    library(mlr)
    library(mlrMBO)
    library(e1071)

    nr_inner_cv <- 3L
    nr_outer_cv <- 2L

    inner = makeResampleDesc(
      "CV"
      , iters = nr_inner_cv  # folds used in tuning/Bayesian optimization
    )

    learner_knn_base = makeLearner(id = "knn", "classif.knn")

    par.set = makeParamSet(
      makeIntegerParam("k", lower = 2L, upper = 10L)
    )

    ctrl <- makeMBOControl(propose.points = 1L)
    ctrl <- setMBOControlTermination(ctrl, iters = 10L)
    ctrl <- setMBOControlInfill(ctrl, crit = crit.ei, filter.proposed.points = TRUE)
    set.seed(500)
    tune.ctrl <- makeTuneControlMBO(
      mbo.control = ctrl,
      mbo.design = generateDesign(n = 10L, par.set = par.set)
    )

    learner_knn = makeTuneWrapper(learner = learner_knn_base
                                           , resampling = inner
                                           , par.set = par.set
                                           , control = tune.ctrl
                                           , show.info = TRUE
                                  )

    learner_nb <- makeLearner(
      id = "naiveBayes"
      ,"classif.naiveBayes"
    )

    lrns = list(
      learner_knn
      , learner_nb
    )

    rdesc = makeResampleDesc("CV", iters = nr_outer_cv)

    set.seed(12345)
    bmr = mlr::benchmark(lrns, tasks = iris.task, show.info = FALSE,
                         resamplings = rdesc, models = TRUE, keep.extract = TRUE)
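
For illustration, the call I mean looks roughly like this (just a sketch): passing the tuning wrapper makes generateFeatureImportanceData refit the learner from scratch, i.e. the MBO tuning runs again instead of reusing the models fitted in the benchmark experiment.

    # Refits learner_knn on iris.task (including a fresh MBO run), ignoring the
    # tuning that was already done inside the benchmark experiment
    fi_wrapper <- generateFeatureImportanceData(task = iris.task, learner = learner_knn)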
  • Some code would help, since it is not completely clear how you tune and which feature importance you want to obtain. Are you using the `tuneWrapper`? Do you want to compute the feature importance for each outer CV split? One option would be to do it afterwards. You would have to fix the CV splits with `makeResampleInstance`, subset the task, and extract each obtained optimal hyper-parameter setting. Then you can manually set the hyper-params accordingly before you call `generateFeatureImportanceData` on the subset task. – jakob-r Dec 05 '19 at 16:31
  • @jakob-r Thank you very much for your comment. I added some code to show what I am doing so far. As far as I understand, it is common to compute the feature importance for each outer split (on the training set in each outer loop). – Patrick Balada Dec 06 '19 at 08:22

1 Answer


I think this is a general question that we get more often: Can I do XY on models fitted in the CV? Short answer: Yes you can, but do you really want that?

Detailed answer

Similar Q's:

As @jakob-r's comment indicates, there are two options:

  1. Either you recreate the model outside the CV and call your desired function on it
  2. You do it within the CV on each fitted model of the respective fold via the extract argument in resample(). See also Q2 linked above.

1) If you want to do this on all models, see 2) below. If you want to do it on the models of certain folds only: Which criteria did you use to select those?
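
For illustration, here is a rough sketch of how 1) could look for a single outer fold, following @jakob-r's comment. It assumes you fix the outer splits with makeResampleInstance so they can be reused afterwards; the ids "iris-example" and "knn.tuned" are mlr's defaults for your setup, and all object names are just illustrative.

    # Fix the outer splits so the same train/test indices can be reused later
    rin <- makeResampleInstance(rdesc, task = iris.task)
    bmr <- mlr::benchmark(lrns, tasks = iris.task, resamplings = rin,
                          models = TRUE, keep.extract = TRUE, show.info = FALSE)

    # Tuned hyperparameters of the knn wrapper for the first outer fold
    tune_res <- getBMRTuneResults(bmr)[["iris-example"]][["knn.tuned"]][[1]]

    # Recreate the "parametrized" learner outside the CV ...
    lrn_fold1 <- setHyperPars(learner_knn_base, par.vals = tune_res$x)

    # ... and permute on the training data of that fold only
    task_fold1 <- subsetTask(iris.task, subset = rin$train.inds[[1]])
    fi_fold1 <- generateFeatureImportanceData(task = task_fold1, learner = lrn_fold1)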

2) is computationally very intensive, and you might want to question why you want to do this - i.e. what do you want to do with all the information from each fold's model?
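
For completeness, a minimal sketch of 2): run the nested CV directly with resample() on the tuning wrapper and pull the per-fold tuning results out via the extract argument (getTuneResult is the extractor mlr provides for this).

    # Nested CV in one call; extract returns the TuneResult of every outer fold
    res <- resample(learner_knn, iris.task, resampling = rdesc,
                    extract = getTuneResult, models = TRUE, show.info = FALSE)
    res$extract  # tuned hyperparameters per outer fold
    res$models   # fitted models per outer fold, for further inspection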

In general I have never seen a study or use case where this has been applied. Everything you do in the CV contributes to estimating a performance value for each fold. You do not want to interact with these models afterwards.

You would rather want to estimate the feature importance once on the non-partitioned dataset (for which you have optimized the hyperpars once beforehand). The same applies to other diagnostic methods for ML models: apply them to your "full dataset", not to each model within the CV.
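
A minimal sketch of that recommendation, reusing the objects from your question: tune the hyperparameters once on the complete task, fix them on the base learner, then permute on the same, non-partitioned task.

    # Tune once on the full dataset ...
    tr <- tuneParams(learner_knn_base, task = iris.task, resampling = inner,
                     par.set = par.set, control = tune.ctrl, show.info = FALSE)

    # ... then estimate the permutation importance with the chosen hyperparameters
    lrn_final <- setHyperPars(learner_knn_base, par.vals = tr$x)
    fi <- generateFeatureImportanceData(task = iris.task, learner = lrn_final)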

  • Thank you very much for your elaborate answer! I'm going to study the links you posted, but I have one question already. In the last section, what exactly do you mean by non-partitioned dataset? Let's say I do a 3x5 (outer x inner) nested CV. I split the entire dataset into 3 parts to estimate the "generalization performance" using the optimized parameters from the inner loop. Then I would use these three best-found hyperparameter settings on the entire dataset to eventually permute and estimate importance? Is this correct? – Patrick Balada Dec 05 '19 at 21:06
  • No, CV is only for estimating performance, not for querying (multiple) hyperparameters to be applied to the entire dataset. Those hyperpars are only "optimal" for the specific fold of your CV for which they were selected. I do not think that you want/should do anything with the models of the CV - but I cannot/will not prevent you from doing differently ;) – pat-s Dec 05 '19 at 22:20
  • Thank you for your quick reply. But then I do not seem to understand which sets you would estimate the feature importance on. Could you specify that? Thank you – Patrick Balada Dec 05 '19 at 23:16
  • Estimate it on your complete dataset. – pat-s Dec 07 '19 at 12:27