
If I fuse a learner with a filter method using makeFilterWrapper, I know I can perform feature selection with that filter inside a cross-validation loop. As I understand it, filterFeatures is called before each model fit, and it in turn calls generateFilterValuesData. Is it possible to retrieve the values that generateFilterValuesData computes for that filter in each iteration of the cross-validation?
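For reference, the two steps that the wrapper chains together can also be run by hand on a whole task. The sketch below is a minimal illustration that assumes the built-in `iris.task` and the `variance` filter as stand-ins for the survival task and `univariate.model.score` filter, chosen only so it runs without extra packages. It computes the filter scores first and then hands them to filterFeatures through its `fval` argument.

```r
library(mlr)

# Stand-in task and filter (assumption): iris.task with the "variance" filter
fv = generateFilterValuesData(iris.task, method = "variance")
print(fv$data)

# Reuse the precomputed scores to keep the 2 highest-scoring features
filtered.task = filterFeatures(iris.task, fval = fv, abs = 2)
print(getTaskFeatureNames(filtered.task))
```

Inside makeFilterWrapper these scores are computed afresh on each training set and, as far as I can tell, are not stored in the fitted model, which is why they cannot simply be read back afterwards.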

For example:

library(survival)
library(mlr)

data(veteran)
set.seed(24601)
configureMlr(show.learner.output=TRUE, show.info=TRUE)

task_id = "MAS"
mas.task <- makeSurvTask(id = task_id, data = veteran, target = c("time", "status"))
mas.task <- createDummyFeatures(mas.task)

inner = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Tuning
outer = makeResampleDesc("CV", iters=3, stratify=TRUE)  # Benchmarking

cox.lrn <- makeLearner(cl="surv.coxph", id = "coxph", predict.type="response")
cox.filt.uni.abs.lrn = 
  makeFilterWrapper(
    makeLearner(cl="surv.coxph", id = "cox.filt.uni.abs", predict.type="response"), 
    fw.method="univariate.model.score", 
    fw.abs=7,
    perf.learner=cox.lrn
  )

learners = list( cox.filt.uni.abs.lrn )  
bmr = benchmark(learners=learners, tasks=mas.task, resamplings=outer, measures=list(cindex), show.info = TRUE)

mods = getBMRModels(bmr, learner.ids = c('cox.filt.uni.abs.filtered'))
for (i in seq_along(mods[[task_id]]$cox.filt.uni.abs.filtered)) {
  mod = mods[[task_id]]$cox.filt.uni.abs.filtered[[i]]$learner.model[[1]]
  str(mod, max.level = 1)  # str() prints directly; wrapping it in print() only adds "NULL"
  # Retrieve output of generateFilterValuesData here?
}

You can use the extract argument of resample() in combination with getFilteredFeatures(), which returns the features selected in each fold.

library(mlr)
#> Loading required package: ParamHelpers

lrn = makeFilterWrapper(learner = "classif.ksvm", fw.method = "variance",
                        fw.abs = 5)
rdesc = makeResampleDesc("CV", iters = 2)
res = resample(lrn, spam.task, rdesc, extract = getFilteredFeatures)
#> Resampling: cross-validation
#> Measures:             mmce
#> [Resample] iter 1:    0.1808696
#> [Resample] iter 2:    0.1994785
#> 
#> Aggregated Result: mmce.test.mean=0.1901740
#> 
res$extract
#> [[1]]
#> [1] "you"          "george"       "capitalAve"   "capitalLong" 
#> [5] "capitalTotal"
#> 
#> [[2]]
#> [1] "you"          "george"       "capitalAve"   "capitalLong" 
#> [5] "capitalTotal"

Created on 2019-08-07 by the reprex package (v0.3.0)

  • Thanks, but it's not just the names of the features I need, it is the values produced by generateFilterValuesData for those features. For example, if I was using the filter "univariate.model.score" with perf.learner set to a Cox learner, then generateFilterValuesData would return the univariate Cox score for each feature. Unfortunately generateFilterValuesData takes a task as its argument, and I cannot see how to extract that subtask within the resampling loop. – panda Aug 08 '19 at 01:23
  • I have had another attempt at this problem (see Edit above) but cannot understand why it is not working. – panda Oct 02 '19 at 05:57
  • @panda Please do not edit questions after some weeks. No one besides me will see the edit. Please open a new question or an issue on GitHub if you feel it's a potential bug. – pat-s Oct 02 '19 at 08:05
  • OK. I will do that. Sorry. I thought it would be considered a duplicate if I opened a new question on the same topic. – panda Oct 02 '19 at 11:47
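One way around the "generateFilterValuesData needs a task" problem is to fix the folds up front with a resample instance, run the wrapped learner on that instance, and then rebuild each fold's training subtask with subsetTask() and call generateFilterValuesData() on it yourself. This is a minimal sketch, not the wrapper's internal mechanism: it assumes `spam.task`, the `variance` filter and `classif.rpart` (swapped in for `classif.ksvm` only to avoid the kernlab dependency), and it recomputes the scores rather than retrieving them from the wrapper.

```r
library(mlr)

# Stand-ins (assumption): spam.task and the "variance" filter replace the
# survival task and the "univariate.model.score" filter from the question
task = spam.task

# Fix the folds once so the wrapper and the manual filter calls below
# see exactly the same training observations
rdesc = makeResampleDesc("CV", iters = 2)
rin = makeResampleInstance(rdesc, task = task)

lrn = makeFilterWrapper(learner = "classif.rpart", fw.method = "variance", fw.abs = 5)
res = resample(lrn, task, rin, extract = getFilteredFeatures)

# Recompute the filter values on the training part of each fold
fold.values = lapply(seq_along(rin$train.inds), function(i) {
  train.task = subsetTask(task, subset = rin$train.inds[[i]])
  generateFilterValuesData(train.task, method = "variance")$data
})

# For each fold, show the filter values of the features the wrapper selected
for (i in seq_along(fold.values)) {
  fv = fold.values[[i]]
  print(fv[fv$name %in% res$extract[[i]], ])
}
```

Because the wrapper filters on the training subset of each fold, these recomputed values should match the ones it used for deterministic filters such as `variance`. For the survival case, the manual call would presumably become `generateFilterValuesData(train.task, method = "univariate.model.score", perf.learner = cox.lrn)`; how that filter receives `perf.learner` is an assumption I have not checked.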