To be able to fiddle with the models after resampling its best to call resample with store_models = TRUE
Using your example
library(mlr3verse)
set.seed(1)
rr <- resample(task,
at,
rsmp("cv", folds = 2),
store_models = TRUE)
After you have finished resampling you can access the inner structure of the generated object like this:
To get the row ids in each fold:
rr$resampling$instance
#output
row_id fold
1: 5 1
2: 8 1
3: 9 1
4: 12 1
5: 13 1
---
996: 989 2
997: 993 2
998: 994 2
999: 995 2
1000: 996 2
with these and the tuned autotuners we can manually generate the predictions.
Generate a list of test indexes
rsample <- split(rr$resampling$instance$row_id,
rr$resampling$instance$fold)
iterate over the folds and tuned autotuners and predict:
lapply(1:2, function(i){
x <- rsample[[i]] #get the test row ids
task_test <- task$clone() #clone the task so we don't change the original task
task_test$filter(x) #filter on the test row ids
preds <- rr$learners[[i]]$predict(task_test) #use the trained autotuner and above filtered task
preds
}) -> preds_manual
To check if these predictions match the ouput of resample
all.equal(preds_manual,
rr$predictions())
#output
TRUE
To get information about the tuning
zz <- rr$data$learners()$learner
lapply(zz, function(x) x$tuning_result)
#output
[[1]]
classif.ranger.num.trees classif.ranger.sample.fraction learner_param_vals
1: 342 0.7931022 <list[7]>
x_domain classif.auc
1: <list[2]> 0.7981283
[[2]]
classif.ranger.num.trees classif.ranger.sample.fraction learner_param_vals
1: 407 0.7964164 <list[7]>
x_domain classif.auc
1: <list[2]> 0.7706533
the slot
zz[[1]]$learner$state$model$importance
contains info about filter_importance
step
specifically
lapply(zz, function(x) x$learner$state$model$importance$scores)
#output
[[1]]
amount status age
27.491369 25.776145 22.021369
duration purpose credit_history
18.732521 16.251643 14.884843
employment_duration savings property
11.225678 10.796583 9.078619
personal_status_sex present_residence installment_rate
8.914802 7.875384 7.491573
job number_credits other_installment_plans
6.293323 5.662485 5.345666
housing telephone other_debtors
4.869471 3.742213 3.548856
people_liable foreign_worker
2.632163 1.054919
[[2]]
amount duration age
26.764389 22.139400 20.749865
status purpose employment_duration
20.524764 11.793789 10.962301
credit_history installment_rate savings
10.416572 9.597835 9.491894
property present_residence job
9.403157 7.877391 6.760945
personal_status_sex housing other_installment_plans
6.699065 5.811131 5.710761
telephone other_debtors number_credits
4.716322 4.318972 3.974793
people_liable foreign_worker
3.196563 0.846520
contains the ranking of the features. While
lapply(zz, function(x) x$learner$state$model$importance$outtasklayout)
#output
[[1]]
id type
1: age integer
2: amount integer
3: credit_history factor
4: duration integer
5: employment_duration factor
6: installment_rate ordered
7: job factor
8: number_credits ordered
9: personal_status_sex factor
10: present_residence ordered
11: property factor
12: purpose factor
13: savings factor
14: status factor
[[2]]
id type
1: age integer
2: amount integer
3: credit_history factor
4: duration integer
5: employment_duration factor
6: housing factor
7: installment_rate ordered
8: job factor
9: personal_status_sex factor
10: present_residence ordered
11: property factor
12: purpose factor
13: savings factor
14: status factor
contains the features that were kept after the filter step.