Code Example:

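# Assumption: `task` is an existing classification task and `rsmp_scheme`
# an existing resampling object, e.g. rsmp("cv"); neither is shown in the question.

# BLOCKING by "userID"
task$col_roles$group = "userID"
# Remove "userID" from features
task$col_roles$feature = setdiff(task$col_roles$feature, "userID")

# STRATIFICATION (by target variable!)
task$col_roles$stratum = "answer_code"

# Instantiate the resampling on the task (this call raises the error below):
rsmp_scheme$instantiate(task)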

Problem: When I try to combine two resampling procedures, 1) "Stratified resampling" (stratified by target) and 2) "Block resampling" (grouped by userID) (see the mlr3 gallery example: https://mlr3gallery.mlr-org.com/posts/2020-03-30-stratification-blocking/), the following error occurs:

Error: Cannot combine stratification with grouping

Background Information:

  • My data set includes several repeated measurements per user (the number of repeated measurements differs from person to person), so blocking or grouping by userID would be appropriate.
  • In addition, the distribution of the target variable is very imbalanced, which is why a stratification by target would be appropriate.

Question: How can I implement both resampling methods in mlr3?

Thanks for your help! :-)

Ana
  • As the error states (and the [code shows](https://github.com/mlr-org/mlr3/blob/6a35c8589a450ba959a0ed77146af06cafe35061/R/Resampling.R#L177)), you cannot combine both strategies. Please also use a reproducible example next time. – pat-s Jun 22 '21 at 14:51
  • Thanks for your quick answer! Which of the two resampling strategies would you recommend applying, or consider most important, for the described data situation? – Ana Jun 22 '21 at 15:32
  • @Ana best to create the folds manually which will respect both blocking and stratification – missuse Jun 22 '21 at 16:23
  • @Ana One cannot answer this question as this is really specific to your data and should ultimately be answered by you. @missuse is right, see `?ResamplingCustom` for custom fold creation. – pat-s Jun 22 '21 at 16:32
  • @pat-s thanks for your help! Unfortunately, I am quite new to mlr3 and really struggling with implementing a customized resampling, combining stratification & blocking based on the explanations provided in the mlr3book for "Custom Resampling" [link](https://mlr3book.mlr-org.com/resampling.html). Any possibility that a concrete code example is provided for this use case in the near future, e.g. in the mlr3 gallery example for "Resampling Strategies" [link](https://mlr3gallery.mlr-org.com/posts/2020-03-30-stratification-blocking/)? – Ana Jun 25 '21 at 09:41
  • @Ana Good to hear. Unfortunately not, this is quite a niche case and I have never dealt with this myself. Maybe you can decide on one of the two? Given that your full dataset would be needed for this, it's quite hard to help here. Sorry for not being more helpful here :/ – pat-s Jun 25 '21 at 12:58
  • @Ana one simple approach to create block resampling that will also respect stratification is the brute force way. Instantiate the resamples using different seeds and check the stratification in each, then just pick the one that is best. – missuse Sep 15 '21 at 10:10
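
The manual fold creation suggested in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions, not an official mlr3 recipe: the toy data, the feature `x1`, the task id, and the helper `grouped_stratified_folds()` are all made up for this example, and only `userID` and `answer_code` come from the question. The helper assigns whole users to folds greedily, so that each class is spread as evenly as possible across folds; the stratification is therefore approximate, while the blocking is exact. The folds are then passed to `rsmp("custom")` (see `?ResamplingCustom`).

```r
library(mlr3)

# Toy data standing in for the real data set (hypothetical values).
set.seed(1)
n_users = 60
reps = sample(2:6, n_users, replace = TRUE)  # unequal number of rows per user
dat = data.frame(
  userID = rep(seq_len(n_users), times = reps),
  x1 = rnorm(sum(reps))
)
# Imbalanced target: roughly 15% "yes".
dat$answer_code = factor(ifelse(runif(nrow(dat)) < 0.15, "yes", "no"))

# Hypothetical helper: put each user into exactly one fold while keeping
# the class shares of the folds as similar as possible (greedy heuristic).
grouped_stratified_folds = function(group, target, folds = 5) {
  target = droplevels(factor(target))
  counts = unclass(table(group, target))  # rows: users, columns: classes
  class_totals = colSums(counts)
  # Place users with the most uneven class counts first; ties in random order.
  ord = order(-apply(counts, 1, sd), sample(nrow(counts)))
  fold_counts = matrix(0, nrow = folds, ncol = ncol(counts))
  fold_of_user = setNames(integer(nrow(counts)), rownames(counts))
  for (u in ord) {
    score = vapply(seq_len(folds), function(k) {
      trial = fold_counts
      trial[k, ] = trial[k, ] + counts[u, ]
      # Spread of each class across folds, as a share of that class's total.
      mean(apply(sweep(trial, 2, class_totals, "/"), 2, sd))
    }, numeric(1))
    k = which.min(score)
    fold_counts[k, ] = fold_counts[k, ] + counts[u, ]
    fold_of_user[u] = k
  }
  # Map the per-user fold back to the individual rows.
  unname(fold_of_user[as.character(group)])
}

fold = grouped_stratified_folds(dat$userID, dat$answer_code, folds = 5)

# Build the task; userID is dropped from the features as in the question.
# No group or stratum role is set, because the custom folds already encode both.
task = as_task_classif(dat, target = "answer_code", id = "answers")
task$col_roles$feature = setdiff(task$col_roles$feature, "userID")

# Turn the fold vector into train/test row id lists for the custom resampling.
row_ids = task$row_ids
test_sets = unname(split(row_ids, fold))
train_sets = lapply(test_sets, function(test) setdiff(row_ids, test))

rsmp_custom = rsmp("custom")
rsmp_custom$instantiate(task, train_sets = train_sets, test_sets = test_sets)

# Class shares per fold, to inspect how well the stratification worked.
print(round(prop.table(table(fold, dat$answer_code), margin = 1), 2))
```

The instantiated `rsmp_custom` can then be passed to `resample()` like any other resampling. If some users carry most of the minority class, the folds cannot be balanced perfectly, so it is worth checking the printed class shares before relying on the result.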

0 Answers