
I am trying to use tidymodels to tune a workflow's recipe and model parameters together. Tuning a single workflow works fine, but tuning a workflow set that contains several workflows always fails. Here is my code:

```r
library(tidyverse)   # read_csv(), fct_relevel(), dplyr verbs
library(tidymodels)  # recipes, parsnip, dials, workflows, workflowsets
library(themis)      # step_downsample()

# read the training data
train <- read_csv("../../train.csv")
train <- train %>%
  mutate(
    id = row_number(),
    across(where(is.double), as.integer),
    across(where(is.character), as.factor),
    r_yn = fct_relevel(r_yn, "yes")) %>%
  select(id, r_yn, everything())

# setting the recipes

# no preprocessing
rec_no <- recipe(r_yn ~ ., data = train) %>%
  update_role(id, new_role = "ID")

# downsample: tuning the under_ratio
rec_ds_tune <- rec_no %>%
  step_downsample(r_yn, under_ratio = tune(), skip = TRUE, seed = 100) %>%
  step_nzv(all_predictors(), freq_cut = 100)

# setting the models

# random forest
spec_rf_tune <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger", seed = 100) %>%
  set_mode("classification")

# xgboost
spec_xgb_tune <- boost_tree(trees = 100, mtry = tune(), tree_depth = tune(), learn_rate = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

# setting the workflow set
wf_tune_list <- workflow_set(
  preproc = list(no = rec_no, ds = rec_ds_tune),
  models = list(rf = spec_rf_tune, xgb = spec_xgb_tune),
  cross = TRUE)

# finalize the parameters (I'm not sure whether this is correct)
rf_params <- spec_rf_tune %>% parameters() %>% update(mtry = mtry(c(1, 15)))
xgb_params <- spec_xgb_tune %>% parameters() %>% update(mtry = mtry(c(1, 15)))
ds_params <- rec_ds_tune %>% parameters() %>% update(under_ratio = under_ratio(c(1, 5)))

wf_tune_list_finalize <- wf_tune_list %>%
  option_add(param = ds_params, id = c("ds_rf", "ds_xgb")) %>%
  option_add(param = rf_params, id = c("no_rf", "ds_rf")) %>%
  option_add(param = xgb_params, id = c("no_xgb", "ds_xgb"))
```

When I check the options in `wf_tune_list_finalize`, it shows:

```
> wf_tune_list_finalize$option
[[1]]
a list of options with names:  'param'

[[2]]
a list of options with names:  'param'

[[3]]
a list of options with names:  'param'

[[4]]
a list of options with names:  'param'
```

Then I tune this workflow set:

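```r
library(finetune)    # tune_race_anova(), control_race()
library(doParallel)  # registerDoParallel(); also attaches parallel (makeCluster(), detectCores())

# tuning the workflow set
# cv_5 (the resamples) and metric_auc (the metric set) are created elsewhere; not shown here
cl <- makeCluster(detectCores())
registerDoParallel(cl)
wf_tune_race <- wf_tune_list_finalize %>%
  workflow_map(fn = "tune_race_anova",
               seed = 100,
               resamples = cv_5,
               grid = 3,
               metrics = metric_auc,
               control = control_race(parallel_over = "everything"),
               verbose = TRUE)
stopCluster(cl)
```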

The verbose messages show that something is wrong with the parameters of the `ds_rf` and `ds_xgb` workflows:

```
i 1 of 4 tuning:     no_rf
i Creating pre-processing data to finalize unknown parameter: mtry
✔ 1 of 4 tuning:     no_rf (1m 44.4s)
i 2 of 4 tuning:     no_xgb
i Creating pre-processing data to finalize unknown parameter: mtry
✔ 2 of 4 tuning:     no_xgb (28.9s)
i 3 of 4 tuning:     ds_rf
x 3 of 4 tuning:     ds_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
i 4 of 4 tuning:     ds_xgb
x 4 of 4 tuning:     ds_xgb failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
```

The result is:

```
> wf_tune_race
# A workflow set/tibble: 4 x 4
  wflow_id info             option      result        
  <chr>    <list>           <list>      <list>        
1 no_rf    <tibble [1 x 4]> <wrkflw__ > <race[+]>     
2 no_xgb   <tibble [1 x 4]> <wrkflw__ > <race[+]>     
3 ds_rf    <tibble [1 x 4]> <wrkflw__ > <try-errr [1]>
4 ds_xgb   <tibble [1 x 4]> <wrkflw__ > <try-errr [1]>
```

Moreover, although `no_rf` and `no_xgb` did produce tuning results, the range of `mtry` used in these two workflows is not the range I set above. In other words, the parameter-range step failed completely. I have followed the tutorials at https://www.tmwr.org/workflow-sets.html and https://workflowsets.tidymodels.org/ but still have no idea what is wrong.

So how do I correctly set both the recipe and the model parameter ranges when tuning a workflow set?

The `train.csv` used in my code is available here: https://github.com/liuyifeikim/Some-data
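The error message points at the cause of the two failures. By default, the upper bound of `mtry()` is `unknown()` because it depends on the number of predictors. tune usually fills that bound in from the data by prepping the recipe. In `ds_rf` and `ds_xgb`, however, the recipe has its own `tune()` placeholder (`under_ratio`), and in that case tune asks for the ranges to be finalized up front instead. The sketch below lists which parameters of a `ds_rf`-like workflow still need finalizing. It assumes current CRAN releases of tidymodels and themis, and it uses `iris` purely as a stand-in for `train.csv`.

```r
library(tidymodels)  # attaches recipes, parsnip, workflows, dials, tune, workflowsets
library(themis)      # step_downsample()

# Stand-in data: iris replaces the question's train.csv (assumption)
rec_ds <- recipe(Species ~ ., data = iris) %>%
  step_downsample(Species, under_ratio = tune())

spec_rf <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

wf_ds_rf <- workflow() %>%
  add_recipe(rec_ds) %>%
  add_model(spec_rf)

# One parameter set that holds both the recipe and the model parameters
params <- extract_parameter_set_dials(wf_ds_rf)

# TRUE where the range still contains unknown() and must be finalized
needs_finalizing <- vapply(params$object, has_unknowns, logical(1))
names(needs_finalizing) <- params$id
print(needs_finalizing)
```

Only `mtry` should come out as `TRUE`. Supplying a parameter set in which `mtry` already has explicit bounds (for example `mtry(c(1, 15))`), and which still contains `under_ratio`, removes that unknown.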

Kim.L
  • Following this post: https://www.tidyverse.org/blog/2021/03/workflowsets-0-0-1/, I replaced **param** with **param_info** in **option_add()**. After that, the range of **mtry** in **no_rf** and **no_xgb** matches my setting (1 to 15), but **ds_rf** and **ds_xgb** still fail. Is there something wrong with **rec_ds_tune**? – Kim.L Aug 01 '21 at 10:11
  • I believe this is a bug that was fixed in the recent CRAN release of finetune. Can you make sure you are using the version that was just released (or install from GitHub) and try again? – Julia Silge Aug 02 '21 at 03:54
  • @JuliaSilge Thank you. I have updated the packages and tried again (finetune = 0.10, tune = 0.1.6, workflowsets = 0.1.0), but the problem may not be in finetune. I think something is wrong with how I use **option_add()**: the order of the **option_add()** calls affects the result. If I try `wf_tune_list %>% option_add(param_info = ds_params, id = "ds_rf") %>% option_add(param_info = rf_params, id = "ds_rf")`, the **rf_params** overwrite the **ds_params**. I still have no idea how to add two custom parameter settings to the same workflow in a workflow set. – Kim.L Aug 02 '21 at 06:14
  • Hmmmm, if you can create a small [reprex](https://rstd.io/reprex) and post this problem on the [workflowsets repo](https://github.com/tidymodels/workflowsets/issues), that would be very helpful. – Julia Silge Aug 02 '21 at 21:31
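The overwriting described in these comments fits with `option_add()` keeping a single value per option name for each workflow. A second `param_info` for `"ds_rf"` therefore replaces the first one instead of being merged with it. One way around this is to take the parameter set from the whole workflow, which already contains both the recipe and the model parameters, and update both ranges in a single call. The sketch below shows both behaviours under the same assumptions as before: current CRAN releases, and `iris` as a stand-in for the real data.

```r
library(tidymodels)  # recipes, parsnip, dials, workflows, workflowsets
library(themis)      # step_downsample()

# Stand-in objects (iris instead of train.csv; assumption)
rec_ds <- recipe(Species ~ ., data = iris) %>%
  step_downsample(Species, under_ratio = tune())
spec_rf <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

wf_set <- workflow_set(preproc = list(ds = rec_ds), models = list(rf = spec_rf))

# Two separate parameter sets, as in the question
ds_params <- extract_parameter_set_dials(rec_ds) %>%
  update(under_ratio = under_ratio(c(1, 5)))
rf_params <- extract_parameter_set_dials(spec_rf) %>%
  update(mtry = mtry(c(1, 15)))

# Same option name twice: the second param_info replaces the first
overwritten <- wf_set %>%
  option_add(param_info = ds_params, id = "ds_rf") %>%
  option_add(param_info = rf_params, id = "ds_rf")
print(overwritten$option[[1]]$param_info$id)  # under_ratio is gone

# One combined set taken from the whole workflow, updated in a single call
combined <- wf_set %>%
  extract_workflow("ds_rf") %>%
  extract_parameter_set_dials() %>%
  update(mtry = mtry(c(1, 15)), under_ratio = under_ratio(c(1, 5)))
fixed <- wf_set %>% option_add(param_info = combined, id = "ds_rf")
print(fixed$option[[1]]$param_info$id)        # mtry, min_n and under_ratio
```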

1 Answer

I have modified the parameter setting step, and the tuning result is correct now:

```r
# setting the parameters on each workflow separately
no_rf_params <- wf_tune_list %>%
  extract_workflow("no_rf") %>%
  parameters() %>%
  update(mtry = mtry(c(1, 15)))

no_xgb_params <- wf_tune_list %>%
  extract_workflow("no_xgb") %>%
  parameters() %>%
  update(mtry = mtry(c(1, 15)))

# ds_* workflows: recipe (under_ratio) and model (mtry) ranges in ONE parameter set
ds_rf_params <- wf_tune_list %>%
  extract_workflow("ds_rf") %>%
  parameters() %>%
  update(mtry = mtry(c(1, 15)), under_ratio = under_ratio(c(1, 5)))

ds_xgb_params <- wf_tune_list %>%
  extract_workflow("ds_xgb") %>%
  parameters() %>%
  update(mtry = mtry(c(1, 15)), under_ratio = under_ratio(c(1, 5)))

# update the workflow set (one param_info per workflow)
wf_tune_list_finalize <- wf_tune_list %>%
  option_add(param_info = no_rf_params, id = "no_rf") %>%
  option_add(param_info = no_xgb_params, id = "no_xgb") %>%
  option_add(param_info = ds_rf_params, id = "ds_rf") %>%
  option_add(param_info = ds_xgb_params, id = "ds_xgb")
```

The rest of the code remains the same. There may be more efficient ways to set the parameters, though.

Kim.L
  • I tried to use parts of your code, but I get this warning message: `parameters.workflow()` was deprecated in tune 0.1.6.9003. Please use `hardhat::extract_parameter_set_dials()` instead. – Marc Kees May 01 '22 at 18:03
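Two points come together here: the answer's remark that there may be a more efficient way, and the deprecation noted in this comment. One option is to loop over `wflow_id` and build each workflow's `param_info` with `extract_parameter_set_dials()` instead of `parameters()`. `add_param_info()` below is a hypothetical helper, not part of workflowsets. The ranges mirror the question's, `iris` stands in for `train.csv`, and current CRAN releases of tidymodels and themis are assumed.

```r
library(tidymodels)  # recipes, parsnip, dials, workflows, workflowsets
library(themis)      # step_downsample()

# Hypothetical helper: one param_info per workflow, built from that workflow's
# own recipe + model parameters so the two never overwrite each other.
add_param_info <- function(wf_set, mtry_range = c(1, 15), ratio_range = c(1, 5)) {
  for (wf_id in wf_set$wflow_id) {
    params <- wf_set %>%
      extract_workflow(wf_id) %>%
      extract_parameter_set_dials()
    # Only update parameters that this workflow actually tunes
    if ("mtry" %in% params$id) {
      params <- update(params, mtry = mtry(mtry_range))
    }
    if ("under_ratio" %in% params$id) {
      params <- update(params, under_ratio = under_ratio(ratio_range))
    }
    wf_set <- option_add(wf_set, param_info = params, id = wf_id)
  }
  wf_set
}

# Stand-in objects mirroring the question (iris instead of train.csv; assumption)
rec_no <- recipe(Species ~ ., data = iris)
rec_ds <- rec_no %>% step_downsample(Species, under_ratio = tune())

spec_rf <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")
spec_xgb <- boost_tree(trees = 100, mtry = tune(), tree_depth = tune(),
                       learn_rate = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

wf_tuned <- workflow_set(
  preproc = list(no = rec_no, ds = rec_ds),
  models = list(rf = spec_rf, xgb = spec_xgb),
  cross = TRUE) %>%
  add_param_info()

print(wf_tuned$option)
```

The resulting workflow set can then be passed to `workflow_map()` in the same way as `wf_tune_list_finalize` in the question.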