
I have a dataset with multiple columns for the outcome variables that I would like to predict with the same preprocessing steps and models. Is there a way to run the same recipe and models (with tuning - I'm using workflow_map()) on multiple outcome variables (separate models for each outcome)?

Essentially, I want to loop through the same preprocessing steps and models for each outcome, so that I can avoid having to do this:

model_recipe1 <- recipe(outcome_1 ~ ., data) %>%
                 step_1

model_recipe2 <- recipe(outcome_2 ~ ., data) %>%
                 step_1

model_recipe3 <- recipe(outcome_3 ~ ., data) %>%
                 step_1


and would instead like to do something like this:

model_recipe <- recipe(outcome[i] ~ ., data) %>%
                 step_1
nrjenkins
  • I haven't used `workflow_map()` but I guess it could be doing something random involving a seed. You could try adding `set.seed(123)` before any part of your modelling that could involve a random starting point. – stevec Jul 29 '22 at 16:57

2 Answers


Try running this once before the rest of your code

set.seed(123)

If that doesn't solve it, try running this once at the start of your script:

addTaskCallback(function(...) {set.seed(123);TRUE})

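Both of these methods fix the random seed so that any random process gives the same results each time you run your script, which makes the analysis reproducible. The second one registers a callback that resets the seed after every top-level command.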

stevec
  • Thanks for the comment, Stevec. I'm not worried about getting the same results each time the model is run, I just want to switch out the dependent variables. Essentially, I want the model to loop through each of my individual outcomes so that instead of copy and pasting the code for each outcome, I can just write it once and have some way of automatically changing the outcome. – nrjenkins Jul 29 '22 at 17:29
  • @nrjenkins gotcha. I'm not too familiar with workflows, so I'm not too sure. But hopefully someone with experience in them can provide some pointers. – stevec Jul 29 '22 at 17:36

I'm not sure if we 100% recommend the approach you are trying, but it will work in some circumstances:

library(tidymodels)

folds <- bootstraps(mtcars, times = 5)
wf_set <- workflow_set(list(mpg ~ ., wt ~ ., disp ~ .), list(linear_reg()))
workflow_map(wf_set, "fit_resamples", resamples = folds)
#> # A workflow set/tibble: 3 × 4
#>   wflow_id             info             option    result   
#>   <chr>                <list>           <list>    <list>   
#> 1 formula_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 2 formula_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 3 formula_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>

Created on 2022-08-04 by the reprex package (v2.0.1)
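
The list of formulas above is typed out by hand. If the outcome names are stored in a character vector, one way to build the same list is base R's `reformulate()`. This is only a minimal sketch; the names `outcomes` and `formulas` are illustrative and not part of the original code:

```r
# Outcome column names (illustrative; replace with your own)
outcomes <- c("mpg", "wt", "disp")

# Build one formula per outcome: mpg ~ ., wt ~ ., disp ~ .
formulas <- lapply(outcomes, function(y) reformulate(".", response = y))

# Name each formula after its outcome so the list is easy to inspect
names(formulas) <- outcomes
```

The resulting `formulas` list can then be passed as the first argument to `workflow_set()` in place of the hand-written list.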

To make many recipes in an iterative fashion, you'll need a bit of metaprogramming such as with rlang. You can write a function to take (in this case) a string and create a recipe:

library(rlang)

my_recipe <- function(outcome) {
  form <- new_formula(ensym(outcome), expr(.))
  recipe(form, data = mtcars) %>%
    step_normalize(all_numeric_predictors())
}

And then you can use this function with purrr::map() across your outcomes:

library(tidymodels)
library(rlang)

folds <- bootstraps(mtcars, times = 5)

wf_set <- workflow_set(
  map(c("mpg", "wt", "disp"), my_recipe), 
  list(linear_reg())
  )

workflow_map(wf_set, "fit_resamples", resamples = folds)
#> # A workflow set/tibble: 3 × 4
#>   wflow_id            info             option    result   
#>   <chr>               <list>           <list>    <list>   
#> 1 recipe_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 2 recipe_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 3 recipe_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>

Created on 2022-08-04 by the reprex package (v2.0.1)
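
One thing to watch with `outcome ~ .` is that the other outcome columns end up as predictors. If that is not what you want, a variation on the recipe function can drop them. The sketch below is an assumption-laden alternative rather than the approach above: it swaps the rlang formula construction for base R's `reformulate()`, and `make_recipe`, `all_outcomes`, and `recipe_list` are hypothetical names introduced for illustration:

```r
library(tidymodels)

# Hypothetical helper: `outcome` is one column name (a string) and
# `all_outcomes` is the full vector of outcome column names in `data`.
make_recipe <- function(outcome, all_outcomes, data) {
  others <- setdiff(all_outcomes, outcome)
  recipe(reformulate(".", response = outcome), data = data) %>%
    step_rm(all_of(others)) %>%                 # drop the other outcomes
    step_normalize(all_numeric_predictors())
}

outcomes <- c("mpg", "wt", "disp")

# set_names() gives a named list, one recipe per outcome
recipe_list <- map(
  set_names(outcomes),
  make_recipe,
  all_outcomes = outcomes,
  data = mtcars
)

folds <- bootstraps(mtcars, times = 5)
wf_set <- workflow_set(recipe_list, list(lm = linear_reg()))
results <- workflow_map(wf_set, "fit_resamples", resamples = folds)
```

Because both lists are named, the workflow ids should be built from those names instead of the generic `recipe_1`, `recipe_2`, and so on, which makes it easier to tell which result belongs to which outcome.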

Julia Silge