
I would like to know whether certain features, or a different way of coding a feature, improve the performance of my model. I am aware of the discussion on feature selection with tidymodels and the colino package (previously recipeselectors). However, I am not really interested in supervised or automatic feature selection. Instead, I would like to test the performance of different model formulas as part of the tuning process.

Essentially, I would like to do this (simplified example without any hyperparameters):

lm_recipe <- recipes::recipe(tune("formula"), data = data)
lm_model  <- parsnip::linear_reg() %>% set_engine("lm")
lm_workflow <- workflows::workflow() %>% add_recipe(lm_recipe) %>% add_model(lm_model)
tuning_grid <- expand.grid(formula = c(y ~ x1, y ~ x1_codingB, y ~ x1 + x2))
lm_tune <- tune::tune_grid(lm_workflow, resamples = data_cv, grid = tuning_grid, metrics = yardstick::metric_set(rmse))

Of course, that doesn't work: recipes::recipe() expects a formula or a data frame as its first argument, not a tuning placeholder. So my questions are:

  1. Is there a different way to approach this problem with tidymodels?
  2. Does it even make sense?
user2503795

1 Answer


Sure, it absolutely makes sense.

Depending on what you are doing, workflow sets (the workflowsets package) are a good thing to check out. Basically, you can cross combinations of models and preprocessors, and the preprocessors can be formulas. A sketch of this follows below.
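For instance, here is a minimal sketch of what that could look like, assuming the hypothetical columns y, x1, x1_codingB, and x2 and the data and data_cv objects from the question:

library(tidymodels)
library(workflowsets)

# Candidate formulas to compare; these stand in for the "formula" tuning
# parameter the question asked about.
formulas <- list(
  x1_only   = y ~ x1,
  x1_codeB  = y ~ x1_codingB,
  x1_and_x2 = y ~ x1 + x2
)

# Cross every formula with every model spec (here, just one linear model).
wf_set <- workflow_set(
  preproc = formulas,
  models  = list(lm = linear_reg() %>% set_engine("lm"))
)

data_cv <- vfold_cv(data)

# Fit each workflow on the resamples; there are no tuning parameters here,
# so fit_resamples() is enough.
results <- wf_set %>%
  workflow_map(
    "fit_resamples",
    resamples = data_cv,
    metrics   = metric_set(rmse)
  )

# Rank the formulas by RMSE.
rank_results(results, rank_metric = "rmse")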

So if you have different predictor sets, you can create the formulas manually, or use a package like formula.tools to build them programmatically, e.g. a list of leave-one-out formulas, and so on (see the base-R sketch below).
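As one illustration, leave-one-out formulas can also be built with base R's reformulate(), without an extra package; the predictor names here are hypothetical:

# Build one formula per predictor, each dropping that predictor.
predictors <- c("x1", "x2", "x3")

loo_formulas <- lapply(seq_along(predictors), function(i) {
  reformulate(predictors[-i], response = "y")
})
names(loo_formulas) <- paste0("drop_", predictors)

# loo_formulas can then be passed as the preproc list to workflow_set().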

There is a chapter on workflow sets in the tidymodels book, but it mostly looks at running many types of models on a single set of features/predictors.

I'll work on making some content for tidymodels.org covering what you want. If you need more help, maybe open a GitHub issue describing your specific needs.

topepo