I would like to know whether certain features, or a different way of coding a feature, improve the performance of my model. I am aware of this discussion on feature selection with tidymodels and the colino package (previously recipeselectors). However, I am not really interested in supervised or automatic feature selection. Instead, I would like to test the performance of different model formulas as part of the tuning process.
Essentially, I would like to do this (simplified example without any hyperparameters):
# Pseudocode: tune("formula") is a wished-for placeholder, not valid recipes syntax
lm_recipe <- recipes::recipe(tune("formula"), data = data)
lm_model <- linear_reg() %>% set_engine("lm")
lm_workflow <- workflows::workflow() %>% add_recipe(lm_recipe) %>% add_model(lm_model)
tuning_grid <- expand.grid(formula = c(y ~ x1, y ~ x1_codingB, y ~ x1 + x2))
lm_tune <- tune_grid(lm_workflow, resamples = data_cv, grid = tuning_grid, metrics = metric_set(rmse))
Of course that doesn't work. recipes::recipe expects a formula or a data.frame. So my questions are:
- Is there a different way to approach this problem with tidymodels?
- Does it even make sense?
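For reference, the closest thing I can think of is to treat each formula as its own workflow and compare them on the same resamples, rather than tuning the formula as a parameter. Below is a minimal sketch of that idea, assuming the workflowsets package (loaded with tidymodels) is acceptable. The simulated `data`, the recoding behind `x1_codingB`, and the names in `formulas` are placeholders I made up so the example runs on its own; they are not my real data.

```r
library(tidymodels)  # attaches parsnip, rsample, workflowsets, yardstick, tune

set.seed(1)
# Hypothetical stand-in for `data`; x1_codingB is an assumed recoding of x1
data <- tibble::tibble(
  x1 = rnorm(200),
  x2 = rnorm(200),
  x1_codingB = as.numeric(x1 > 0),
  y = 1 + 2 * x1 + 0.5 * x2 + rnorm(200)
)
data_cv <- vfold_cv(data, v = 5)

# One named entry per candidate formula
formulas <- list(
  x1_only   = y ~ x1,
  x1_codedB = y ~ x1_codingB,
  x1_x2     = y ~ x1 + x2
)

lm_model <- linear_reg() %>% set_engine("lm")

# Cross every formula with the model, then resample each resulting workflow
lm_set <- workflow_set(preproc = formulas, models = list(lm = lm_model)) %>%
  workflow_map("fit_resamples", resamples = data_cv, metrics = metric_set(rmse))

# Compare resampled RMSE across the candidate formulas
rank_results(lm_set, rank_metric = "rmse")
```

This gives one row of resampled RMSE per formula, which is roughly what I hoped the tuning grid would produce. I am not sure whether this is the intended way to do it, or whether the formula can be tuned together with real hyperparameters in a single grid.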