Questions tagged [r-recipes]

recipes is an R package by Max Kuhn and Hadley Wickham for creating and preprocessing design matrices.

131 questions
31 votes, 1 answer

What is the difference among prep/bake/juice in the R package "recipes"?

I read the introduction to tidymodels and I am confused about what prep(), bake() and juice() from the recipes package do to the data. What does each one do? Honestly, I found these function names confusing; what would be a more intuitive name…
Andrea
8 votes, 2 answers

R Tidymodels: What objects to save for use in production after fitting a recipe-based workflow utilizing pre-processing?

After designing a Tidymodels recipe-based workflow, which is tuned and then fitted to some training data, I'm not clear which objects (the fitted "workflow", the "recipe", etc.) should be saved to disk for predicting on new data in production. I understand…
wphampton
7 votes, 1 answer

Tidymodels tune_grid: "Can't subset columns that don't exist" when not using formula

I've put together a data preprocessing recipe for the recent coffee dataset featured on TidyTuesday. My intention is to generate a workflow and then tune a hyperparameter from there. I'm specifically interested in manually declaring predictors and…
mdneuzerling
5 votes, 1 answer

Why do we need prep, bake, and juice in tidymodels?

I always build, fit, and predict with my model without using prep(), bake(), or juice():

rec_wflow <- workflow() %>% add_model(lr_mod) %>% add_recipe(rec)
data_fit <- rec_wflow %>% fit(data = train_data)

Are these (prep, bake,…
h-y-jp
5 votes, 1 answer

Add missing indicator columns using the tidymodels recipes package

I'd like to create a recipe using the recipes package that both imputes missing data and adds indicator columns that indicate which values were missing. It would also be nice if there was an option to choose between including an indicator column for…
Cameron Bieganek
5 votes, 1 answer

Why is recipes 20x slower than handmade pretreatment while training a caret model?

To build a stacking model, I trained many base models using different pretreatments on the same dataset. To keep track of how each design matrix was built, I used the recipes package and defined my own steps. But using a recipe…
denisC
5 votes, 1 answer

Difference in preprocessing using recipes and caret's preProcess

I have been exploring the new recipes package for variable transformations as part of a machine learning pipeline. I opted for this approach, moving away from caret's preProcess function, because of all the new extensions. But I am finding that the…
Hanjo Odendaal
4 votes, 1 answer

R package or function to record filters applied to your tibble

Is there any R function or package that records the operations applied to a tibble/data frame? For example, if I did the following:

data(iris)
my_table <- iris %>% filter(Sepal.Length > 6) %>% filter(Species == 'virginica')

I would want the…
RayVelcoro
4 votes, 1 answer

How do I set the parameter grids correctly when tuning a workflow set with tidymodels?

I am trying to use tidymodels to tune a workflow with both recipe and model parameters. Tuning a single workflow works fine, but tuning a workflow set with several workflows always fails. Here is my code: # read the training data train…
Kim.L
4 votes, 1 answer

predict.train vs predict using recipe objects

After specifying a recipe to use in caret::train, I am trying to predict new samples. I have a couple of questions about this that I cannot find answered in the caret/recipes documentation. Should I use predict() or predict.train()? What's the difference? Should…
JFG123
3 votes, 1 answer

Create a multivariate matrix in tidymodels recipes::recipe()

I am trying to do a k-fold cross validation on a model that predicts the joint distribution of the proportion of tree species basal area from satellite imagery. This requires the use of the DirichletReg::DirichReg() function, which in turn…
Sean McKenzie
3 votes, 1 answer

Stepwise Algorithm in Tidymodels

I found that the stepwise algorithm for variable selection, implemented natively in R with step(), is not integrated into Tidymodels. I do not know whether there is a reason for leaving it out (for example, because better procedures exist) or whether it is simply a missing feature.
Marco Repetto
3 votes, 2 answers

Is there a way to group rows (especially dummy variables) in the recipes package in R (or mlr3)?

# Packages
library(dplyr)
library(recipes)
# toy dataset, with A being multicolored
df <- tibble(name = c("A", "A", "A", "B", "C"), color = c("green", "yellow", "purple", "green", "blue"))
#> # A tibble: 5 x 2
#> name color
#> …
JeromeLaurent
3 votes, 0 answers

Recipe vs Formula vs X/Y Interface reproducibility for gbm with caret

I have trained the same model on the iris data set to investigate the reproducibility of each method. It seems that there is a discrepancy between models when using all.equal() for the models trained with the recipes interface, but not with the…
JFG123
3 votes, 1 answer

Write your own tidyselect functions

I wrote an R package that utilizes the {tidyselect} selectors (e.g. contains(), starts_with(), etc.). I would like to add a few more select helper functions to the package to select variables based on some attribute. For example, select all…
Daniel D. Sjoberg