I always fit my model and make predictions without calling prep(), bake(), or juice():

rec_wflow <- 
  workflow() %>% 
  add_model(lr_mod) %>% 
  add_recipe(rec)

data_fit <- 
  rec_wflow %>% 
  fit(data = train_data)

Are these functions (prep, bake, juice) only used to visually check the preprocessing results, and not needed for the fitting/training process itself?

What is the difference among prep/bake/juice in the R package "recipes"?

The above code is how I learned it in the official tutorial.

I read on another blog that using train_data can cause data leakage. I'd like to hear more about that: are these functions related to data leakage?

Julia Silge
h-y-jp

1 Answer

Short answer: you are correct. When a recipe is used in a workflow, as in your example, you do not need to call the pre-processing functions yourself.

This is touched on in the tutorial Handle class imbalance in #TidyTuesday climbing expedition data with tidymodels:

We’re going to use this recipe in a workflow() so we don’t need to stress a lot about whether to prep() or not. If you want to explore what the recipe is doing to your data, you can first prep() the recipe to estimate the parameters needed for each step and then bake(new_data = NULL) to pull out the training data with those steps applied.
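
As a rough illustration of the three functions, here is a minimal sketch. The question's rec and train_data are not shown, so the data (mtcars, split by row) and the single step_normalize() step are stand-ins I made up. prep() estimates the step parameters from the training set only, bake() applies those estimates to any data set, and juice() is the older, now superseded, way of writing bake(new_data = NULL).

```r
library(recipes)

# Assumption: stand-in data, since the question's train_data is not shown
train_data <- mtcars[1:24, ]
test_data  <- mtcars[25:32, ]

# Assumption: a simple recipe with one step that needs estimated parameters
rec <- recipe(mpg ~ ., data = train_data) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations from the training data only
rec_prepped <- prep(rec, training = train_data)

# bake(new_data = NULL) returns the training data with the steps applied
train_baked <- bake(rec_prepped, new_data = NULL)

# juice() returns the same processed training data (superseded spelling)
train_juiced <- juice(rec_prepped)

# bake() on other data reuses the training-set estimates; nothing is re-estimated
test_baked <- bake(rec_prepped, new_data = test_data)
```

Because the test set is transformed with statistics estimated from the training set alone, no information from the test set feeds into the preprocessing. fit() on a workflow follows the same pattern internally, which is why the manual calls are optional there.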

I recommend all the tutorials at Julia's blog for understanding tidymodels.

neilfws
  • You have solved my question. Thanks, I'll read the blog as well to learn the finer points of the function. – h-y-jp Oct 19 '20 at 07:30
  • This is a great and helpful answer, and thank you for the kind words. Most often you don't need to use these functions when you use tidymodels workflows. I want to add that the functions like `prep()` and `bake()` can be used with base R, non-tidymodels modeling functions, [as outlined here](https://www.tmwr.org/recipes.html#recipes-manual). – Julia Silge Oct 19 '20 at 19:59
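
The manual route mentioned in the last comment might look roughly like the sketch below, which uses base R's lm() instead of a workflow. The data (mtcars, split by row) and the recipe are assumptions for illustration, not the code from the question or from the linked chapter.

```r
library(recipes)

# Assumption: stand-in data and recipe, chosen only to make the sketch runnable
train_data <- mtcars[1:24, ]
test_data  <- mtcars[25:32, ]

rec <- recipe(mpg ~ wt + hp, data = train_data) %>%
  step_normalize(all_numeric_predictors())

# Estimate the preprocessing parameters from the training data
rec_prepped <- prep(rec, training = train_data)

# Fit a base R model on the processed training data
lm_fit <- lm(mpg ~ ., data = bake(rec_prepped, new_data = NULL))

# New data must be baked with the same prepped recipe before predicting
preds <- predict(lm_fit, newdata = bake(rec_prepped, new_data = test_data))
```

Without a workflow, it is up to you to bake new data with the prepped recipe before every call to predict().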