Questions tagged [tidymodels]

The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles.

The tidymodels framework is a "meta-package" for modeling and statistical analysis that shares the underlying design philosophy, grammar, and data structures of the tidyverse. It includes a core set of packages that are loaded on startup, and extra packages that are installed along with tidymodels but not attached on startup. The tidymodels framework provides tooling for modeling tasks including supervised machine learning (predictive modeling), unsupervised machine learning, time series analysis, text analysis, and more.

613 questions
31
votes
1 answer

What is the difference among prep/bake/juice in the R package "recipes"?

I read the introduction to tidymodels and I am confused about what prep(), bake() and juice() from the recipes package do to the data. What does each do? I honestly find it confusing to have such names for functions; what would be a more intuitive name…
Andrea
  • 529
  • 5
  • 10
11
votes
3 answers

retrieving tidy results from regression by group with broom

The answer to this question clearly explains how to retrieve tidy regression results by group when running a regression through a dplyr pipe, but the solution is no longer reproducible. How can one use dplyr and broom in combination to run a…
C. Rea
  • 121
  • 1
  • 7
9
votes
3 answers

Plotting decision tree results from tidymodels

I have managed to build a decision tree model using the tidymodels package but I am unsure how to pull the results and plot the tree. I know I can use the rpart and rpart.plot packages to achieve the same thing but I would rather use tidymodels as…
Edgar Zamora
  • 456
  • 3
  • 11
8
votes
1 answer

Create SHAP plots for tidymodel objects

This question refers to Obtaining summary shap plot for catboost model with tidymodels in R. Given the comment below the question, the OP found a solution but did not share it with the community so far. I want to analyze my tree ensembles fitted…
mugdi
  • 365
  • 5
  • 17
8
votes
2 answers

R Tidymodels: What objects to save for use in production after fitting a recipe-based workflow utilizing pre-processing?

After designing a Tidymodels recipe-based workflow, which is tuned then fitted to some training data, I'm not clear what objects (fitted "workflow", "recipe", etc.) should be saved to disk for use in predicting new data in production. I understand…
wphampton
  • 504
  • 5
  • 13
8
votes
5 answers

Distance matrix to pairwise distance list in R

Is there any R package to obtain a pairwise distance list if my input file is a distance matrix? For example, if my input is a data.frame like this: A1 B1 C1 D1 A1 0 0.85 0.45 0.96 B1 0 0.85 …
Anurag Mishra
  • 1,007
  • 6
  • 16
  • 23
7
votes
1 answer

Tidymodels tune_grid: "Can't subset columns that don't exist" when not using formula

I've put together a data preprocessing recipe for the recent coffee dataset featured on TidyTuesday. My intention is to generate a workflow, and then from there tune a hyperparameter. I'm specifically interested in manually declaring predictors and…
mdneuzerling
  • 388
  • 3
  • 9
7
votes
1 answer

tidymodels: ranger with cross validation

The dataset can be found here: https://www.kaggle.com/mlg-ulb/creditcardfraud I am trying to use tidymodels to run ranger with 5-fold cross-validation on this dataset. I have 2 code blocks. The first code block is the original code with the…
OTA
  • 269
  • 2
  • 17
6
votes
4 answers

How to apply t-test between ranges of columns in R

I have a large dataset that looks like this. I was wondering if there is a clever way to apply a t-test in each row (i.e., each gene) and compare the counts between humans and mice. I want to compare in each row (human_A,human_B,human_C) vs…
LDT
  • 2,856
  • 2
  • 15
  • 32
6
votes
1 answer

File size of tidymodels workflow

I'm trying to adopt tidymodels into my processes, but I'm running into a challenge with saving workflows. The file size for workflow objects is many times larger than the data used to build the model, so I end up maxing out my memory when trying to…
Jordo82
  • 796
  • 4
  • 14
6
votes
2 answers

tidymodels Novel levels found in column

I am using tidymodels to create a random forest prediction. I have test data that contains a new factor level not present in the training data, which results in the error: 1: Novel levels found in column 'Siblings': '4'. The levels have been…
Nivel
  • 629
  • 4
  • 12
6
votes
1 answer

running multiple regression models using tidymodels

I've recently been using tidymodels to run models and select parameters that best satisfy some objective function. For example using a hypothetical regression on mtcars data (using the regression examples from the bottom answer of this question as…
Robert Hickman
  • 869
  • 1
  • 6
  • 22
5
votes
1 answer

Tuning a LASSO model and predicting using tidymodels

I want to perform penalty selection for the LASSO algorithm and predict outcomes using tidymodels. I will use the Boston housing dataset to illustrate the problem. library(tidymodels) library(tidyverse) library(mlbench) data("BostonHousing") dt <-…
Augustin
  • 307
  • 2
  • 10
5
votes
1 answer

Why do we need prep, bake, and juice in tidymodels?

I always finish up my model to fit and predict without using prep(), bake(), or juice(): rec_wflow <- workflow() %>% add_model(lr_mod) %>% add_recipe(rec) data_fit <- rec_wflow %>% fit(data = train_data) Are these ( prep, bake,…
h-y-jp
  • 199
  • 1
  • 8
5
votes
1 answer

Add missing indicator columns using the tidymodels recipes package

I'd like to create a recipe using the recipes package that both imputes missing data and adds indicator columns that indicate which values were missing. It would also be nice if there was an option to choose between including an indicator column for…
Cameron Bieganek
  • 7,208
  • 1
  • 23
  • 40