I'm trying to do principal component regression (PCR) with tidymodels, but I keep running into this problem. I know there is a similar post, but the solution there doesn't work for my case.

My data

library(tidymodels)
library(magrittr)  # for %<>%
library(AppliedPredictiveModeling)
data(solubility)

train = solTrainY %>% bind_cols(solTrainXtrans) %>% rename(solubility = ...1)

My PCR analysis

train %<>% mutate_all(., as.numeric) %>% glimpse()
tidy_rec = recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = ncol(train)-1) %>% 
  prep()

tidy_rec %>% tidy(2) %>% select(terms) %>% distinct()

tidy_predata = tidy_rec %>% juice()

# Re-sampling
tidy_folds = vfold_cv(train, v = 10)

# Set model
tidy_rlm = linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")

# Set workflow
tidy_wf = workflow() %>% 
  add_recipe(tidy_rec) %>% 
  add_model(tidy_rlm) 

# Fit model
tidy_fit = tidy_wf %>% 
  fit_resamples(tidy_folds) 

tidy_fit %>% collect_metrics()

Error

x Fold01: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold02: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold03: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold04: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold05: recipe: Error: Can't subset columns that don't exist.
x Columns `PC1`, `PC2`, `PC3`, `PC4`, and `PC5` don't exist.
x Fold06: recipe: Error: Can't subset columns that don't exist.
.
.
.
Ian.T
Does this answer your question? [Subsetting with $ when column may not exist](https://stackoverflow.com/questions/27138507/subsetting-with-when-column-may-not-exist) – N. Kiefer Sep 20 '20 at 09:20

1 Answer

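This happens because a workflow needs a recipe specification that has not been prepped. The workflow estimates (preps) the recipe itself on each resample when fit_resamples() runs.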

So, in your code, removing the prep() from the recipe specification will eliminate the error.

tidy_rec <- recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = ncol(train)-1) 
  # remove the prep() method
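
For reference, here is a minimal sketch of the whole pipeline with the unprepped recipe in the workflow, plus a separately prepped copy used only to inspect the PCA step. The object names (pcr_rec, pcr_wf, pcr_res, pcr_prepped), the seed, and num_comp = 5 are my own choices for illustration, not values from the question. It also assumes a recipes version recent enough to support bake(new_data = NULL); on older versions juice() does the same job.

```r
library(tidymodels)
library(AppliedPredictiveModeling)
data(solubility)

# Outcome first, then the transformed predictors, all as numeric
train <- bind_cols(tibble(solubility = solTrainY), solTrainXtrans) %>%
  mutate(across(everything(), as.numeric))

# Unprepped recipe: this is the object that goes into the workflow
# (num_comp = 5 is an illustrative assumption, not the asker's value)
pcr_rec <- recipe(solubility ~ ., data = train) %>%
  step_corr(all_predictors(), threshold = 0.9) %>%
  step_pca(all_predictors(), num_comp = 5)

pcr_wf <- workflow() %>%
  add_recipe(pcr_rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

set.seed(123)  # arbitrary seed so the folds are reproducible
folds <- vfold_cv(train, v = 10)

# The workflow preps the recipe on each fold's analysis set
pcr_res <- fit_resamples(pcr_wf, resamples = folds)
collect_metrics(pcr_res)

# Prep a separate copy only for inspection; do not add it to the workflow
pcr_prepped <- prep(pcr_rec)
tidy(pcr_prepped, number = 2)        # loadings from the PCA step
bake(pcr_prepped, new_data = NULL)   # processed training data
```

Keeping the prepped copy separate means you can still look at the loadings and the processed training set without handing an already trained recipe to the workflow.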
hnagaty