After specifying a recipe to use in caret::train, I am trying to predict new samples. I have a couple of questions about this that I cannot find answered in the caret or recipes documentation.
- Should I use predict() or predict.train()? What's the difference?
- Should I bake the test data with the prepared recipe before calling predict? When using preProcess directly in train(), you are advised not to pre-process new data yourself, because the train object does that automatically. Is the same true when using recipes?
Below is a reproducible example illustrating my process and the difference in predictions when using predict vs. predict.train:
library(recipes)
library(caret)
# Data ----
data("credit_data")
credit_train <- credit_data[1:3500,]
credit_test <- credit_data[-(1:3500),]
# Set up recipe ----
set.seed(0)
Rec.Obj = recipe(Status ~ ., data = credit_train) %>%
  step_knnimpute(all_predictors()) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric())
# Control parameters ----
set.seed(0)
TC = trainControl(method = "cv", number = 10, savePredictions = "final", classProbs = TRUE, returnResamp = "final")
set.seed(0)
Model.Output = train(Rec.Obj,
                     credit_train,
                     trControl = TC,
                     tuneLength = 1,
                     metric = "Accuracy",
                     method = "glm")
# Prepped recipe ----
set.seed(0)
# prep() takes the training set via `training`, not `newdata`
prep.rec <- prep(Rec.Obj, training = credit_train)
# Baked data for observation ----
set.seed(0)
bake.train <- bake(prep.rec, new_data = credit_train)
bake.test <- bake(prep.rec, new_data = credit_test)
# Investigation of prediction methods ----
# Recipe not applied to the new data (raw credit_test)
set.seed(0)
predict.norm = predict(Model.Output, credit_test, type = "raw")
predict.train = predict.train(Model.Output, credit_test, type = "raw")
identical(predict.norm,predict.train)
# evaluates to FALSE
# Apply recipe to new data (bake.test)
predict.norm.baked = predict(Model.Output, bake.test, type = "raw")
predict.train.baked = predict.train(Model.Output, bake.test, type = "raw")
identical(predict.norm.baked, predict.train.baked)
# evaluates to FALSE
# Comparison of both predict() funcs
identical(predict.norm, predict.norm.baked)
# evaluates to FALSE
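
Since identical() only returns a single TRUE/FALSE, it does not show whether the prediction vectors differ in length, in factor levels, or in individual values. A minimal sketch of a comparison helper follows; compare_predictions is a hypothetical name used for illustration and is not part of caret or recipes, and the toy factors are made up:

```r
# Hypothetical helper (not a caret/recipes function): summarise how two
# prediction vectors differ instead of relying on identical() alone.
compare_predictions <- function(a, b) {
  same_length <- length(a) == length(b)
  list(
    same_class  = identical(class(a), class(b)),
    same_length = same_length,
    same_levels = identical(levels(a), levels(b)),
    # Element-wise disagreement is only meaningful when lengths match
    n_disagree  = if (same_length) {
      sum(as.character(a) != as.character(b), na.rm = TRUE)
    } else {
      NA_integer_
    }
  )
}

# Toy usage with made-up factors
a <- factor(c("good", "bad", "good"), levels = c("bad", "good"))
b <- factor(c("good", "good", "good"), levels = c("bad", "good"))
str(compare_predictions(a, b))
```

Applied to the objects above, this would be called as, for example, compare_predictions(predict.norm, predict.norm.baked).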