I want to train a model via tidymodels using predictions from another model as a feature. Specifically, it's a KNN model where I want to use predictions from a random forest model as a feature.
I started implementing a (hacky) solution using step_mutate; here it is:
library(dplyr)
library(tidymodels)
library(purrr)
library(data.table)
df <- data.table(
  y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100)
)
pred_rf <- function(...) {
  # Very hacky function that creates random forest predictions.
  # The first argument is treated as the outcome, the rest as predictors.
  nms <- purrr::map_chr(rlang::enexprs(...), as.character)
  l <- list(...)
  dat <- setDT(l)
  outcome <- names(dat)[1]
  preds <- names(dat)[-1]
  rec <- recipe(dat) %>%
    update_role(!!outcome, new_role = "outcome") %>%
    update_role(!!preds, new_role = "predictor")
  model <- rand_forest(mode = "regression")
  wf <- workflow() %>%
    add_recipe(rec) %>%
    add_model(model)
  # Fit on the supplied data and predict on that same data
  fitted_model <- fit(wf, dat)
  predictions <- predict(fitted_model, dat)$.pred
  stopifnot(length(predictions) == nrow(dat))
  stopifnot(sum(is.na(predictions)) == 0)
  return(predictions)
}
rec <- recipe(y ~ ., df) %>%
  step_mutate(y_pred = pred_rf(y, x1, x2)) %>%
  prep()
bake(rec, new_data = NULL) # Desired output would be a design matrix like this
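However, I realised that this would cause data leakage when used for tuning: step_mutate re-evaluates pred_rf() on whatever data is baked, so the random forest would be refit on the assessment data using its outcome values. Is it possible to do this without data leakage, or would I need to create a custom step? It would be very similar to the step_impute_* functions, but I couldn't find anything.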
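To make the leakage concern concrete, here is a minimal sketch using a simpler statistic than a random forest. The toy data frames `train` and `new` and the column name `x1_centered` are made up for illustration. As far as I understand, step_mutate stores the expression and evaluates it again on the data passed to bake(), so nothing learned from the training set is reused:

```r
library(tidymodels)

# Toy data, invented for this illustration only
train <- data.frame(y = c(1, 2, 3, 4), x1 = c(0, 2, 4, 6))     # mean(x1) = 3
new   <- data.frame(y = c(5, 6),       x1 = c(100, 200))       # mean(x1) = 150

rec_demo <- recipe(y ~ x1, data = train) %>%
  step_mutate(x1_centered = x1 - mean(x1)) %>%
  prep()

# The expression is re-evaluated on `new`, so the mean comes from `new` (150),
# not from the training data (3)
centered_new <- bake(rec_demo, new_data = new)$x1_centered
centered_new
# -50  50   (a step that reused the training mean would give 97 197)
```

The same thing would happen with pred_rf(): at bake time the forest would be refit on the new rows instead of reusing the model fitted during prep().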
Thanks