I have a nested GBM, and am looking to extract the partial depndence, tryingto use the following query:
library(rsample) # data splitting
library(gbm) # basic implementation
library(xgboost) # a faster implementation of gbm
library(caret) # an aggregator package for performing many machine learning models
library(h2o) # a java-based platform
library(pdp) # model visualization
basic_gbm <- function(data) {
mymodel <- gbm(formula = mpg ~ . ,
distribution = "gaussian",
data = data ,
n.minobsinnode = 1,
bag.fraction = 1
)
return(mymodel)
}
blah_model <- mtcars %>%
group_by() %>%
nest() %>%
mutate(model = map(data, basic_gbm))
blah_summary <- mtcars %>%
group_by() %>%
nest() %>%
mutate(model = map(data, basic_gbm)) %>%
mutate(summary = map(model, summary)) %>%
mutate(all_data = pmap(list(data, summary), .f =left_join, by = character())) %>%
select(cols=c(all_data)) %>%
unnest(cols = c(cols)) %>%
ungroup()
blah_model %>%
left_join(blah_summary, by = character()) %>%
mutate(pred = map(model, partial, pred.var = var, n.trees = model$n.trees, train = data)) -- this does not work
This does work and is what I would want as a nested df for each var:
coeffs <- blah_model$model[[1]] %>%
partial(pred.var = 'disp', n.trees = blah_model$model[[1]]$n.trees, train = blah_model$data[[1]])
However, it is saying it is not finding the variables in the training data - the data I am passing through is the training data. The var in the map is from the summary functions - these are prediction variables.
I gave a better example