I have a nested GBM and want to extract the partial dependence for each predictor. I am trying to use the following code:

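library(dplyr)        # %>%, group_by(), mutate(), left_join()
library(tidyr)        # nest(), unnest()
library(purrr)        # map(), pmap()
library(rsample)      # data splitting
library(gbm)          # basic implementation
library(xgboost)      # a faster implementation of gbm
library(caret)        # an aggregator package for performing many machine learning models
library(h2o)          # a java-based platform
library(pdp)          # model visualization; loaded last so pdp::partial masks purrr::partial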

basic_gbm <- function(data) {
  mymodel <- gbm(formula = mpg ~ .,
                 distribution = "gaussian",
                 data = data,
                 n.minobsinnode = 1,
                 bag.fraction = 1)
  return(mymodel)
}

blah_model <- mtcars %>%
  group_by() %>%
  nest() %>%
  mutate(model = map(data, basic_gbm))

blah_summary <- mtcars %>%
  group_by() %>%
  nest() %>%
  mutate(model = map(data, basic_gbm)) %>%
  mutate(summary = map(model, summary)) %>%
  mutate(all_data = pmap(list(data, summary), .f =left_join, by = character())) %>%
  select(cols=c(all_data)) %>% 
  unnest(cols = c(cols)) %>%
  ungroup()

blah_model %>%
  left_join(blah_summary, by = character()) %>%
  mutate(pred = map(model, partial, pred.var = var, n.trees = model$n.trees, train = data))  # this does not work

The following does work for a single variable, and its result is what I want as a nested data frame for each var:

coeffs <- blah_model$model[[1]] %>%
    partial(pred.var = 'disp', n.trees = blah_model$model[[1]]$n.trees, train = blah_model$data[[1]])

However, the nested version reports that it cannot find the variables in the training data, even though the data I am passing through is the training data. The `var` column used in the map comes from the summary function; these are the predictor variables.
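A possible explanation, offered as a sketch rather than a confirmed diagnosis: `map()` iterates only over its first argument, so `pred.var = var`, `n.trees = model$n.trees` and `train = data` are each passed whole on every call. Inside `mutate()`, `var` is then the entire column, `model$n.trees` is `NULL` (because `model` is a list of fits), and `data` is a list of tibbles rather than a data frame. Iterating over the three columns in parallel with `pmap()` would avoid this. The names `models`, `pd`, `m`, `v` and `d` below are illustrative, `pdp::partial` is namespaced in case `purrr::partial` masks it, and the `as.data.frame()` call is a precaution that may not be needed.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gbm)
library(pdp)

# Same model as in the question, fitted once per nested data frame.
basic_gbm <- function(data) {
  gbm(mpg ~ ., distribution = "gaussian", data = data,
      n.minobsinnode = 1, bag.fraction = 1)
}

# One row holding all of mtcars plus the fitted model.
models <- mtcars %>%
  nest(data = everything()) %>%
  mutate(model = map(data, basic_gbm))

pd <- models %>%
  # One row per predictor reported by summary(); plotit = FALSE skips the bar plot.
  mutate(var = map(model, ~ as.character(summary(.x, plotit = FALSE)$var))) %>%
  unnest(var) %>%
  # pmap() hands each call a single model, a single variable name and a single data frame.
  mutate(pred = pmap(list(model, var, data), function(m, v, d) {
    pdp::partial(m, pred.var = v, n.trees = m$n.trees,
                 train = as.data.frame(d))  # assumption: a plain data.frame is the safest input
  }))
```

Each element of `pd$pred` should then be the partial dependence grid for the variable named in the same row, which can be unnested or plotted per variable.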

I have updated the question with a better example.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 07 '21 at 04:37
  • Thanks, it is a little hard to share, but the table is effectively the following: matrix(c('BUI', 'BUI', 'data', 'model', 'Age'), nrow = 1, ncol = 5, byrow = TRUE, dimnames = list(c(), c('f_risk_code', 'f_risk_category', 'data', 'model', 'var'))) The data column is a tibble that has been nested, and the model is a standard GBM model that has been nested. The var is a predictor variable from the GBM. – Mathew Lionnet May 07 '21 at 04:58
  • That last line that you say "does work" doesn't work for me. I get the error "Error: `.f` must be a function, not a `gbm` object". Which `partial` function are you trying to use? The one from `purrr`? Because the first parameter to that should be a function. But there is no function to be found in your current code. And where does `model` come from in `model$n.trees`? Is that supposed to be `blah_model$model[[1]]$n.trees`? – MrFlick May 07 '21 at 06:04
  • You are correct, apologies. That last line should be: coeffs <- blah_model$model[[1]] %>% partial(pred.var = 'disp', n.trees = blah_model$model[[1]]$n.trees, train = blah_model$data[[1]]) – Mathew Lionnet May 10 '21 at 23:20
  • I have also added the libraries – Mathew Lionnet May 10 '21 at 23:22
