
I built a predictive model using the gbm package in R. The results are good, and the feature importance list shows which variables matter most to the model. I am struggling with an editor's question asking for the direction of the variables' effects.

For instance: age variable: which age group is most important, rather than age overall?
region: which specific region, rather than region as a variable overall?

I have seen this done with LIME; however, the gbm package is not compatible with LIME and I am struggling to implement it another way. Is there a manual way to see this?

My current idea is to run the GBM model once per level and compare the results. For instance, run with region A and everything else the same, then region B, C, D, E, etc., and compare the final results to learn more about each level of each variable.

Does anyone have further advice or a quicker solution? Thanks

ClareFG
  • Not quite sure if this is a solution for you, but you might look into `gridsearch` functions. They optimize parameters for ML algorithms, and they also do a kind of brute-force search, like you mentioned. Maybe it's faster. – mischva11 Mar 12 '20 at 10:01

1 Answer


I suppose you are using gbm and not xgboost, but in any case you can always convert data into the necessary format.

You can try one-hot encoding. This is a bit better than testing the levels one by one, because the model is exposed to all the variables at once. The example below is not a very good one because I cut up a continuous variable, but hopefully the categorization makes more sense in your model:

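library(MASS)         # Pima.te data set
library(gbm)
library(highcharter)

data = Pima.te
# Cut age into 4 equal-width bins (for illustration only)
age_cat = cut(data$age,4,labels = paste0("age",1:4))
# One 0/1 indicator column per age bin
onehot_age = model.matrix(~0+age_cat)
# The bernoulli loss in gbm needs a 0/1 response
data$type = as.numeric(data$type)-1
# Replace the original age column with the indicator columns
fit = gbm(type ~ .,data=cbind(data[,-grep("age",colnames(data))],onehot_age),
          distribution="bernoulli")

# Relative influence of every column, including each age bin
res = summary(fit,plotit=FALSE)

hchart(res,"bar",hcaes(x=var,y=rel.inf,color=rel.inf))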

[Image: bar chart of relative influence (rel.inf) for each variable]
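Note that relative influence is unsigned, so it ranks the levels but does not say whether a level pushes the prediction up or down. One possible way to get at direction is a partial dependence grid, which `plot()` on a gbm fit can return with `return.grid = TRUE`. The snippet below is only a rough sketch on the same Pima.te data: the `age_cat` column and the four-bin cut are my own illustrative assumptions, and here age is kept as a single factor column instead of one-hot columns.

```r
library(MASS)  # Pima.te data set
library(gbm)

set.seed(1)    # gbm subsamples rows, so fix the seed for repeatable numbers

data <- Pima.te
data$type <- as.numeric(data$type) - 1   # 0/1 response for the bernoulli loss

# Illustrative assumption: age as ONE factor column with four bins
data$age_cat <- cut(data$age, 4, labels = paste0("age", 1:4))
data$age <- NULL

fit <- gbm(type ~ ., data = data, distribution = "bernoulli", n.trees = 100)

# Partial dependence of the predicted probability on each age bin,
# with the other predictors averaged out
pd <- plot(fit, i.var = "age_cat", n.trees = 100,
           return.grid = TRUE, type = "response")

# Levels sorted from lowest to highest predicted probability
pd[order(pd$y), ]
```

A level with a larger `y` is associated with a higher predicted probability than a level with a smaller `y`. If your existing model was fitted with region as a factor, calling `plot()` with `i.var` set to that column should work on the already fitted object, so no refit would be needed.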

StupidWolf
  • Thank you, this is a great solution for future projects! Since this is for a manuscript, I am not able to make any major changes to the current project >.<. Going step by step through each variable level will work for now, and I will use this approach in the future. – ClareFG Mar 12 '20 at 11:52