
I made a model:

model <- xgboost(data = as.matrix(data[, -1]), label = data$Ethnicity, num_class = 8, nrounds = 50, objective = "multi:softmax", lambda = 1, eval_metric = "merror")

data holds 94 variables from survey questions, and Ethnicity is a variable coded 0-7, where each value represents one race/ethnicity group.

I found which variables are most important in the prediction:

xgb.importance(model=model)

I get:

##     Feature         Gain        Cover   Frequency
##  1:     q97 0.0924173556 0.0388402250 0.016981237
##  2:      q9 0.0603595554 0.0199381316 0.012749847
##  3:      q7 0.0456855077 0.0447756304 0.066922777
##  4:      q6 0.0436987577 0.0485072162 0.041311731
##  5:      q8 0.0319606309 0.0212999077 0.015199599
##  6:     q99 0.0276115402 0.0201090242 0.007961695
##  7:     q89 0.0245865711 0.0249913356 0.023829408
##  8:     q13 0.0197648132 0.0190748590 0.010912533
##  9:     q81 0.0194462208 0.0140010066 0.021880742
## 10:     q71 0.0192126872 0.0194684164 0.019709370

However, as you can see, my response is multi-class, i.e. all 8 ethnicity groups at once. How do I use plots to show how each ethnic group responds to each variable identified above, or fit individual models to see which variables are important for each ethnicity in my xgb.importance output? TIA!
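One possible route, sketched here under assumptions rather than taken from the thread: xgb.importance() accepts a trees argument, and a multi-class gbtree model grows one tree per class in each boosting round, so selecting every num_class-th tree gives the importance for a single class. The iris data below is only a stand-in for the survey data, and the starting index in first_tree is an assumption: xgboost 1.x documents 0-based tree indices, while newer releases document 1-based ones, so check ?xgb.importance for the installed version.

```r
library(xgboost)

# Toy stand-in for the survey data: iris has 3 classes, coded 0..2.
x <- as.matrix(iris[, -5])
y <- as.integer(iris$Species) - 1L
nclass  <- 3L
nrounds <- 10L

dtrain <- xgb.DMatrix(data = x, label = y)
bst <- xgb.train(
  params  = list(objective = "multi:softprob", num_class = nclass,
                 max_depth = 3, eta = 0.2, nthread = 1),
  data    = dtrain,
  nrounds = nrounds
)

# Assumption: tree indices start at 0 (xgboost 1.x). Set this to 1L if
# ?xgb.importance in your version says the indices are base-1.
first_tree <- 0L

# Trees are stored round by round, one per class, so class k owns
# trees first_tree + k, first_tree + k + nclass, ...
per_class <- lapply(seq_len(nclass) - 1L, function(k) {
  trees_k <- seq(from = first_tree + k, by = nclass, length.out = nrounds)
  xgb.importance(model = bst, trees = trees_k)
})
names(per_class) <- levels(iris$Species)

per_class
```

For the model in the question this would mean nclass = 8 and nrounds = 50, giving eight importance tables, one per ethnicity code.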

  • I don't think there is a straightforward way to do this in R. What you can do is fit each class against the rest of the classes in a loop. It is not exactly the same problem, but you can find some related answers here: https://stackoverflow.com/questions/29637145/gbm-r-function-get-variable-importance-separately-for-each-class – Shibaprasadb Oct 20 '21 at 11:52
  • Hey mate, the link you sent was about gbm, not xgboost. –  Oct 20 '21 at 19:22
  • Yes. I was pointing out the method. Can you share your sample data with `dput()`? Or with an attached link in the question? Then we will be in a better position to help you out. – Shibaprasadb Oct 21 '21 at 05:20
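The one-vs-rest loop suggested in the comments could look roughly like the sketch below. It is a minimal illustration, not the commenter's own code: iris stands in for the survey data, the hyperparameters are arbitrary, and the list name one_vs_rest is made up for the example. Each pass recodes the label as "this class" versus "everything else", fits a binary model, and stores its importance table.

```r
library(xgboost)

# Toy stand-in for the survey data: 3 classes coded 0..2.
x <- as.matrix(iris[, -5])
y <- as.integer(iris$Species) - 1L

# One binary model per class: label is 1 for class k, 0 for the rest.
one_vs_rest <- lapply(sort(unique(y)), function(k) {
  dtrain <- xgb.DMatrix(data = x, label = as.numeric(y == k))
  bst <- xgb.train(
    params  = list(objective = "binary:logistic", max_depth = 3,
                   eta = 0.2, nthread = 1),
    data    = dtrain,
    nrounds = 10
  )
  xgb.importance(model = bst)
})
names(one_vs_rest) <- levels(iris$Species)

# Base-graphics bar chart of Gain for a single class.
imp <- one_vs_rest[["setosa"]]
barplot(rev(imp$Gain), names.arg = rev(imp$Feature), horiz = TRUE,
        las = 1, main = "setosa vs rest", xlab = "Gain")
```

Note that these importances describe separate binary models, so they need not match what the single multi-class model relies on; they answer "what separates this group from everyone else".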
