I made a model:
model <- xgboost(data = as.matrix(data[, -1]), label = data$Ethnicity,
                 num_class = 8, nrounds = 50, objective = "multi:softmax",
                 lambda = 1, eval_metric = "merror")
data holds 94 variables from various survey questions, and Ethnicity is a variable coded 0-7, so that each number from 0 to 7 represents one race/ethnicity group.
I found which variables are most important in the prediction:
xgb.importance(model=model)
I get:
##     Feature         Gain        Cover   Frequency
##  1:     q97 0.0924173556 0.0388402250 0.016981237
##  2:      q9 0.0603595554 0.0199381316 0.012749847
##  3:      q7 0.0456855077 0.0447756304 0.066922777
##  4:      q6 0.0436987577 0.0485072162 0.041311731
##  5:      q8 0.0319606309 0.0212999077 0.015199599
##  6:     q99 0.0276115402 0.0201090242 0.007961695
##  7:     q89 0.0245865711 0.0249913356 0.023829408
##  8:     q13 0.0197648132 0.0190748590 0.010912533
##  9:     q81 0.0194462208 0.0140010066 0.021880742
## 10:     q71 0.0192126872 0.0194684164 0.019709370
However, as you can see, my response is multi-class, so the importance table pools all 8 ethnicity groups (codes 0-7) at once. My question is: how can I use plots to show how each ethnic group responds to each of the variables identified by xgb.importance, or alternatively fit individual models to see which variables are important for each ethnicity? TIA!
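
For reference, here is a minimal sketch of one direction that might work, assuming the `trees` argument of xgb.importance can be used to restrict the calculation to the trees of a single class. In a multi-class model each boosting round adds one tree per class, so class k (0-based) should correspond to trees k, k + num_class, k + 2 * num_class, and so on. The data below are simulated stand-ins (10 hypothetical questions q1..q10 and an invented label rule), not the real survey, and the names n_class, n_rounds and importance_by_class are only illustrative. xgb.train with an xgb.DMatrix is used here instead of the xgboost() call above purely to keep the sketch self-contained.

```r
library(xgboost)

# Simulated stand-in data (NOT the real survey): 500 respondents,
# 10 hypothetical questions answered on a 1-5 scale, 8 classes coded 0-7.
set.seed(1)
n_class  <- 8
n_rounds <- 50
X <- matrix(as.numeric(sample(1:5, 500 * 10, replace = TRUE)), ncol = 10,
            dimnames = list(NULL, paste0("q", 1:10)))
# Invented rule so that the label depends on q1 and q2.
y <- (X[, "q1"] + X[, "q2"]) %% n_class

dtrain <- xgb.DMatrix(data = X, label = y)
model <- xgb.train(
  params = list(objective = "multi:softmax", num_class = n_class,
                lambda = 1, eval_metric = "merror"),
  data = dtrain, nrounds = n_rounds
)

# Trees are stored round by round, one per class, with zero-based indices,
# so class k owns trees k, k + n_class, k + 2 * n_class, ...
importance_by_class <- lapply(0:(n_class - 1), function(k) {
  xgb.importance(model = model,
                 trees = seq(from = k, by = n_class, length.out = n_rounds))
})
names(importance_by_class) <- paste0("class_", 0:(n_class - 1))

# Bar plot of the top features for a single class, here the class coded 0.
xgb.plot.importance(importance_by_class[["class_0"]], top_n = 10)
```

Each element of importance_by_class is a table with the same Feature / Gain / Cover / Frequency columns as above, but computed only from the trees belonging to that class, so the plotting call can be repeated once per ethnicity code.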