I am pretty new to machine learning, and I've stumbled upon an issue and can't seem to find a solution no matter how hard I google.
I have performed a multiclass classification using the randomForest algorithm and found a model that predicts my test sample adequately. I then used varImpPlot() to determine which predictors are most important in determining the class assignments.
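In case it helps, here is roughly what that workflow looks like. This is only a minimal sketch: iris stands in for my real data, and the 100/50 split, the seed, and the object names (train, test, rf, pred) are placeholders for illustration, not my actual setup.

```r
library(randomForest)

set.seed(42)

# iris is a stand-in for my real data (assumption for illustration only)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a multiclass forest; importance = TRUE also stores the
# permutation-based importance measures
rf <- randomForest(Species ~ ., data = train, importance = TRUE)

# Check how well the model predicts the held-out test sample
pred <- predict(rf, newdata = test)
print(table(predicted = pred, actual = test$Species))

# Ranks the predictors, but does not tell me the direction of
# each predictor's relationship with each class
varImpPlot(rf)
```

The plot tells me which predictors matter, but not how they relate to each class, which is the part I am stuck on.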
My problem: I would like to know why those predictors are most important. Specifically, I would like to be able to report that cases that fall into Class X hold Characteristics A (e.g., are male), B (e.g., are older), and C (e.g., have high IQ), while cases that fall into Class Y hold Characteristics D (female), E (younger), and F (low IQ), and so on for the rest of my classes.
I know that standard binary logistic regression allows you to say that cases with high values on Characteristic A are more likely to fall into class X, for example. So, I was hoping for something conceptually similar, but from a random forest classification model on multiple classes.
Is this a thing that can be done using random forest models? If yes, is there a function in randomForest or in caret (or even elsewhere) that can help me get past the varImpPlot() and varImp() table?
Thanks!