When using the importance() function from R's randomForest package, you can get a list of the most important predictors.
How can I tell which of the two binary outcomes each predictor is associated with (i.e. which predictors are associated with disease outcomes and which with disease-free outcomes)?
Here is my code to get the list of important predictors:
library(randomForest)  # importance()
library(dplyr)         # arrange(), desc()
# Make a data frame with predictor names and their importance
imp_RF_model <- importance(RF_model)
imp_RF_model <- data.frame(predictors = rownames(imp_RF_model), imp_RF_model)
# Order the predictor levels by importance (largest MeanDecreaseGini first)
imp_sort_RF_model <- arrange(imp_RF_model, desc(MeanDecreaseGini))
imp_sort_RF_model$predictors <- factor(imp_sort_RF_model$predictors, levels = imp_sort_RF_model$predictors)
# Select the top 20 predictors (head() avoids NA rows if there are fewer than 20)
imp_20_RF_model <- head(imp_sort_RF_model, 20)
For example, if protein A is a strong predictor, I want to know whether high levels of protein A are associated with the disease or with disease-free samples. In other words, is each predictor directly or inversely associated with the disease?
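To make the kind of output I am after concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data frame sim, the columns protein_A, protein_B and protein_C, the class labels "disease" and "healthy" (standing in for disease-free), and the model RF_sim are all hypothetical. It shows two possible ways to look at direction: comparing per-class medians of each predictor, and using partialPlot() from randomForest to see how the model's tendency toward the "disease" class changes as one predictor increases. I am not sure either is the recommended approach.

```r
library(randomForest)

# Simulated data (hypothetical): protein_A is shifted up in disease samples,
# protein_B is shifted down, protein_C is pure noise.
set.seed(1)
n <- 200
status <- factor(rep(c("disease", "healthy"), each = n / 2))
sim <- data.frame(
  protein_A = rnorm(n, mean = ifelse(status == "disease", 2, 0)),
  protein_B = rnorm(n, mean = ifelse(status == "disease", -2, 0)),
  protein_C = rnorm(n),
  status = status
)

RF_sim <- randomForest(status ~ ., data = sim, ntree = 200)

# 1) Univariate view: median of each predictor within each class.
#    Result is a matrix with one row per class and one column per predictor.
predictor_names <- setdiff(names(sim), "status")
class_medians <- sapply(sim[predictor_names],
                        function(x) tapply(x, sim$status, median))
direction <- ifelse(class_medians["disease", ] > class_medians["healthy", ],
                    "higher in disease", "lower in disease")
print(direction)

# 2) Model-based view: partial dependence of the "disease" class on protein_A.
#    plot = FALSE returns the curve as a list (x, y) without drawing it.
pd_A <- partialPlot(RF_sim, pred.data = sim, x.var = "protein_A",
                    which.class = "disease", plot = FALSE)

# A positive rank correlation suggests the curve rises with protein_A
# (direct association); a negative one suggests an inverse association.
pd_cor <- cor(pd_A$x, pd_A$y, method = "spearman")
print(pd_cor)
```

The median comparison ignores the model entirely, while the partial dependence curve reflects what the forest learned. Neither would capture a non-monotonic relationship well, so plotting the curve (plot = TRUE) is probably worth doing for the top predictors.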