I am having trouble with variable importance in R using caret. varImp prints the importance scores but not the variable names. The left column shows numeric indices instead, and I cannot figure out where those indices come from. The code and the output are below.
My data has the following form, except that the real set has 192 variables and 10,000 observations. Columns 2-24 are continuous and the rest are categorical.
Output  X1  X2  X3  X4
0        2  50  44  22
1        3  40  33  11
1        2  50  22  10
0        1  42  12  18
UPDATE: I have run the same code without converting the categorical variables to factors. Calling varImp then prints the corresponding variable names. Does anyone know why the names are lost when I convert the variables to factors?
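One possible lead on the behaviour described in the UPDATE, offered as a minimal sketch rather than a confirmed diagnosis: the formula interface to train() builds its predictors with model.matrix(), which expands each factor into one dummy column per non-reference level. That could explain indices such as 634 or 1010 in a data set with only 192 variables. The toy data frame below is hypothetical, not my real data, and uses base R only.

```r
# Hypothetical toy data: one numeric predictor and one 3-level factor predictor.
toy <- data.frame(
  Output = factor(c("no", "yes", "yes", "no", "yes", "no")),
  X1 = c(2, 3, 2, 1, 4, 5),
  X2 = factor(c("a", "b", "c", "a", "b", "c"))
)

# Expand the predictors the way a formula interface does.
design <- model.matrix(Output ~ ., data = toy)

# The factor X2 becomes two dummy columns (X2b, X2c); level "a" is the reference.
print(colnames(design))

# Number of predictor columns, excluding the intercept.
print(ncol(design) - 1)
```

If the same expansion happens with the real data, the importance table is indexing dummy columns rather than the original 192 variables. This does not by itself explain why the names are dropped.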
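library(caret)  # provides createDataPartition(), train(), and varImp()
#Recode the string "NA" in Output as a true missing value
my_data$Output[my_data$Output == "NA"] <- NA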
#Converting Variables to Factors
my_data$Output <- factor(my_data$Output)
#Only use complete observations -- eliminate NA's
clean_data <- my_data[complete.cases(my_data),]
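#Convert the categorical columns (25 to 189) to factors
#lapply keeps each column separate; apply() would first coerce them to a character matrix
clean_data[,25:189] <- lapply(clean_data[,25:189], as.factor)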
#Split into testing and training
set.seed(7)
Data_Splitting <- createDataPartition(clean_data$Output,p=2/3,list=FALSE)
training <- clean_data[Data_Splitting,]
testing <- clean_data[-Data_Splitting,]
#Random Forest training
set.seed(7)
rf_train <- train(Output ~ ., data = training, method = "rf",
                  trControl = trainControl(method = "cv", number = 4, classProbs = TRUE,
                                           summaryFunction = twoClassSummary),
                  metric = "ROC")
#Plot of variable importance
varImp(rf_train)
plot(varImp(rf_train))
print(rf_train)
     Overall
8     100.00
23     99.80
21     98.19
2      94.17
634    92.06
7      91.75
1010   81.26
636    69.02
9      56.88
630    49.90
1      42.60
4      36.95
16     29.34
15     29.10
1008   28.83
17     28.54
18     27.50
22     27.04
3      26.78
14     26.36