I am having trouble with variable importance in R. Calling varImp prints the importance scores, but the left column shows numeric indices instead of variable names, and I cannot figure out where those indices come from. The code and output are below.

My data has the following form, except that I have 192 variables and 10,000 observations. Columns 2-24 are continuous and the rest are categorical.

UPDATE: I have run the same code without converting the categorical variables to factors. Calling varImp now prints the corresponding variable names. Does anyone know why this does not work when I convert the variables to factors?

Output X1 X2 X3 X4
0      2  50 44 22
1      3  40 33 11
1      2  50 22 10
0      1  42 12 18

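library(caret)  # provides createDataPartition, train, trainControl, varImp

#Recode the string "NA" as a true missing value
my_data$Output[my_data$Output == "NA"] <- NA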

#Converting Variables to Factors
my_data$Output <- factor(my_data$Output)

#Only use complete observations -- eliminate NA's
clean_data <- my_data[complete.cases(my_data),]

#Converts the categorical columns (25-189) to factors
clean_data[,25:189] = data.frame(apply(clean_data[,25:189], 2, as.factor))

#Split into testing and training
set.seed(7)
Data_Splitting <- createDataPartition(clean_data$Output,p=2/3,list=FALSE)
training = clean_data[Data_Splitting,]
testing = clean_data[-Data_Splitting,]

#Random Forest training 
set.seed(7)
rf_train <- train(Output ~ ., data = training, method = "rf",
                  trControl = trainControl(method = "cv", number = 4, classProbs = TRUE,
                                           summaryFunction = twoClassSummary),
                  metric = "ROC")

#Plot of variable importance 
varImp(rf_train)
plot(varImp(rf_train))
print(rf_train)

     Overall
8     100.00
23     99.80
21     98.19
2      94.17
634    92.06
7      91.75
1010   81.26
636    69.02
9      56.88
630    49.90
1      42.60
4      36.95
16     29.34
15     29.10
1008   28.83
17     28.54
18     27.50
22     27.04
3      26.78
14     26.36
Dustin Smith