I'm building a Random Forest classifier and I would like to return both the classification and the associated probabilities. My result variable is either 1 or 0, with 1 being the positive class that I want to track.
library(randomForest)

no_of_trees <- 50
rf.under <- randomForest(as.factor(result) ~ .,
                         data = data_balanced_under,
                         importance = TRUE,
                         ntree = no_of_trees)

# Predicted class labels for the test set
prediction <- predict(rf.under, df.test)
# Per-class probabilities for the test set (one column per class)
probability <- predict(rf.under, df.test, type = "prob")
submit <- data.frame(predicted = prediction, actual = df.test$result)
I wanted probability to return the probability of the positive class, but I get:
> probability
           0    1
242339  1.00 0.00
3356431 1.00 0.00
138327  1.00 0.00
111327  1.00 0.00
3307151 1.00 0.00
222414  1.00 0.00
1817297 1.00 0.00
3860922 1.00 0.00
1710532 1.00 0.00
What are the numbers on the left? I thought they were row numbers, but then why aren't they indexed 1, 2, 3, ...?
I tried to get probability[,2], which I assume gives me the probability of the positive class, but that doesn't work either.
Ideally, I would like to include the probabilities in the submit data frame, but I'm currently unable to do so.
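To make the goal concrete, here is a minimal sketch of the intended result on a made-up matrix. It assumes the object returned by predict(..., type = "prob") is a matrix with one column per class and with row names on the left; the three row names, the probability values, and the prediction/actual vectors below are invented for illustration and are not real model output. Selecting the column by its name "1" rather than by position is a design choice that avoids depending on column order.

```r
# Toy stand-in for the matrix returned by predict(..., type = "prob").
# The row names and values are made up for illustration only.
probability <- matrix(
  c(1.00, 0.00,
    0.40, 0.60,
    0.10, 0.90),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("242339", "3356431", "138327"), c("0", "1"))
)

# Hypothetical predictions and actual labels for the same three rows.
prediction <- factor(c("0", "1", "1"), levels = c("0", "1"))
actual     <- c(0, 1, 1)

# Select the positive-class column by name rather than by position.
prob_positive <- probability[, "1"]

# Combine everything into one data frame.
submit <- data.frame(
  predicted   = prediction,
  actual      = actual,
  probability = prob_positive
)

print(submit)
```

If the real probability matrix has the same shape, the same probability[, "1"] selection should apply to it unchanged.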
Also, the confusion matrix gives me:
confusionMatrix(data = submit$predicted, reference = df.test$result , positive="1")
          Reference
Prediction      0      1
         0 913730    160
         1  50872   8219
Is it possible to switch this around, so that the positive class "1" is shown first?
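A minimal sketch of the layout being asked for, using only base R's table() on made-up labels. It relies on the fact that table() orders rows and columns by factor level order, so listing "1" first in levels puts the positive class first. Whether confusionMatrix() follows the level order in the same way is an assumption here, not something verified on the data above; the vectors predicted and actual are hypothetical.

```r
# Hypothetical predicted and actual labels, coded 0/1 as in the question.
predicted <- c(0, 0, 1, 1, 0, 1)
actual    <- c(0, 1, 1, 1, 0, 0)

# Listing 1 first in `levels` makes the positive class the first level.
predicted_f <- factor(predicted, levels = c(1, 0))
actual_f    <- factor(actual,    levels = c(1, 0))

# table() lays out rows and columns in factor level order.
cm <- table(Prediction = predicted_f, Reference = actual_f)
print(cm)
```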