I am trying to get the class probabilities for a binary classification with randomForest. I am struggling to get the syntax right. I have read the help file but have not found the answer. Any ideas?

> str(training)
'data.frame':   160051 obs. of  5 variables:
 $ repeater           : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
 $ offervalue         : num  0.75 0.75 1.5 0.75 1.25 1.25 1 0.75 0.75 0.75 ...
 $ has_bought_brand   : Factor w/ 2 levels "FALSE","TRUE": 1 1 2 1 1 1 2 1 1 1 ...
 $ has_bought_company : Factor w/ 2 levels "FALSE","TRUE": 1 1 2 1 2 2 2 2 1 1 ...
 $ has_bought_category: Factor w/ 2 levels "FALSE","TRUE": 2 1 1 1 2 2 2 1 1 1 ...

> model <- randomForest(repeater ~ offervalue + has_bought_brand + has_bought_company + has_bought_category, training, ntree=50)

> testPrediction <- predict(model, testing)

> str(testPrediction)
 Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:64020] "4" "5" "11" "12" ...
poiuytrez

1 Answer

First of all, when posting code, make sure it's reproducible; ideally we should be able to copy/paste it into our own R sessions and get the same error/problem as you. Posting the str() of a data set does not help. Often you can find simple examples in the help pages of the functions involved. The following example comes from ?randomForest:

library(randomForest)  # needed before calling randomForest()
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)

Since class(iris.rf) is c("randomForest.formula", "randomForest"), calling predict(iris.rf) actually dispatches to predict.randomForest(). The help page ?predict.randomForest documents all of its parameters, including type=. By default it returns just the predicted class, but you can get the predicted class probabilities with type="prob":

predict(iris.rf, type="prob")

which returns

         setosa  versicolor   virginica
1   1.000000000 0.000000000 0.000000000
2   1.000000000 0.000000000 0.000000000
3   1.000000000 0.000000000 0.000000000
4   1.000000000 0.000000000 0.000000000
# etc ....
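
Note that calling predict() without new data, as above, returns the out-of-bag predictions for the training rows. For a binary problem with a held-out set like the one in the question, a minimal sketch might look like the following. The binary target is_virginica and the train/test split are assumptions made up for illustration (standing in for repeater, training and testing); they are not from the original post.

```r
library(randomForest)

set.seed(71)

# Hypothetical binary target built from iris, standing in for `repeater`
dat <- iris
dat$is_virginica <- factor(dat$Species == "virginica")  # levels "FALSE", "TRUE"
dat$Species <- NULL

# Illustrative train/test split (an assumption, not from the question)
test_idx <- sample(nrow(dat), 50)
training <- dat[-test_idx, ]
testing  <- dat[test_idx, ]

model <- randomForest(is_virginica ~ ., data = training, ntree = 50)

# type = "prob" returns a matrix with one column per class level
testProb <- predict(model, newdata = testing, type = "prob")
head(testProb)

# Probability of the "TRUE" class for each test row
probTrue <- testProb[, "TRUE"]
```

Each row of testProb is the fraction of trees voting for each class, so the two columns sum to 1; picking the "TRUE" column gives a single score per row that can be thresholded or ranked.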
MrFlick