
I want to get the classification probabilities for each class from each individual tree in a randomForest.

(1) This returns each tree's individual prediction, but as a response (a class label), not as probabilities:

predict(rf_cl, newdata, predict.all=TRUE)$individual 

(2) This outputs probabilities, but they are the aggregate probabilities of the whole forest, not per-tree probabilities:

predict(rf_cl, newdata, type="prob")

(3) When I tried combining the two, I got the same output as the first one:

predict(rf_cl, newdata, predict.all=TRUE, type="prob")$individual 
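
For reference, a minimal reproducible setup showing the first two calls (the `iris` data and `ntree = 10` are my own example choices, not from my real use case):

```r
library(randomForest)

set.seed(42)
rf_cl <- randomForest(Species ~ ., data = iris, ntree = 10)
newdata <- iris[c(1, 51, 101), ]

# (1) per-tree class labels: an n x ntree character matrix
ind <- predict(rf_cl, newdata, predict.all = TRUE)$individual

# (2) aggregate probabilities of the whole forest: an n x J matrix
agg <- predict(rf_cl, newdata, type = "prob")

dim(ind)  # 3 x 10 -- one label per tree, no probabilities
dim(agg)  # 3 x 3  -- probabilities, but only for the forest as a whole
```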

I have been searching the net for a long time, but to no avail. Please help, or give me some ideas on how to achieve this. Thanks in advance.

Whitney
    Which random forest package are you using? Or do you have the same issue with all packages e.g. randomForest, Rborist, ranger? – rw2 Jul 03 '22 at 07:40
  • I am using the **randomForest** package. This is a helpful document. [link](https://cran.r-project.org/web/packages/randomForest/randomForest.pdf). And I have not tried other packages. – Whitney Jul 03 '22 at 07:49
  • Can't you just take (1) and calculate the proportions for each tree? – rawr Jul 03 '22 at 08:44
  • I'm sorry, I can't. I may not have made my point clear. Suppose there are J classes, n samples and M trees in the forest. The code in (2) outputs an (n, J)-dimensional data frame (or matrix). Each row represents a sample, and each column gives the classification probability of one class for that sample as predicted by the forest. However, I want to extract the classification probabilities of each class for each sample as predicted by each of the M trees. In other words, an (n, J, M)-dimensional array is what I need. – Whitney Jul 03 '22 at 09:12

1 Answer


The member decision trees of a randomForest decision tree ensemble make "pure" predictions. That is, the probability of the winning category is 1.0, and the probabilities of all other categories are 0.0.

The random forest computes the aggregate probability using a voting mechanism: the number of "pure" predictions (aka votes) for each class, divided by the total number of member decision trees. Knowing this helps you choose the number of decision trees so as to achieve the desired "precision" of the aggregate probabilities and avoid ties. For example, when modeling a binary target, you should choose an odd number of member decision trees to avoid a 0.5 vs. 0.5 tie.
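Given these "pure" per-tree predictions, the (n, J, M) array asked for in the comments can be reconstructed by one-hot encoding the output of `predict.all = TRUE`. A sketch (the `iris` setup and all variable names are my own example):

```r
library(randomForest)

set.seed(42)
rf_cl <- randomForest(Species ~ ., data = iris, ntree = 10)
newdata <- iris[c(1, 51, 101), ]

# n x M matrix of per-tree class labels
ind <- predict(rf_cl, newdata, predict.all = TRUE)$individual

classes <- levels(rf_cl$predicted)   # the J class labels
n <- nrow(ind); M <- ncol(ind); J <- length(classes)

# (n, J, M) array: probs[i, j, m] is tree m's probability of class j
# for sample i -- predictions are "pure", so each slice is one-hot
probs <- array(0, dim = c(n, J, M),
               dimnames = list(rownames(ind), classes, NULL))
for (m in seq_len(M)) {
  probs[cbind(seq_len(n), match(ind[, m], classes), m)] <- 1
}

# averaging over trees recovers the forest's voted probabilities
stopifnot(isTRUE(all.equal(apply(probs, c(1, 2), mean),
                           predict(rf_cl, newdata, type = "prob"),
                           check.attributes = FALSE)))
```

The final check makes the voting mechanism explicit: the forest's `type = "prob"` output is exactly the mean of the one-hot per-tree votes.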

user1808924
  • 4,563
  • 2
  • 17
  • 20
  • Thank you sincerely for your answer! I have one more question. Drop an observation down a constructed tree in the random forest, and it ends up in leaf `l`. Can the proportion of each class among the training votes in this leaf represent the classification probabilities of the classes for this tree? If so, how can I extract this? – Whitney Jul 03 '22 at 12:56
  • To the best of my knowledge, the `randomForest` data structure does not contain record count information (of the training set). Therefore, it is impossible to statically discriminate between a "strong vote confidence" score of `1` and "medium/weak vote confidence" member predictions. You may reconstruct this information yourself by making predictions on the training set in `predict(predict.all = TRUE)` mode. – user1808924 Jul 03 '22 at 18:13
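The reconstruction suggested in the last comment can be sketched with the `nodes = TRUE` option of `predict.randomForest`, which returns the terminal node index of each observation in each tree. Note that this sketch computes leaf proportions over the full training set rather than over each tree's bootstrap sample (the bootstrap memberships are only retained if the forest was grown with `keep.inbag = TRUE`); the `iris` setup is my own example:

```r
library(randomForest)

set.seed(42)
rf_cl <- randomForest(Species ~ ., data = iris, ntree = 10)
newdata <- iris[c(1, 51, 101), ]

# terminal node ids: n x ntree matrices stored in the "nodes" attribute
train_nodes <- attr(predict(rf_cl, iris, nodes = TRUE), "nodes")
new_nodes   <- attr(predict(rf_cl, newdata, nodes = TRUE), "nodes")

y <- iris$Species

# for new sample i and tree m: class proportions of the training
# samples that land in the same terminal node
leaf_prob <- function(i, m) {
  in_leaf <- train_nodes[, m] == new_nodes[i, m]
  prop.table(table(y[in_leaf]))
}

leaf_prob(1, 1)  # vote proportions in sample 1's leaf of tree 1
```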