How to calculate the confidence level for a random forest regression model in R

Question

I'm using the randomForest package in R to predict distances between proteins (a regression model in RF) for homology modeling purposes, and I obtained quite good results. However, I need a confidence level to rank my predicted values and filter out the bad models. Is there any way to calculate such a confidence level, or any other way of measuring the certainty of the predictions? Any suggestions or recommendations are highly appreciated.

One simple approach would be to simply treat the predictions from each tree in the forest as a sample of predictions, from which you can calculate a mean and standard error, just as if you were calculating a CI for a mean. — joran, Jul 23 '13 at 14:25
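A minimal sketch of the per-tree idea in the comment above, assuming the randomForest package and, purely for illustration, the BostonHousing data from mlbench (the object names `rf`, `p`, and `spread` are made up for this example). Note that the spread across trees measures how much the trees disagree; it is not necessarily a calibrated confidence interval, so it is probably best used for ranking predictions.

```r
library(randomForest)
library(mlbench)
data(BostonHousing)

set.seed(1)  # forests are random; fix the seed for reproducibility
train <- BostonHousing[1:400, ]
test  <- BostonHousing[401:nrow(BostonHousing), ]

rf <- randomForest(medv ~ ., data = train, ntree = 500)

# predict.all = TRUE returns the forest prediction ($aggregate) and a
# matrix of per-tree predictions ($individual: one row per observation,
# one column per tree)
p <- predict(rf, newdata = test, predict.all = TRUE)

# Empirical 2.5% and 97.5% quantiles of the per-tree predictions
q <- t(apply(p$individual, 1, quantile, probs = c(0.025, 0.975)))

spread <- data.frame(
  pred    = p$aggregate,
  tree_sd = apply(p$individual, 1, sd),  # disagreement among trees
  lower   = q[, 1],
  upper   = q[, 2]
)

# Predictions the trees agree on most come first
head(spread[order(spread$tree_sd), ])
```

Sorting by `tree_sd` gives one possible ranking for filtering out the least certain predictions.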

StupidWolf · Answer 1 · 2020-10-02T16:28:05.390

Following the jackknife method described in this paper to obtain the standard error, you can use the implementation in the ranger package:

library(ranger)
library(mlbench)
data(BostonHousing)

mdl = ranger(medv ~ .,data=BostonHousing[1:400,],keep.inbag = TRUE)

pred = predict(mdl,BostonHousing[401:nrow(BostonHousing),],type="se")

head(cbind(pred$predictions, pred$se))
          [,1]     [,2]
[1,] 10.673356 1.107839
[2,] 11.390374 1.102217
[3,] 12.760511 1.126945
[4,] 10.458128 1.100246
[5,] 10.720076 1.084376
[6,]  9.914648 1.102000

An approximate 95% confidence interval can be estimated as the prediction ± 1.96*se. There is also a newer package, forestError, that works on randomForest objects:

library(randomForest)
library(forestError)
mdl = randomForest(medv ~ .,data=BostonHousing[1:400,],keep.inbag=TRUE)

err = quantForestError(mdl,BostonHousing[1:400,],BostonHousing[401:nrow(BostonHousing),])

head(err$estimates)
       pred     mspe       bias lower_0.05 upper_0.05
1 10.649734 15.70943 -1.5336411   2.935949   12.59486
2 11.611078 15.16339 -1.4436056   3.897293   13.55621
3 12.603938 20.92701 -0.9590869   4.890153   22.32699
4 10.650549 12.42555 -1.4188440   3.941648   12.49029
5 10.414707 29.08155 -1.1438267   2.700922   31.42272
6  9.720305 19.63286 -1.3469671   2.006520   16.43220

The lower_0.05 and upper_0.05 columns are the bounds of a 95% prediction interval (alpha = 0.05). You can refer to this paper for the actual method used.
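As a small worked example of the 1.96*se rule from the ranger part of this answer, the sketch below rebuilds the six predictions and standard errors printed there as plain vectors (so it runs in base R without any packages) and turns them into approximate 95% intervals. The names `ci` and `z` are only illustrative, and the normal approximation is an assumption.

```r
# Values copied from the head(cbind(pred$predictions, pred$se)) output above
pred <- c(10.673356, 11.390374, 12.760511, 10.458128, 10.720076, 9.914648)
se   <- c(1.107839, 1.102217, 1.126945, 1.100246, 1.084376, 1.102000)

# Two-sided 95% normal quantile, approximately 1.96
z <- qnorm(0.975)

ci <- data.frame(
  pred  = pred,
  se    = se,
  lower = pred - z * se,
  upper = pred + z * se
)

# Rank predictions from most to least certain (smallest se first)
ci[order(ci$se), ]
```

With a real model you would use `pred$predictions` and `pred$se` directly instead of the hard-coded vectors.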
