Please consider this minimal reproducible example of a random forest regression:
library(randomForest)
# fix missing data
airquality <- na.roughfix(airquality)
set.seed(123)
# fit the random forest model
rf_fit <- randomForest(formula = Ozone ~ ., data = airquality)
# define a new observation
new <- data.frame(Solar.R = 250, Wind = 8, Temp = 70, Month = 5, Day = 5)
set.seed(123)
# use predict.all to get every tree's prediction for the new observation
rf_predict <- predict(rf_fit, newdata = new, predict.all = TRUE)
rf_predict$aggregate
library(tidyverse)
predict_mean <- rf_predict$individual %>%
  as_tibble() %>%
  rowwise() %>%
  transmute(avg = mean(V1:V500))
predict_mean
I expected rf_predict$aggregate and predict_mean to return the same value, since I assumed the aggregate is the mean of the individual tree predictions.
Where and why is this assumption wrong?
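One way to check this expectation without the tidyverse pipeline is to average the per-tree predictions directly with base R. The sketch below repeats the setup above (the names `aq` and `tree_mean` are introduced here for illustration only). It relies on two things: for regression forests, `rf_predict$individual` is a matrix with one column per tree, and inside `mean()` the expression `V1:V500` is R's sequence operator, so it builds a sequence between the two values rather than selecting the columns `V1` through `V500`.

```r
library(randomForest)

# Same setup as in the example above
aq <- na.roughfix(airquality)
set.seed(123)
rf_fit <- randomForest(Ozone ~ ., data = aq)
new <- data.frame(Solar.R = 250, Wind = 8, Temp = 70, Month = 5, Day = 5)
rf_predict <- predict(rf_fit, newdata = new, predict.all = TRUE)

# Average across all trees (columns) for each new observation (row)
tree_mean <- rowMeans(rf_predict$individual)

# Compare with the forest's aggregated prediction
all.equal(unname(tree_mean), unname(rf_predict$aggregate))
```

If the two agree here, the difference seen with `predict_mean` would come from how the average is computed in the pipeline, not from the forest itself.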
My final objective is to get a confidence interval for the predicted value.
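To make the goal concrete, here is a rough sketch of the kind of interval that the per-tree predictions allow. It is an assumption that the spread across trees is an acceptable basis: the percentile range below describes between-tree variability for this observation and is not a formal confidence interval. The names `aq`, `tree_preds`, and `tree_range` are hypothetical.

```r
library(randomForest)

# Same setup as in the example above
aq <- na.roughfix(airquality)
set.seed(123)
rf_fit <- randomForest(Ozone ~ ., data = aq)
new <- data.frame(Solar.R = 250, Wind = 8, Temp = 70, Month = 5, Day = 5)
rf_predict <- predict(rf_fit, newdata = new, predict.all = TRUE)

# Predictions of every tree for the single new observation
tree_preds <- rf_predict$individual[1, ]

# Assumption: use the 2.5% and 97.5% percentiles of the tree predictions
# as a rough range around the aggregated prediction
tree_range <- quantile(tree_preds, probs = c(0.025, 0.975))
tree_range
```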