I'm trying to create a boxplot of the distribution of RMSE over all predicted resamples. The mean over the resamples equals the model's reported RMSE, so it would be useful to show how that number is calculated. How can I obtain the prediction RMSE for each of the model's resamples? For example, with 5-fold CV:

  • Model RMSE: 5

  • Fold 1, 2, 3, 4, 5: 5.02, 5.01, 5, 4.99, 4.98
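To make the arithmetic explicit, here is a minimal sketch. It assumes the reported model RMSE is the unweighted mean of the per-fold RMSEs, which is my understanding of how caret aggregates resampling results:

```r
# Per-fold RMSE values from the 5-fold example above
fold_rmse <- c(5.02, 5.01, 5.00, 4.99, 4.98)

# Assumption: the reported model RMSE is the plain average of the fold values
model_rmse <- mean(fold_rmse)
print(model_rmse)
```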

# Load packages
library(mlbench)
library(caret)

# Load data
data(BostonHousing)

# Divide the data into a training set and a test set
set.seed(1)
train_idx <- createDataPartition(BostonHousing$medv, p = 0.75, list = FALSE)
train_set <- BostonHousing[train_idx, ]
test_set <- BostonHousing[-train_idx, ]

# 10-fold CV repeated 3 times; keep the held-out predictions
control <- trainControl(method = 'repeatedcv', number = 10, repeats = 3,
                        savePredictions = TRUE)
metric <- 'RMSE'

# Some arbitrary model ('lm' is only a placeholder; any caret method works)
set.seed(1)
example <- train(medv ~ ., data = train_set, method = 'lm', metric = metric,
                 preProc = c('center', 'scale'), trControl = control)

I know this can be obtained for the resamples on the training set via example$resample.
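For reference, this is a rough sketch of how I understand those per-resample values relate to the saved held-out predictions. The helper names `rmse` and `rmse_by_resample` are hypothetical, the toy data frame is made up, and the column names `obs`, `pred` and `Resample` are assumed to match what `example$pred` contains when `savePredictions` is enabled:

```r
# Root mean squared error between observed and predicted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# One RMSE per resample from a data frame of held-out predictions.
# Assumes columns named `obs`, `pred` and `Resample`, as in example$pred.
rmse_by_resample <- function(pred_df) {
  parts <- split(pred_df, pred_df$Resample)
  vapply(parts, function(d) rmse(d$obs, d$pred), numeric(1))
}

# Made-up toy data standing in for example$pred (illustrative values only)
toy_pred <- data.frame(
  obs      = c(20, 22, 24, 30, 18, 26),
  pred     = c(21, 21, 24, 28, 18, 29),
  Resample = c("Fold1", "Fold1", "Fold2", "Fold2", "Fold3", "Fold3")
)

fold_rmse <- rmse_by_resample(toy_pred)
print(fold_rmse)        # one value per fold
print(mean(fold_rmse))  # average over the folds
```

With the fitted object, `rmse_by_resample(example$pred)` should reproduce `example$resample$RMSE` when only one tuning-parameter combination was evaluated; otherwise `example$pred` would first need to be filtered to the rows matching `example$bestTune`. The boxplot itself would then be `boxplot(example$resample$RMSE)`.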

Is there a similar built-in way to get this for the predictions from each resample?

Appreciate all help, thanks.

Werner Hertzog
KingAMPL
  • You get 5 different RMSE values because you divide your training data into 5 folds. For test data, I guess it's only possible if you split the test set into 5? And you have to use the same final model, I guess. – StupidWolf Nov 16 '20 at 23:56
  • If I understand the question correctly this might be of relevance: https://stackoverflow.com/questions/62183291/statistical-test-with-test-data/62193116#62193116. Another option is this: https://stackoverflow.com/questions/56950684/how-to-get-predictions-for-each-fold-in-10-fold-cross-validation-of-the-best-tun/56965881#56965881 – missuse Nov 17 '20 at 20:50

0 Answers