
I've spent a lot of time trying to understand how the `train` function works, and I still don't think I fully understand it. I am training a neural network with `neuralnet` through caret's `train` function to predict the times table.

When I plot the model after training, I get the network below:

[plot of the trained network]

In the plot it says Error = 0.01643, and I've just noticed that whenever I train and plot my `finalModel`, the Error value shown in the plot is always the error from the last output message that `train` prints before it returns.
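For reference, this is roughly how I look up that value programmatically (a sketch, using the `tt.cv` object from the code further down; I am assuming the plot takes it from the neuralnet object's `result.matrix`):

# error value that plot(tt.cv$finalModel) appears to display (rep 1)
tt.cv$finalModel$result.matrix["error", 1]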

[console output printed by neuralnet during training, one line per fold/repeat]

So I was wondering whether the last model was the one chosen, because the last model didn't have the lowest RMSE. I am assuming that those output messages are in order, in other words, the first output `hidden: 8 thresh: 0.01 rep: 1/1 steps: 10304 error: 0.01592 time: 1.75 secs` is from Fold01.Rep1 and the last one is from FoldK.RepN, in my case Fold10.Rep3.
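This is how I list the resample labels that caret records (a sketch, again using `tt.cv` from the code below; it shows which resamples exist, though not necessarily the order in which the console lines were printed):

# 30 resample labels: Fold01.Rep1 ... Fold10.Rep3
sort(unique(tt.cv$pred$Resample))
tt.cv$resample$Resample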

I thought I could cross-check this by re-calculating the RMSE (since this is the metric that decides the `finalModel`) for each hold-out and then comparing the results with `tt.cv$resample$RMSE`. I expected to see a row where my re-calculated RMSE and `tt.cv$resample$RMSE` were equal, which would mean that that particular model was chosen as my `finalModel`.

However, when I re-calculated the RMSEs for each hold-out, I didn't see any row where my calculation and `tt.cv$resample$RMSE` were equal. Below you can see the comparison:

The RMSE column has the actual RMSE values from `tt.cv$resample$RMSE`, and the rmse_ho column has the RMSE values I re-calculated using the `finalModel`.

[table comparing tt.cv$resample RMSE with the re-calculated rmse_ho values]

Could you please point out whether I've made any mistakes, and how I can cross-check which model was selected?

While writing this post I also realized that there are actually 31 models if I count the output messages. I'd read about this somewhere on the internet but couldn't find it again.
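Counting them from the `tt.cv` object gives the same picture (a sketch; if the comments below are right, these correspond to 30 resample fits plus one final fit on all of the training data):

length(unique(tt.cv$pred$Resample))  # 30 = 10 folds x 3 repeats
class(tt.cv$finalModel)              # the 31st fit, retrained on the whole training set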

Here is the code:

library(caret)
library(neuralnet)

# Create the dataset
tt <- data.frame(multiplier = rep(1:10, times = 10), multiplicand = rep(1:10, each = 10))
tt <- cbind(tt, data.frame(product = tt$multiplier * tt$multiplicand))

# Splitting 
indexes <- createDataPartition(tt$product,
                              times = 1,
                              p = 0.7,
                              list = FALSE)
tt.train <- tt[indexes,]
tt.test <- tt[-indexes,]

# Pre-process

preProc <- preProcess(tt, method = c('center', 'scale'))
tt.preProcessed <- predict(preProc, tt)
tt.preProcessed.train <- tt.preProcessed[indexes,]
tt.preProcessed.test <- tt.preProcessed[-indexes,]

# Train

train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              savePredictions = TRUE)

tune.grid <- expand.grid(layer1 = 8,
                         layer2 = 0,
                         layer3 = 0)

# Setting seed for reproducibility & train
set.seed(12)
tt.cv <- train(product ~ .,
               data = tt.preProcessed.train,
               method = 'neuralnet',
               tuneGrid = tune.grid,
               trControl = train.control,
               algorithm = 'backprop',
               learningrate = 0.005,
               stepmax = 100000,
               lifesign = 'minimal',
               threshold = 0.01)

errors <- data.frame(rmse_ho = numeric(), resample = character())
# Re-calculate RMSE values for each hold-out using tt.cv$finalModel
for (i in levels(as.factor(tt.cv$pred$Resample))) {
  dframe <- tt.cv$pred[tt.cv$pred$Resample == as.character(i), ]

  hold_outs <- tt.preProcessed.train[dframe$rowIndex, ]
  prediction_hold_outs <- predict.train(tt.cv, hold_outs)

  rmse_hold_out <- RMSE(prediction_hold_outs, hold_outs$product)

  errors <- rbind(errors, data.frame(rmse_ho = rmse_hold_out,
                                     # sse_all = sse_all,
                                     # sse_train = sse_train_data,
                                     resample = i))
}

# View the comparison
View(cbind(tt.cv$resample[order(tt.cv$resample$Resample), ], errors))
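As a variant of the same cross-check, I can also compute per-resample RMSE directly from the hold-out predictions that caret saved because of savePredictions = TRUE (a sketch; if I understand caret correctly, these predictions come from the per-fold models, so they should reproduce `tt.cv$resample$RMSE`):

# per-resample RMSE from the saved hold-out predictions
stored_rmse <- sapply(split(tt.cv$pred, tt.cv$pred$Resample),
                      function(d) RMSE(d$pred, d$obs))
stored_rmse[order(names(stored_rmse))]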
  • Well, I've just found [this](https://stackoverflow.com/a/38134829/4796762), which I assume proves my assumption wrong. But I still wonder how the `finalModel` is built from those models. [This](https://stats.stackexchange.com/questions/52274/how-to-choose-a-predictive-model-after-k-fold-cross-validation) as well. – Burak Kaymakci Dec 02 '19 at 19:08
  • In your code you are evaluating just one model: `expand.grid(layer1 = 8, layer2 = 0, layer3 = 0)`. You are evaluating it by performing 3-times-repeated 10-fold CV, so 30 models are fit to evaluate the performance on the data splits, and finally a model is built on the whole data (the 31st model) with the best hyperparameters. In your case only one hyperparameter combination was provided, so that is the best. – missuse Dec 03 '19 at 09:36
  • Also I suggest [reading this book](http://topepo.github.io/caret/index.html) if you plan to use caret. – missuse Dec 03 '19 at 09:38
  • Thank you, I had actually been reading parts of it, haha, but I will read it all. :) – Burak Kaymakci Dec 04 '19 at 14:03
