I need to do a four-fold nested repeated cross-validation to train a model. I wrote the following code, which covers the inner cross-validation, but now I'm struggling to create the outer loop.

fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated five times
                           repeats = 5,
                           savePredictions = TRUE,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

model_SVM_P <- train(Group ~ ., data = training_set, 
                 method = "svmPoly", 
                 trControl = fitControl,
                 verbose = FALSE,
                 tuneLength = 5)

I made an attempt to solve the problem:

ntrain=length(training_set)    
train.ext=createFolds(training_set,k=4,returnTrain=TRUE)
test.ext=lapply(train.ext,function(x) (1:ntrain)[-x])

for (i in 1:4){
    model_SVM_P <- train(Group ~ ., data = training_set[train.ext[[i]]], 
                 method = "svmRadial", 
                 trControl = fitControl,
                 verbose = FALSE,
                 tuneLength = 5) 

    }

But it didn't work. How can I do this outer loop?

StupidWolf
Lucas Lazari
  • This answer might prove helpful: https://stackoverflow.com/questions/62183291/statistical-test-with-test-data/62193116#62193116. The problem in your code is that you create `model_SVM_P` anew for each `i` instead of saving the results of the four iterations, for example in a list. – missuse Aug 17 '20 at 21:18
  • I tried to add the result to a list, but it doesn't work. How can I create a list of objects? The class of model_SVM_P is "train" and "train.function", and I couldn't find a way to create a list of these elements. – Lucas Lazari Aug 18 '20 at 15:53
  • You can just use the `lapply` approach in the linked post. To learn how to use `for` loops in R, read this: https://r4ds.had.co.nz/iteration.html. – missuse Aug 18 '20 at 16:02

1 Answer

The rsample package implements the outer loop in its nested_cv() function; see the package documentation.
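As a minimal sketch of how the resampling scheme from the question (4 outer folds, 10-fold CV repeated 5 times inside) might be set up with nested_cv(): the data frame below is a made-up stand-in, since the question's training_set is not shown, and the object name folds is only illustrative.

```r
library(rsample)

set.seed(1)
# Hypothetical stand-in for the question's `training_set`
training_set <- data.frame(
  Group = factor(rep(c("A", "B"), each = 40)),
  x1 = rnorm(80),
  x2 = rnorm(80)
)

# Outer loop: 4-fold CV. Inner loop: 10-fold CV repeated 5 times,
# built separately on the analysis (training) part of each outer fold.
folds <- nested_cv(
  training_set,
  outside = vfold_cv(v = 4),
  inside  = vfold_cv(v = 10, repeats = 5)
)

# One row per outer fold; `inner_resamples` holds the inner splits
outer_train <- analysis(folds$splits[[1]])    # data used for tuning and fitting
outer_test  <- assessment(folds$splits[[1]])  # held-out data for the outer estimate
inner_first <- folds$inner_resamples[[1]]
```

Each element of folds$inner_resamples is itself a resampling object, so tuning happens only on data that the matching outer assessment set never sees.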

To evaluate the models trained by nested_cv, have a look at this vignette, which shows where the "heavy lifting" is done:

# `object` is an `rsplit` object in `results$inner_resamples` 
summarize_tune_results <- function(object) {
  # Return row-bound tibble that has the 25 bootstrap results
  map_df(object$splits, tune_over_cost) %>%
    # For each value of the tuning parameter, compute the 
    # average RMSE which is the inner bootstrap estimate. 
    group_by(cost) %>%
    summarize(mean_RMSE = mean(RMSE, na.rm = TRUE),
              n = length(RMSE),
              .groups = "drop")
}

tuning_results <- map(results$inner_resamples, summarize_tune_results)

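This code applies the tune_over_cost function to every inner split (or fold) and then averages the RMSE for each value of the cost hyperparameter. In rsample terms, the part of a split used for fitting is the "analysis" data and the held-out part is the "assessment" data.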

Please check out the vignette for more useful code including parallelization.
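If you would rather stay with caret, the outer loop from the question could be sketched roughly as below. This is only one possible arrangement, not the vignette's method: the data frame is a made-up stand-in for training_set, the names outer_results and held_out are illustrative, metric = "ROC" is added to match twoClassSummary, and it assumes the kernlab package is installed for "svmRadial". The key changes from the attempt in the question are folding on the outcome vector, indexing rows with [idx, ], and returning each fit from lapply so that nothing is overwritten.

```r
library(caret)

set.seed(42)
# Hypothetical stand-in for the question's `training_set`
training_set <- data.frame(
  Group = factor(rep(c("A", "B"), each = 40)),
  x1 = c(rnorm(40, 0), rnorm(40, 1.5)),
  x2 = rnorm(80)
)

# Inner loop: 10-fold CV repeated five times, as in the question
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

# Outer folds: pass the outcome vector (not the data frame) so folds are stratified
train.ext <- createFolds(training_set$Group, k = 4, returnTrain = TRUE)

# One tuned model per outer fold, collected in a list
outer_results <- lapply(train.ext, function(idx) {
  fit <- train(Group ~ ., data = training_set[idx, ],
               method = "svmRadial", metric = "ROC",
               trControl = fitControl, tuneLength = 5)
  held_out <- training_set[-idx, ]
  pred <- predict(fit, newdata = held_out)
  list(model = fit,
       bestTune = fit$bestTune,
       accuracy = mean(pred == held_out$Group))
})

# Outer-fold accuracy estimates
outer_acc <- sapply(outer_results, function(r) r$accuracy)
```

A plain R list can hold objects of any class, including "train" objects, so outer_results[[1]]$model is the full caret fit for the first outer fold.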

Agile Bean