I am running multiple models using two nested loops: an outer loop over all possible variable combinations (j in the code below) and an inner loop that runs a 10-fold cross-validation (i) to evaluate each combination. I want to store the evaluation metrics of each fold (i) for each variable combination (j) in a data frame.

Below is the structure of the loops I am using, without the code that goes inside them.

## create dataframe to store model evaluation metrics
Table <- data.frame(Model=character(), 
                    Fold=character(), 
                    AUC=numeric())

## loop through each variable combination
for (j in seq_along(var_combinations)) {
  ## loop through each fold
  for (i in 1:10) {
    foldlist_gam[[i]] <- model        ## list where each fold is stored
    models_vgam[[j]]  <- foldlist_gam ## list where each model combination, for each fold, is stored
  }
} 

In order to see each model for each variable combination, per fold, I can run models_vgam[[j]][[i]], e.g. models_vgam[[1]][[5]] shows the 5th fold of the 1st model combination.
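The nested-list access described above can be illustrated with a small self-contained sketch. The names `n_models` and `n_folds` and the character strings standing in for fitted models are assumptions made for illustration; they are not part of the original code.

```r
n_models <- 2    # assumed number of variable combinations
n_folds  <- 10

models_vgam <- vector("list", n_models)
for (j in seq_len(n_models)) {
  foldlist_gam <- vector("list", n_folds)
  for (i in seq_len(n_folds)) {
    # stand-in for the fitted model of fold i, combination j
    foldlist_gam[[i]] <- paste0("model_", j, "_fold_", i)
  }
  models_vgam[[j]] <- foldlist_gam
}

# 5th fold of the 1st variable combination
models_vgam[[1]][[5]]
```

The outer index selects the variable combination and the inner index selects the fold, which is the same (j, i) pairing the desired table needs.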

So, my desired result would be a table that looks something like this:

Model  Fold  AUC
j=1    i=1
j=1    i=2
j=1    i=3
j=1    i=4
...    ...
j=10   i=8
j=10   i=9
j=10   i=10

By running

Table[i + 1, 3] <- auc 

I get the metric for each fold. I then tried to create a new data frame, called performance, to store each value for each model combination by doing something like this:

performance[j, 3] <- Table[[i, 3]]

But it seems that I only get the values of the last fold for each model combination.

Could you please help me with the code to provide the desired dataframe?
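For reference, a likely cause of the behaviour described above is that the row index `i + 1` depends only on the fold, so every variable combination writes to the same rows. A minimal sketch of one possible approach is to pre-allocate one row per (combination, fold) pair and compute the row from both counters. Here `n_models`, `n_folds`, and the `runif()` call standing in for the real AUC are assumptions for illustration only.

```r
set.seed(1)
n_models <- 3    # assumed number of variable combinations
n_folds  <- 10

# pre-allocate one row per (model, fold) pair
Table <- data.frame(Model = rep(seq_len(n_models), each = n_folds),
                    Fold  = rep(seq_len(n_folds), times = n_models),
                    AUC   = NA_real_)

for (j in seq_len(n_models)) {
  for (i in seq_len(n_folds)) {
    auc <- runif(1, 0.5, 1)          # placeholder for the real AUC of fold i, combination j
    row <- (j - 1) * n_folds + i     # unique row for each (j, i) pair
    Table[row, "AUC"] <- auc
  }
}

head(Table)
```

Because the data frame is created at its final size, no rows are appended inside the loop, and no second data frame such as performance is needed.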

  • Nice first post, however you forgot to include toy data, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. BTW, you can't use commas with double brackets; instead of `Table[[i, 3]]` you probably need `Table[i, 3]`. – jay.sf Mar 31 '23 at 19:33
  • As a note, if this dataframe is going to have a lot of rows added to it then you should pre-allocate the dataframe rows. It will avoid *the* major cause of slow-down in `for` statements. R does over-allocate a little bit when it creates the object, but it won't be enough if you're growing it like a thousand times. – DuckPyjamas Mar 31 '23 at 19:37
  • @jay.sf `getGeneric("[[")` and `[[.data.frame` would both disagree with you about the number of arguments that you can pass to `[[`. – Mikael Jagan Mar 31 '23 at 22:47
  • @MikaelJagan I see. Have an example for say `x[[2, 3]]`? – jay.sf Mar 31 '23 at 22:51
  • @jay.sf Yes I think so - `stopifnot(identical(data.frame(1:6, 7:12, 13:18)[[2L, 3L]], 14L))` – Mikael Jagan Apr 01 '23 at 00:12
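The point about `[[` raised in these comments can be checked with a short sketch. The data frame `df` and its column names are made up for illustration; the indexing calls are base R.

```r
df <- data.frame(a = 1:6, b = 7:12, c = 13:18)

# `[[` accepts a row and a column index on a data frame and extracts one element
x <- df[[2L, 3L]]

# `[` with the same indices returns the same single value
y <- df[2L, 3L]

identical(x, y)
```

For a single cell the two forms give the same result, so `Table[[i, 3]]` is not in itself the reason only the last fold's values appear.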
