Storing multiple loop values in a data frame

Question

I am running multiple models using two loops: one over all possible variable combinations (j in the code below) and another that runs a 10-fold cross-validation (i) to evaluate each combination. I want to store the evaluation metric of each fold (i) for each variable combination (j) in a data frame.

Below is the structure of the loops I am using, without the code that goes inside them.

## create dataframe to store model evaluation metrics
Table <- data.frame(Model=character(), 
                    Fold=character(), 
                    AUC=numeric())

## loop through each variable combination
for (j in seq_along(var_combinations)) {
  #loop through each fold
  for(i in 1:10) {
    foldlist_gam[[i]] <- model        ## list where each fold is stored
    models_vgam[[j]]  <- foldlist_gam ## list where each model combination, for each fold, is stored
  }
}

In order to see each model for each variable combination, per fold, I can run models_vgam[[j]][[i]], e.g. models_vgam[[1]][[5]] shows the 5th fold of the 1st model combination.

So, my desired result would be a table that looks something like this:

Model	Fold	AUC
j=1	i=1
j=1	i=2
j=1	i=3
j=1	i=4
...	...
j=10	i=8
j=10	i=9
j=10	i=10

By running

Table[i + 1, 3] <- auc

I get the metric for each loop iteration. I then tried to create a new data frame, called performance, to store each value for each model combination by doing something like this:

performance[j, 3] <- Table[[i, 3]]

But it seems that I only get the values of the last fold for each model combination.

Could you please help me with the code to produce the desired data frame?

Nice first post, however you forgot to include toy data, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. BTW, you can't use commas with double brackets, instead of `Table[[i, 3]]` you probably need `Table[i, 3]`. — jay.sf, Mar 31 '23 at 19:33
As a note, if this dataframe is going to have a lot of rows added to it then you should pre-allocate the dataframe rows. It will avoid *the* major cause of slow-down in `for` statements. R does over-allocate a little bit when it creates the object, but it won't be enough if you're growing it like a thousand times. — DuckPyjamas, Mar 31 '23 at 19:37
@jay.sf `getGeneric("[[")` and `[[.data.frame` would both disagree with you about the number of arguments that you can pass to `[[`. — Mikael Jagan, Mar 31 '23 at 22:47
@jay.sf Yes I think so - `stopifnot(identical(data.frame(1:6, 7:12, 13:18)[[2L, 3L]], 14L))` — Mikael Jagan, Apr 01 '23 at 00:12
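
For what it's worth, a likely reason only the last fold survives is that `Table[i + 1, 3]` depends on `i` alone, so every model combination writes to the same rows, and `performance[j, 3]` is overwritten on each pass of the inner loop. Here is a minimal sketch of one way around that, combining the pre-allocation suggestion above with a row index computed from both `j` and `i`. The contents of `var_combinations`, the name `n_folds`, and the `runif(1)` placeholder for the AUC are assumptions made up for illustration; the real model fitting is not shown in the question.

```r
set.seed(1)  # only so the placeholder AUC values are reproducible

# Hypothetical stand-ins for the objects in the question
var_combinations <- list(c("x1"), c("x2"), c("x1", "x2"))
n_folds <- 10

# Pre-allocate one row per (model, fold) pair; Fold varies fastest
Table <- data.frame(
  Model = rep(seq_along(var_combinations), each = n_folds),
  Fold  = rep(seq_len(n_folds), times = length(var_combinations)),
  AUC   = NA_real_
)

for (j in seq_along(var_combinations)) {
  for (i in seq_len(n_folds)) {
    auc <- runif(1)               # placeholder: use the AUC of the model fitted on fold i
    row <- (j - 1) * n_folds + i  # unique row for this (j, i) pair
    Table[row, "AUC"] <- auc
  }
}
```

With this layout, `Table[Table$Model == 2, ]` would return the ten fold results for the second variable combination, and no second data frame is needed.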
