
I'm currently trying to write an R script to import a variety of files I've created related to a dataset. This involves reading a lot of .txt files using several nested for loops based on how I've organized the directories and names of the files.

I can run the innermost loop fine (albeit a little slowly). However, trying to run the second loop or any further loops produces the following error:

Error: vector memory exhausted (limit reached?)

I believe this may be related to how R handles memory. I'm running R out of RStudio. I've also tried the solution posted here, with no luck.

 R version 3.5.1 (2018-07-02) -- "Feather Spray"
 Platform: x86_64-apple-darwin15.6.0 (64-bit)

Code below:

subjects <- 72
loop1_names <- as.character(list('a','b','c'))
loop2_names <- as.character(list('one','two','three'))
loop3_names <- as.character(list('N1','N2'))
loop4_names<- as.character(list('choice1','choice2','choice3'))
i<-1;j<-1;

loop3.subset <- data.frame()
for(k in 1:length(loop3_names)){

  loop4.subset<- data.frame()#Data frame for handling each set of loop 4 values
  for(l in 1:length(loop4_names)){

            #Code for extracting the variables for each measure

            measures.path <- file.path(results_fldr, 'amp_measures',loop1_names[i],loop2_names[j],'mont',loop3_names[k])
            measures.data <- read.table(file.path(measures.path, paste(paste(loop1_names[i],loop2_names[j],loop3_names[k],loop4_names[l],sep = '_'),'.txt',sep = '')), header = T, nrows = subjects)

            #Get rid of the IDs, we'll add those back in later
            col_idx_ID <- grep('ID', names(measures.data))
            measures.data <- as.data.frame(measures.data[,-col_idx_ID])# make sure when trimming to keep the measures as a data frame
            names(measures.data) <- c(paste(loop1_names[i],loop2_names[j],loop3_names[k],loop4_names[l],sep = '_'))#Add a label to the data

            #Now combine this data with the other data in the loop4 subset data frame
            if(l == 1){
              loop4.subset <- measures.data
            } else {
              loop4.subset <- merge(loop4.subset, measures.data)
            }

          }#End l/loop 4
          if(k == 1){
            loop3.subset <- loop4.subset
          } else {
            loop3.subset <- merge(loop3.subset, loop4.subset)
          }

        }#End k/loop 3
cliffson_a

1 Answer
Generally I would suggest you read only part of the data into memory at a time, then write each partial merge to disk. In the example below (which of course I can't run, because I don't have your files), I write to disk after each i, j iteration, which leaves 9 files. You then merge those 9 files in another loop. If you still have memory problems, break this up further: first do the "j" merge and write each result to one of 3 "i" files, then merge those. If you still can't merge those files, you have a fundamental lack of memory on your machine.

subjects <- 72
loop1_names <- as.character(list('a','b','c'))
loop2_names <- as.character(list('one','two','three'))
loop3_names <- as.character(list('N1','N2'))
loop4_names<- as.character(list('choice1','choice2','choice3'))

for(i in 1:length(loop1_names)) {
    for(j in 1:length(loop2_names)) {
        loop3.subset <- data.frame()
        for(k in 1:length(loop3_names)){

            loop4.subset<- data.frame()
            for(l in 1:length(loop4_names)){

                ##Code for extracting the variables for each measure

                measures.path <- file.path(results_fldr,
                                           'amp_measures',
                                           loop1_names[i],
                                           loop2_names[j],
                                           'mont',
                                           loop3_names[k])
                measures.data <- read.table(file.path(measures.path, paste(paste(loop1_names[i],
                                                                                 loop2_names[j],
                                                                                 loop3_names[k],
                                                                                 loop4_names[l],
                                                                                 sep = '_'),'.txt',sep = '')),
                                            header = T, nrows = subjects)

                ##Get rid of the IDs, we'll add those back in later
                col_idx_ID <- grep('ID', names(measures.data))
                measures.data <- as.data.frame(measures.data[,-col_idx_ID])
                names(measures.data) <- c(paste(loop1_names[i],
                                                loop2_names[j],
                                                loop3_names[k],
                                                loop4_names[l],
                                                sep = '_'))

                ## Now combine this data with the other data in the loop4 subset data frame
                if(l == 1){
                    loop4.subset <- measures.data
                } else {
                    loop4.subset <- merge(loop4.subset, measures.data)
                }

            }#End l/loop 4
            if(k == 1){
                loop3.subset <- loop4.subset
            } else {
                loop3.subset <- merge(loop3.subset, loop4.subset)
            }
        }#End k/loop 3
        write.table(loop3.subset, paste0(i, "_", j, ".txt"))
    }
}

## Now you have 9 files to read in and merge.
## Something like this:

df <- NULL
for(i in 1:length(loop1_names)) {
    for(j in 1:length(loop2_names)) {
        df1 <- read.table(paste0(i, "_", j, ".txt"))
        if (is.null(df)) {
            df <- df1    # merge(NULL, df1) would fail, so seed with the first file
        } else {
            df <- merge(df, df1)
        }
    }
}
trosendal
    Thanks for the suggestion. I was actually able to find a workaround by switching from merge to cbind.data.frame. Even trying the two inner loops and then writing to the table as you suggested ended up causing the same memory problem. I think I'll avoid merge from now on unless absolutely necessary. Also, your code is much better organized, and it gave me some great ideas on how to make mine more readable. Thank you! – cliffson_a Aug 23 '19 at 17:53
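For what it's worth, the cbind workaround makes sense here. By default, merge() joins on all column names the two data frames share, and once the ID columns are dropped the per-measure tables share no columns at all, so each merge() falls back to a Cartesian product whose row count multiplies at every step. That would explain the memory exhaustion. A minimal sketch illustrating the difference (the column names are made up for the example):

```r
## Two 72-row data frames with no column names in common,
## mirroring the per-measure tables after the ID column is dropped.
a <- data.frame(m1 = rnorm(72))
b <- data.frame(m2 = rnorm(72))

## merge() with no shared columns performs a cross join:
## 72 * 72 = 5184 rows, and the count multiplies with each further merge.
nrow(merge(a, b))  # 5184

## cbind() just places the columns side by side, keeping 72 rows.
nrow(cbind(a, b))  # 72
```

So merge() is the right tool only when there is a genuine key to join on (e.g. keeping the ID column and merging by ID); for same-order, same-length columns, cbind is both safe and far cheaper.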