
I'm trying to modify large 3D datasets in R, in parallel. Like a few others, I've bumped into the issue of R making copies of variables it's modifying, instead of modifying them 'in place'.

I've seen Hadley's page on loops and modifying in place (http://adv-r.had.co.nz/memory.html#modification), and I'm using mcmapply (the parallel version of mapply) to modify a list. But my memory usage still explodes, and I haven't found much else that explicitly documents this issue or how to work around it. According to Hadley's page, modification in place should occur when one is modifying a list, but that clearly isn't happening for me. These aren't global variables and aren't referenced elsewhere.
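To illustrate the copy-on-modify behaviour I mean, here is a toy sketch (the list `x` and its fields are made up for the example) using `tracemem`, which reports whenever R duplicates an object:

```r
# Toy list to illustrate R's copy-on-modify semantics.
x <- list(a = 1:10, b = 1:10)
tracemem(x)        # ask R to report any duplication of x

x$a[1] <- 99L      # x is the only reference: may be modified in place

y <- x             # a second reference to the same object
x$a[2] <- 98L      # now x must be copied before modification
                   # (tracemem typically prints a "copied" message here)
untracemem(x)
```

In my real code there should be only one reference to each variable, so by this logic the copies shouldn't be happening, yet gc says otherwise.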

I'm dealing with 3 variables of ~1 GB each, but the operations I'm performing push my RAM usage past 20 GB. Other languages I've used wouldn't have a problem with this (and I'm obliged to stick with R in this case).

Has anyone found a memory efficient way to modify a multi-dimensional dataset in parallel? Specifically where the variable is modified in place?

As a simplified example of what I'm coding:

var1 to var4 are read in from files ~800 MB each, var5 is only an array of two numbers.

for (long in 1:length(lon)) {
  outdata[[long]] <- mcmapply(f, arg1 = var1[long, ], arg2 = var2[long, ],
                              arg3 = var3[long, ], arg4 = var4[long, ],
                              MoreArgs = list(arg5 = var5))
  gc(verbose = TRUE)
}
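For reference, a scaled-down, self-contained version of the above that actually runs (all sizes, the function `f`, and the variable names are placeholders; in the real code var1 to var4 are ~800 MB arrays read from files):

```r
library(parallel)

# Tiny stand-ins for the real ~800 MB inputs.
nlon <- 4; ncol <- 10
var1 <- matrix(runif(nlon * ncol), nlon)
var2 <- matrix(runif(nlon * ncol), nlon)
var3 <- matrix(runif(nlon * ncol), nlon)
var4 <- matrix(runif(nlon * ncol), nlon)
var5 <- c(0, 1)   # just two numbers, as in the real code

# Placeholder for the real per-element function.
f <- function(arg1, arg2, arg3, arg4, arg5) arg1 + arg2 + arg3 + arg4 + sum(arg5)

outdata <- vector("list", nlon)   # preallocated, as in the real code
for (long in seq_len(nlon)) {
  outdata[[long]] <- mcmapply(f, arg1 = var1[long, ], arg2 = var2[long, ],
                              arg3 = var3[long, ], arg4 = var4[long, ],
                              MoreArgs = list(arg5 = var5))
}
```

This runs fine at toy sizes; the memory growth only becomes apparent at the real data sizes.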

With each iteration the memory reported by gc grows by ~50 MB, so very soon I'm using GBs of memory. The list "outdata" is preallocated beforehand, too.

Any help would be appreciated.

Nick
  • Have you used the data.table package? – Ajay Ohri Jul 13 '15 at 06:46
    Please provide a representative [minimal reproducible example](http://stackoverflow.com/a/5963610/1412059) that clearly shows why you need parallelization (I don't understand what you are doing with `mcmapply` there). Package data.table might be the answer if you don't need parallelization. Maybe you need to look into package dplyr. Or use Rcpp. – Roland Jul 13 '15 at 06:58

0 Answers