I'm trying to modify large 3D datasets in R, in parallel. Like a few others, I've bumped into the issue of R making copies of variables it's modifying, instead of modifying them 'in place'.
I've seen Hadley's page on loops and modifying in place (http://adv-r.had.co.nz/memory.html#modification), and I'm using mcmapply (the parallel version of mapply) to modify a list, but my memory usage still explodes. I haven't found much else that explicitly documents this issue or how to get around it. According to Hadley's page, modifying a list should happen in place, but that clearly isn't the case for me. These variables aren't global and aren't referenced anywhere else.
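For reference, here's a minimal sketch (not my real code) of the check I've been using to see whether a list element gets copied, via base R's tracemem() (this assumes R was built with memory profiling, which the standard binaries are):

x <- list(big = numeric(1e6))
tracemem(x$big)   # R prints a "tracemem[...]" line each time this vector is duplicated
x$big[1] <- 0     # if nothing is printed, the modification happened in place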
I'm dealing with three variables of ~1 GB each, but the operations I'm performing push me past 20 GB of RAM. Other languages I've used wouldn't have a problem with this (and I'm obliged to stick with R in this case).
Has anyone found a memory-efficient way to modify a multi-dimensional dataset in parallel? Specifically, one where the variable is modified in place?
As a simplified example of what I'm coding:
var1 to var4 are read in from files of ~800 MB each; var5 is only an array of two numbers.
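Schematically the inputs look something like this (shapes and values here are illustrative placeholders, not the real ~800 MB data):

lon  <- 1:100                                                  # placeholder longitude index
var1 <- matrix(rnorm(length(lon) * 1000), nrow = length(lon))  # really read from a ~800 MB file
var2 <- var1; var3 <- var1; var4 <- var1                       # same shape as var1
var5 <- c(1, 2)                                                # just an array of two numbers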
library(parallel)  # mcmapply lives here

for (long in 1:length(lon)) {
  # myfunction is a placeholder for the real worker function
  outdata[[long]] <- mcmapply(myfunction, arg1 = var1[long, ], arg2 = var2[long, ],
                              arg3 = var3[long, ], arg4 = var4[long, ],
                              MoreArgs = list(arg5 = var5))
  gc(verbose = TRUE)
}
With each iteration, the memory reported by gc grows by ~50 MB, so I'm soon using gigabytes of memory. The list "outdata" is defined beforehand, too.
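In case it's relevant, this is roughly how outdata is set up and how I'm reading the per-iteration growth off gc() (the second column of gc()'s output is megabytes used):

outdata <- vector("list", length(lon))   # preallocated before the loop
used_mb <- sum(gc()[, 2])                # total MB in use (Ncells + Vcells); grows ~50 MB per iteration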
Any help would be appreciated.