I have written a function that executes some code in parallel:
library(doParallel)  # also attaches foreach, iterators and parallel

cl <- makeCluster(nc, outfile = "")  # nc = number of worker processes
registerDoParallel(cl, nc)

pred <- foreach(s = iter(seq(1L, length(dfr_missings))),
                .packages = c('RANN', 'randomForest', 'magrittr'),
                .errorhandling = 'stop',
                .verbose = FALSE,
                .combine = 'cbind',
                .export = c("myRoughfix")) %dopar% {
  #
  # some code goes here
  #
}

stopCluster(cl)
stopImplicitCluster()
The function works as expected with smaller data frames. However, I need it to run on bigger ones.
I get the following error:
Error in unserialize(socklist[[n]]) : error reading from connection
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
As far as I understand the error message, it indicates that I ran out of memory. The data frame I am having issues with is ~770 MB. I am working on a machine with 256 GB of RAM and 48 cores, so I would expect it to handle an object that size. The code does not do anything memory-intensive.
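For reference, the in-memory size of an object can be checked like this (assuming the data frame in question is dfr_missings; it may be a different object in my actual code):

format(object.size(dfr_missings), units = "MB")  # in-memory size of the object
gc()                                             # memory currently used by this R session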
So my question is: is it possible that there are memory restrictions set on the workers that could be managed with a global option? Perhaps an option to the OS, or to makeCluster()?
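A rough sketch of the kind of check I have in mind, querying the limits each worker actually sees (this assumes the cl object from above is still alive; clusterEvalQ is part of the parallel package):

library(parallel)

# Ask every worker for the OS resource limits it sees and for R's own memory use.
worker_info <- clusterEvalQ(cl, {
  list(limits = system("ulimit -a", intern = TRUE),  # shell limits of the worker process
       mem    = gc())                                # memory accounting inside the worker
})

If the workers reported a lower address-space limit than the master session, that would point to exactly the kind of restriction I am asking about.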
Any other thoughts are welcome.
P.S. I am on a preset virtual machine running 64-bit Oracle Linux 6, with R version "Oracle Distribution of R version 3.1.1".