
I have written a function where I execute some code in parallel:

 library(doParallel)  # also attaches foreach and parallel

 cl <- makeCluster(nc, outfile = "")
 registerDoParallel(cl, nc)

 pred <- foreach(s = seq_along(dfr_missings),
                 .packages = c('RANN', 'randomForest', 'magrittr'),
                 .errorhandling = 'stop',
                 .verbose = FALSE,
                 .combine = 'cbind',
                 .export = c("myRoughfix")) %dopar% {
                   # some code goes here
                 }

 stopCluster(cl)
 stopImplicitCluster()

The function works as expected with smaller dataframes. However, I need it to run with bigger ones.

I get the following error:

 Error in unserialize(socklist[[n]]) : error reading from connection
 terminate called after throwing an instance of 'std::bad_alloc'
   what():  std::bad_alloc

As far as I understand the error message, it indicates that I ran out of memory. The dataframe I am having trouble with is ~770 MB, and I am working on a machine with 256 GB of RAM and 48 cores. I would expect a machine like that to handle an object of this size, and the code does not do anything memory-intensive.

So my question is: is it possible that some memory restriction is set on the workers that could be managed with a global option? Perhaps an option to the OS or to makeCluster()?
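
For reference, here is a minimal sketch of how such a cap could be inspected from inside the workers (this assumes the cluster `cl` from the code above and that the workers run under a Linux shell):

 library(parallel)
 # ask each worker for its per-process virtual-memory limit;
 # "unlimited" means the OS imposes no cap on the worker
 clusterEvalQ(cl, system("ulimit -v", intern = TRUE))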

Any other thoughts are welcome.

P.S. I am on a preconfigured virtual machine running 64-bit Oracle Linux 6, with R version "Oracle Distribution of R version 3.1.1".
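
To confirm from within R that the build really is 64-bit:

 .Machine$sizeof.pointer  # 8 on a 64-bit build of R, 4 on a 32-bit one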

  • Have you monitored the process? Are you working on a 64-bit version of R? – loki Aug 02 '17 at 13:49
  • Seems to be a c++ error. Have you checked [this](https://stackoverflow.com/a/9456758/3250126)? – loki Aug 02 '17 at 13:53
  • I am on a 64-bit version of R. I did see that it is a C++ error, however I am not sure how I can "try to ensure a safe program termination by freeing outstanding resources" in R. That is why I thought there could be an option to allocate memory. – deann Aug 02 '17 at 13:57
  • Have you tried `gc()`? – loki Aug 02 '17 at 13:59
  • what do you mean? – deann Aug 02 '17 at 14:02
  • Have you tried to use the garbage collector `gc()` in the relevant section of your code? It frees memory which is no longer used. In my code I put a `gc();gc();gc()` after some lines, especially after huge calculations. Since you are importing `randomForest`, I would assume a garbage collection can improve the resource usage (see the sketch after these comments). – loki Aug 02 '17 at 14:05
  • Try to monitor the task manager as your code runs and see if you are reaching the memory limit. My machine also has 256 GB of RAM, but when running a parallelization with just 10 cores, it can sometimes reach that limit. I believe this is because everything (incl. packages and data) is passed to each node. You can also try to run fewer cores to see if that helps. – BigTimeStats Aug 02 '17 at 14:10
  • What is this function: `myRoughfix`? – nrussell Aug 02 '17 at 15:15
  • @nrussell That is a wrapper around randomForest::na.roughfix(). It just does some data prep in advance. In general, the "# some code goes here" part is sequential code which does custom imputation. – deann Aug 02 '17 at 15:35
  • You probably just need to run this through `gdb` to figure out what function the failed allocation is triggered in. As it stands, your problem isn't reproducible, however. – nrussell Aug 02 '17 at 15:38
  • I am removing the Rcpp tag as there is no demonstrated relationship to the Rcpp package here. – Dirk Eddelbuettel Aug 02 '17 at 18:11
  • Does it run with `registerDoSEQ()` (sequential)? – F. Privé Aug 03 '17 at 11:59
  • @DirkEddelbuettel I'm using the rcpp package and i get the same unexplained error – Parsa Apr 04 '18 at 18:56
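
To make the `gc()` and `registerDoSEQ()` suggestions from the comments concrete, here is a minimal sketch; the loop body is only a placeholder for the actual imputation code, and the worker count of 8 is an arbitrary example:

 library(doParallel)

 # Debugging step suggested by F. Privé: register the sequential backend
 # and rerun the same foreach() call; if it then succeeds, the failure is
 # specific to the socket workers rather than to the algorithm itself.
 # registerDoSEQ()

 # Parallel run with fewer workers (BigTimeStats) and an explicit garbage
 # collection per iteration (loki); every worker receives its own copy of
 # the exported data, so 48 copies of a large dataframe add up quickly.
 cl <- makeCluster(8, outfile = "")  # 8 workers instead of all 48 cores
 registerDoParallel(cl)
 pred <- foreach(s = seq_along(dfr_missings),
                 .packages = c('RANN', 'randomForest', 'magrittr'),
                 .combine = 'cbind',
                 .export = c("myRoughfix")) %dopar% {
   res <- NULL  # placeholder: the actual imputation code goes here
   gc()         # free memory the worker no longer needs between tasks
   res
 }
 stopCluster(cl)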
