
I am trying to use snow to perform some multiprocessing using this code:

library(doSNOW)   # also loads foreach and snow
library(dplyr)    # for pull()

cl <- makeCluster(32)
registerDoSNOW(cl)

result <-
    foreach(i = 1:iterations) %dopar% {
      current_value <- pull(nearby_genes[i, 1])

      # DO ANALYSIS FOR CURRENT VALUE
    }

This results in the following error:

    mpi.send(x = serialize(obj, NULL), type = 4, dest = dest, tag = tag, :
      long vectors not supported yet: memory.c:3782

I did some research and found that it has something to do with memory limits. One of the data frames that I use inside the `foreach` loop is >5 million rows long and has 14 columns, so I think it occupies more than 2 GB of RAM, which triggers the error.

Am I right in this assumption and does anyone have an idea how to circumvent this problem?
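One thing I tried is to avoid shipping the whole data frame to the workers at all. This is only a sketch of the idea (`nearby_genes` and `iterations` are from my setup above): extract the single column the loop actually needs before the `foreach` call, so each worker only receives a small vector instead of the full >5-million-row object.

    library(doSNOW)
    library(foreach)

    cl <- makeCluster(32)
    registerDoSNOW(cl)

    # Pull the needed column once on the master; only this (small) vector
    # gets serialized and sent to the workers, not the full data frame.
    values <- nearby_genes[[1]][1:iterations]

    result <- foreach(current_value = values) %dopar% {
      # DO ANALYSIS FOR CURRENT VALUE
    }

    stopCluster(cl)

This sidesteps the serialization of the big object, but of course only works if the analysis inside the loop does not itself need the whole data frame.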

Any help is much appreciated!

nhaus
  • If it's a memory problem, see here: https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes and here https://stackoverflow.com/questions/23950132/how-to-set-memory-limit-in-rstudio-desktop-version#54806324 – pbraeutigm Aug 13 '21 at 08:54
  • I don't think any of these relate to my problem. I think that `snow` doesn't accept if objects are bigger than 2GB to be used inside the `foreach` loop. – nhaus Aug 13 '21 at 09:03
  • Seems strange that something made for parallel processing doesn't accept large datasets. You could look into multidplyr, I've used that with fairly large data (certainly over 2 GB per cluster) without any problems. – heds1 Aug 13 '21 at 09:13
  • Can `multidplyr` also be used to parallelize for loops? I am not using any dplyr operations inside my for loop. – nhaus Aug 13 '21 at 09:26

0 Answers