
I am running into a problem with the foreach section of a program I am working with in R. The program runs simulations for varying parameters and returns the results to a single list, which is then used to generate a report. The problem is that not all of the assigned simulation runs are visible in the report. In every respect, it appears as though only a subset of the assigned runs were actually executed.

This is more likely to happen with larger data sets (longer time periods for a simulation, for example). It is less likely to occur on a fresh run of the program, and more likely to occur if something else is already taking up RAM. The memory-use graph in the system monitor sometimes peaks at 100% RAM and 100% swap and then dips sharply, after which one of the four child R sessions has disappeared. When .verbose = TRUE is set in foreach(), the log file shows that the simulation runs missing from the report are returned as NULL, while those that do appear in the report are returned as normal (a list of data frames and character variables). The same set of parameters can produce this effect or can produce a complete graph; that is, the set of parameters is not diagnostic.

foreach() is used to iterate over approximately a dozen parameters. .combine is cbind, .inorder is FALSE, and all other arguments such as .errorhandling are left at their defaults.

This is of course quite irritating, since the simulations can take upwards of twenty minutes to run, only to turn out to be useless due to missing data. Is there a way either to ensure that these "dropped" sessions are not dropped, or to catch it in some way if they are?

(If it's important: the computer being used has eight processors and hence runs four child processes, and the registered parallel backend is from the doMC package.)

The code is structured roughly as follows:

test.results <- foreach(parameter.one = parameter.one.space, .combine = cbind) %:%
  foreach(parameter.two = parameter.two.space, .combine = cbind) %:%
  ...
  foreach(parameter.last = parameter.last.space, .combine = cbind, .inorder = FALSE) %dopar%
  {
    run.result <- simulationRun(parameter.one,
                                parameter.two,
                                ...
                                parameter.last)

    list(list(parameters = list(parameter.one,
                                parameter.two,
                                ...
                                parameter.last),
              runResult = run.result))
  }

return(test.results)
jflint
  • See also http://stackoverflow.com/questions/7996607/foreach-garbage-collection – Matteo De Felice Apr 17 '14 at 08:50
    I think I'm having the same trouble: some of my `foreach(i = 1:ncor) %dopar%` iterations are arbitrarily abandoned a short time after the beginning of the simulation; I'm trying to figure out when exactly. Maybe because of RAM usage, but I don't think so, because the simulations appear to use only half of the computer's RAM. Could there be a RAM threshold (set in R) different from the computer's RAM capacity? – Yollanda Beetroot Jan 05 '15 at 10:53

1 Answer


I'm guessing that you're running on Linux, because from your description, it sounds like the child R session is being killed by the Linux "out-of-memory killer". Coincidentally, I recently worked on the same basic problem where mclapply was used directly.

The doMC package uses the mclapply function to execute the foreach loop in parallel, and unfortunately, mclapply doesn't signal an error when a worker process unexpectedly dies. Instead, mclapply returns NULL for all tasks allocated to that worker. I don't think there is any option to change this behavior in mclapply.
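This is easy to reproduce directly. Here is a minimal sketch of the behavior (Unix-like systems only, since mclapply relies on fork); the second task kills its own worker process to stand in for the out-of-memory killer:

```r
# A forked worker that dies delivers no result; mclapply silently
# substitutes NULL for its tasks instead of raising an error.
library(parallel)
library(tools)

res <- mclapply(1:2, function(i) {
  if (i == 2) pskill(Sys.getpid(), SIGKILL)  # simulate the OOM killer
  i
}, mc.cores = 2)

is.null(res[[2]])  # the killed worker's task comes back as NULL
```

At most a warning about an undelivered result is emitted; nothing stops the program, which is why the missing runs only surface later in the report.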

The only work-arounds that I can think of are:

  1. Use a foreach backend such as doParallel or doSNOW rather than doMC.
  2. Treat NULLs (or a shortfall of results) in the result as an error and rerun with fewer workers.
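For option 2, note that with .combine=cbind the NULLs are silently dropped rather than kept as placeholders, so the symptom is a result with fewer columns than expected. A hypothetical helper along these lines (the function name and arguments are illustrative, not part of foreach) could detect the shortfall:

```r
# Hypothetical helper: compare the number of returned results against
# the number of parameter combinations. cbind() drops NULL arguments,
# so a shortfall indicates that a worker died and its tasks were lost.
check.complete <- function(results, parameter.spaces) {
  expected <- prod(vapply(parameter.spaces, length, integer(1)))
  actual <- if (is.null(results)) 0L else ncol(results)
  if (actual < expected) {
    stop(sprintf("only %d of %d simulation runs returned; a worker may have been killed",
                 actual, expected))
  }
  invisible(TRUE)
}

# Toy example: 2 x 3 = 6 combinations expected, but only 4 returned
spaces  <- list(1:2, 1:3)
partial <- matrix(0, nrow = 1, ncol = 4)
# check.complete(partial, spaces)  # would stop() with an error here
```

Calling this immediately after the foreach loop at least converts twenty minutes of silently incomplete output into an immediate, visible failure.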

If you use doParallel, make sure that you create and register a cluster object; otherwise mclapply will be used on Linux systems. With doParallel and doSNOW, if a worker dies abnormally, the master will get an error when fetching the task result from the dead worker:

Error in unserialize(node$con) : error reading from connection

In this case, the parallel backend will catch the error and use the specified error handling.
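A minimal sketch of that setup (assuming the doParallel package is installed; the worker count of 4 matches the question's configuration):

```r
# Register an explicit PSOCK cluster so the socket-based code path is
# used instead of fork/mclapply. With this backend a dying worker
# produces an error, which is handled according to .errorhandling,
# rather than silent NULL results.
library(doParallel)

cl <- makeCluster(4)   # explicit cluster object, not forked workers
registerDoParallel(cl)

result <- foreach(i = 1:8, .combine = cbind, .errorhandling = "stop") %dopar% {
  sqrt(i)              # stand-in for simulationRun()
}

stopCluster(cl)
```

With .errorhandling = "stop" (the default), a dead worker now aborts the loop with the unserialize error above instead of quietly shrinking the result.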

Keep in mind that using doParallel or doSNOW may use more memory than doMC, and so you may have to specify fewer workers with them in order to avoid running out of memory.

Steve Weston
  • I have been facing weird behaviour when going from `doMC` to `doSNOW`, especially with an `Rcpp` function from a package of my own not being handled properly. Any ideas? – ClementWalter Jul 01 '16 at 20:00
  • @clemlaflemme Many things work in doMC that don't work in doSNOW because of all the useful things that `fork` does for you. One issue is that Rcpp defined functions can't be serialized which isn't an issue for doMC, but is for doSNOW. You often need to prevent the Rcpp defined function from being serialized and sent to the workers. – Steve Weston Jul 05 '16 at 22:14