Consider this very basic (and inefficient) code that uses a parallel foreach to generate random values:

library("doParallel")
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:100) %dopar% rnorm(1)

Is this correct, or are additional steps needed for random number generation to work properly? I suspect it is enough, and quick checks seem to "prove" that the seeds work properly, but I would like to be sure that this also holds on other platforms, since I want the code to be portable.

Tim
  • This collapses into two important subtasks. A: making sure that parallel calls to some PRNG work (thread safety, blocking and so on), where the safer approach is to use one PRNG per thread/process (not sure what kind of parallelization is done here). B: (in the case of different PRNGs) making sure that their seeds are able to produce good random numbers. Many PRNGs have defects in this regard (e.g. Mersenne-Twister initialized with seeds 0, 1, 2 -> bad). The keyword for further search is: *distributed seeding* (with many approaches: leap-frogging, PRNG jumps, ...). – sascha Apr 09 '17 at 02:12
  • Thanks. But all-purpose, state-of-the-art packages [like `plyr`](https://github.com/hadley/plyr/blob/master/R/llply.r) do not seem to care about it. Does that mean they should not be used for such purposes? – Tim Apr 09 '17 at 06:54

1 Answer

Your concern is justified: random number generation does not automatically work correctly in parallel, and further steps need to be taken. When using the foreach framework, you can use the doRNG extension to make sure you also get sound random numbers when running in parallel.

Example:

library("doParallel")
cl <- makeCluster(2)
registerDoParallel(cl)

## Declare that parallel RNG should be used in a parallel foreach() call.
## %dorng% will still result in parallel processing; it uses %dopar% internally.
library("doRNG")

y <- foreach(i = 1:100) %dorng% rnorm(1)
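
As a quick sanity check, here is a minimal sketch, assuming both doParallel and doRNG are installed. It illustrates that setting a seed before a %dorng% loop should make the parallel results reproducible. The seed value 42 and the variable names are arbitrary choices for illustration only.

```r
library("doParallel")
library("doRNG")

cl <- makeCluster(2)
registerDoParallel(cl)

## Same seed before each loop; 42 is an arbitrary illustrative value.
set.seed(42)
y1 <- foreach(i = 1:10) %dorng% rnorm(1)

set.seed(42)
y2 <- foreach(i = 1:10) %dorng% rnorm(1)

## Expected to be TRUE if the two runs drew the same random numbers.
same <- identical(y1, y2)

stopCluster(cl)
```

If `same` is TRUE, the two runs produced the same values even though the iterations were spread across worker processes.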

EDIT 2020-08-04: Previously this answer proposed the alternative:

library("doRNG")
registerDoRNG()
y <- foreach(i = 1:100) %dopar% rnorm(1)

However, the downside of that approach is that it is more complicated for the developer to use registerDoRNG() in a clean way inside functions. Because of this, I recommend using %dorng% to specify that parallel RNG should be used.
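
To illustrate the point about functions, here is a small sketch. The helper name `draw_normals()` is hypothetical, and it assumes the caller has already registered a foreach backend. Using %dorng% keeps the choice of parallel RNG local to the function, without changing any global registration.

```r
library("doParallel")
library("doRNG")

## Hypothetical helper: draws n standard normal values in parallel.
## Assumes a foreach backend has already been registered by the caller.
draw_normals <- function(n) {
  foreach(i = seq_len(n), .combine = c) %dorng% rnorm(1)
}

cl <- makeCluster(2)
registerDoParallel(cl)

x <- draw_normals(5)

stopCluster(cl)
```

The function itself does not call registerDoRNG(), so it leaves the caller's foreach setup untouched.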

HenrikB
  • Thanks, +1. I have heard of `doRNG` but never used it. Could you comment on how using `registerDoRNG()` differs from using `%dorng%`? Are they the same? – Tim Apr 09 '17 at 06:55
  • I'm rusty on the details, but the doRNG vignette should answer your question. – HenrikB Apr 09 '17 at 19:44
  • By the way, does doFuture handle the RNG seeds in any way? – Tim Apr 10 '17 at 12:37
  • No, doFuture is just a thin layer wrapping future into the foreach framework - it passes RNG needs on to foreach / doRNG. – HenrikB Apr 10 '17 at 15:18