I have an outer foreach/dopar parallel loop containing an inner loop. Every instance of the inner loop should work on the same set of random numbers. The rest, i.e. the remaining parts of the outer body and the parallel instances should work as usual, i.e. with independent random numbers.
I can achieve this in a non-parallel implementation by saving the state of the RNG before the start of the inner loop and restoring this state after execution of each instance of the inner loop. See the following example:
library(doSNOW)
seed = 4711
cl = makeCluster(2)
registerDoSNOW(cl)
clusterSetupRNGstream (cl, seed=rep(seed,6))
erg = foreach(irun = 1:3,.combine = rbind) %dopar% {
#do some random stuff in outer loop
smp = runif(1)
# save current state of RNG
s = .Random.seed
# inner loop, does some more random stuff
idx = numeric(5)
for(ii in seq.int(5)) {
idx[ii] = sample.int(10, 1)
# reset RNG for next loop iteration
set.seed(s)
}
c(smp,idx)
}
> print(erg)
[,1] [,2] [,3] [,4] [,5] [,6]
result.1 0.5749162 7 6 2 3 7
result.2 0.1208910 4 3 6 8 9
result.3 0.3491315 7 2 7 6 10
My desired output were constant integers along each row, different from row to row. So this does not work in parallel. The reason is quite clear: snow uses a different random generator (rlecuyer) and has to deal with parallel streams.
The question is: How can I achieve this reset of seed(s) in snow with the random number generator rlecuyer? One of the the challenges for me is to identify the proper substream one is in when the seed should be reset.
My current work-around is to pre-calculate all random stuff (in the example the idx vector) once for the inner loop and then use this constant data in all inner instances. This is not optimal since the random data in total becomes very large and it is much better to (re)generate it on the fly in smaller chunks.
Edit
It is of utmost importance that any setting/resetting keeps the parallel streams independent. As I currently understand the mechanism this requires:
For and within each parallel process:
- Detect the relevant substream the process is working on.
- At convenient times: Find and store the random seed of THIS substream.
- At convenient times: Reset substream from 1) with seed from 2) and leave all other streams undisturbed.