The issue is that in the utils.R file within the pbmcapply package it runs the following line:
if (isTRUE(mc.set.seed))
mc.set.stream()
If we compare this to what is being called when we run the mclapply()
function in the parallel package we see that it runs:
if (mc.set.seed)
mc.reset.stream()
This affects the results as reset stream will allow the code to be run from the globally set seed, whereas running set stream sets it to the a new random starting value using the initial seed. We can see this in the functions attached below:
mc.reset.stream <- function ()
{
if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
sample.int(1L)
# HERE! sets the seed to the global seed variable we set
assign("LEcuyer.seed", get(".Random.seed", envir = .GlobalEnv,
inherits = FALSE), envir = RNGenv)
}
}
mc.set.stream <- function ()
{
if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
envir = .GlobalEnv)
}
else {
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}
}
I believe this change may be due to an issue with mclapply when you want to call the mclapply function more than once after setting the seed it will use the same random numbers. (i.e. by resetting the r session you should get the same results in the same order with pbmclapply so first time I get 0.143 then 0.064 and then -0.015). This is usually the preferred behaviour so you can call the funciton multiple times. See R doesn't reset the seed when "L'Ecuyer-CMRG" RNG is used? for more information.
The differences between these two implementations can be tested with the following code if you change the line in the .customized_mcparallel funciton definition from mc.set.stream() to mc.reset.stream(). Here I have simplified the function calls in the package to strip out the progress bar and leave in only the calculation (removing error checks also) and the change in setting the random seed. (Additionally note these functions will no longer run on a Windows machine only Linux or MacOS).
library(pbmcapply)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
pbmclapply <- function() {
pkg <- asNamespace('pbmcapply')
.cleanup <- get('.cleanup', pkg)
progressMonitor <- .customized_mcparallel({
mclapply(1:100, function(i) {
rnorm(1)
}, mc.cores = 2, mc.preschedule = TRUE, mc.set.seed = TRUE,
mc.cleanup = TRUE, mc.allow.recursive = TRUE)
})
# clean up processes on exit
on.exit(.cleanup(progressMonitor$pid), add = T)
# Retrieve the result
results <- suppressWarnings(mccollect(progressMonitor$pid)[[as.character(progressMonitor$pid)]])
return(results)
}
.customized_mcparallel <- function (expr, name, detached = FALSE){
# loading hidden functions
pkg <- asNamespace('parallel')
mcfork <- get('mcfork', pkg)
mc.advance.stream <- get('mc.advance.stream', pkg)
mcexit <- get('mcexit', pkg)
mcinteractive <- get('mcinteractive', pkg)
sendMaster <- get('sendMaster', pkg)
mc.set.stream <- get('mc.set.stream', pkg)
mc.reset.stream <- get('mc.reset.stream', pkg)
f <- mcfork(F)
env <- parent.frame()
mc.advance.stream()
if (inherits(f, "masterProcess")) {
mc.set.stream()
# reset the group process id of the forked process
mcinteractive(FALSE)
sendMaster(try(eval(expr, env), silent = TRUE))
mcexit(0L)
}
f
}
x <- pbmclapply()
y <- do.call(rbind, x)
z <- mean(y)
print(z)
For a complete remedy my best suggestion would be to either reimplement the functions in your own code (I copy pasted with some minor modifications to the functions from pbmcapply) or by forking the package and replacing the mc.set.seed in the utils.R file with mc.reset.seed. I can't think of a simpler solution at the moment, but hopefully this clarifies the issue.