I am running simulations in which some computations should be parallelized and others should not.
I am trying to figure out how to ensure reproducibility across purrr::map() and furrr::future_map() so that they yield the same result.
For some reason, I cannot use set.seed() inside the mapped function.
For instance, consider the following code:
library(purrr)
library(furrr)
#> Loading required package: future
set.seed(42)
rnorm(1)
#> [1] 1.370958
set.seed(42)
map(1, ~rnorm(1))
#> [[1]]
#> [1] 1.370958
set.seed(42)
future_map(1, ~rnorm(1), .options=furrr_options(seed=TRUE))
#> [[1]]
#> [1] -0.1691382
set.seed(42)
future_map(1, ~rnorm(1), .options=furrr_options(seed=42))
#> [[1]]
#> [1] -0.02648871
future_map(1, ~rnorm(1), .options=furrr_options(seed=list(42L)))
#> Error in `validate_seed_list()`:
#> ! All pre-generated random seed elements of a list `seed` must be valid `.Random.seed` seeds, which means they should be all integers and consists of two or more elements, not just one.
Created on 2023-02-21 with reprex v2.0.2
As you can see, I could not get the 1.370958 value using furrr. Each call is reproducible on its own, but the calls yield different results from one another.
In my real code, each function will run 100-200 times, which is less than length(.Random.seed) (== 626).
I thus thought setting the seed as a list could be a solution, but I don't really understand the documentation or the error message.
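To make the error message more concrete, here is a minimal base-R sketch of what a full .Random.seed state looks like, as opposed to a scalar seed such as 42L. It assumes, based on my reading of the furrr documentation, that furrr's seeding relies on L'Ecuyer-CMRG streams; the variable names (mt_state, cmrg_state, seeds) are only illustrative, and I have not confirmed that furrr accepts a list built this way. Since map() draws from the default Mersenne-Twister generator, values drawn from L'Ecuyer-CMRG streams would not be expected to equal 1.370958 in any case.

```r
# Default generator: Mersenne-Twister. Its state is a long integer vector.
old_kind <- RNGkind("Mersenne-Twister")
set.seed(42)
mt_state <- .Random.seed
length(mt_state)    # 626 integers, the number quoted above

# L'Ecuyer-CMRG: the state is a short integer vector (RNG kind code + 6 values).
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
cmrg_state <- .Random.seed
length(cmrg_state)  # 7 integers

# One independent stream per mapped element, derived from the first state.
# parallel::nextRNGStream() only works on L'Ecuyer-CMRG states.
n <- 3L
seeds <- vector("list", n)
seeds[[1]] <- cmrg_state
for (i in seq_len(n)[-1]) {
  seeds[[i]] <- parallel::nextRNGStream(seeds[[i - 1]])
}

# Restore the generator that was active before this sketch ran.
RNGkind(old_kind[1])
```

Each element of seeds is an integer vector with more than one element, which is the shape the error message asks for, whereas list(42L) holds a single scalar.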
For reference, here is the help file that addresses random seed management: link
Is there a way to have purrr::map() and furrr::future_map() yield the same result?
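A possible workaround, sketched here under the assumption that both code paths may go through furrr: call future_map() everywhere with the same seed option and switch only the plan between sequential and parallel. The helper name draw() is hypothetical. As I understand the furrr documentation, a fixed seed should give the same values regardless of the backend, so I would expect the two results to be identical, but this does not recover the 1.370958 value produced by map().

```r
library(furrr)  # also attaches future, which provides plan()

# Hypothetical helper: the same seeded call is used for both code paths.
draw <- function(n) {
  future_map(seq_len(n), ~rnorm(1), .options = furrr_options(seed = 42))
}

# Non-parallel part of the simulation: resolve futures in the current session.
plan(sequential)
res_seq <- draw(3)

# Parallel part: resolve futures in background R sessions.
plan(multisession, workers = 2)
res_par <- draw(3)

# Return to sequential processing and compare the two runs.
plan(sequential)
identical(res_seq, res_par)
```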
EDIT: for reference, here is the related GitHub issue.