I am running a Monte Carlo simulation in parallel using joblib. I noticed that although my seeds were fixed, my results kept changing; when I ran the same process in series, the results stayed constant as I expected.
Below is a small example that estimates the mean of a normal distribution with a standard deviation of 2.
Load Libraries and define function
import numpy as np
from joblib import Parallel, delayed
def _estimate_mean():
    np.random.seed(0)
    x = np.random.normal(0, 2, size=100)
    return np.mean(x)
The first example runs in series; the results are all the same, as expected.
tst = [_estimate_mean() for i in range(8)]
In [28]: tst
Out[28]:
[0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897]
The second example runs in parallel. (Note: sometimes the means are all the same, other times they are not.)
tst = Parallel(n_jobs=-1, backend="threading")(delayed(_estimate_mean)() for i in range(8))
In [26]: tst
Out[26]:
[0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.1640259414956747,
-0.11846452111932627,
-0.3935934130918206]
I expected the parallel run to give identical results because the seed is fixed. I found that if I use RandomState to fix the seed instead, the problem goes away:
def _estimate_mean():
    local_state = np.random.RandomState(0)
    x = local_state.normal(0, 2, size=100)
    return np.mean(x)
tst = Parallel(n_jobs=-1, backend="threading")(delayed(_estimate_mean)() for i in range(8))
In [28]: tst
Out[28]:
[0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897,
0.11961603106897]
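As an aside, in the real simulation I want each replication to draw different numbers, so passing a distinct fixed seed to every replication seems to be the usual pattern. A rough sketch of what I have in mind (the seed argument here is just illustrative, not my actual code):
import numpy as np
from joblib import Parallel, delayed

def _estimate_mean(seed):
    # Each call owns its own generator, so parallel workers cannot
    # interfere with each other and every replication stays reproducible.
    local_state = np.random.RandomState(seed)
    x = local_state.normal(0, 2, size=100)
    return np.mean(x)

# Eight replications with distinct, fixed seeds.
tst = Parallel(n_jobs=-1, backend="threading")(
    delayed(_estimate_mean)(seed) for seed in range(8)
)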
What is the difference between using RandomState and just seed when fixing seeds with numpy.random, and why does the latter not work reliably when running in parallel?
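For what it is worth, my rough mental model (which may well be wrong, hence the question) is that seed reseeds a single process-wide generator shared by all threads, whereas RandomState is a private instance owned by the caller. A small illustration of that assumption:
import numpy as np

# np.random.seed reseeds the one module-level generator that every thread
# in the process shares, so concurrent calls can interleave its state.
np.random.seed(0)
shared = np.random.normal(0, 2, size=100)

# RandomState(0) is an independent generator local to this call; other
# threads cannot reseed it or advance its state.
local = np.random.RandomState(0).normal(0, 2, size=100)

# Run in isolation, the two streams produce identical draws.
assert np.allclose(shared, local)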
System Information
OS: Windows 10
Python: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
Numpy: 1.17.2