multithreading Pool resulting in non-random numbers

Question

I previously asked "Repeatedly run a function in parallel" about how to run a function in parallel. The function that I want to run has a stochastic element, where random integers are drawn.

When I use the code in that answer it returns repeated numbers within one process (and also between runs if I add an outer loop to repeat the process). For example,

import numpy as np
from multiprocessing.pool import Pool

def f(_):
    x = np.random.uniform()
    return x*x    

if __name__ == "__main__":
    processes = 3
    p = Pool(processes)
    print(p.map(f, range(6)))

returns

[0.8484870744666029, 0.8484870744666029, 0.04019012715175054, 0.04019012715175054, 0.7741414835156634, 0.7741414835156634]

Another run may give

[0.17390735240615365, 0.17390735240615365, 0.5188673758527017, 1.308159884267618e-08, 0.09140498447418667, 0.021537291489524404]

It seems as if there is some internal seed that is being used -- how can I generate random numbers similar to what would be returned from np.random.uniform(size=6) please?

@ranifisch; Yes, I would like it to be reproducible, but the main issue just now is not to get the same random numbers drawn across the processes. — user2957945, Nov 25 '19 at 21:22
I'm sorry if I'm understanding the question wrong, but you can call np.random.seed(0) to have the same seed across all processes, or am I misunderstanding? — nonamer92, Nov 25 '19 at 21:25
@ranifisch ; from the example, I would expect six different numbers, similar to the results from `np.random.uniform(size=6)` (note I just mean 6 different numbers, not that they have to equal the results from `np.random.uniform` for a given seed). But I am getting the same numbers repeated (a lot of the time). The question gives a couple of instances. — user2957945, Nov 25 '19 at 21:26
oh I think this may be relevant https://stackoverflow.com/questions/12915177/same-output-in-different-workers-in-multiprocessing — user2957945, Nov 25 '19 at 21:29
:) ok I still can't understand the exact problem.. but seems like it's something related to seed — nonamer92, Nov 25 '19 at 21:31

score 0 · Accepted Answer · answered Nov 25 '19 at 21:50

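The question "Same output in different workers in multiprocessing" indicates that the seed needs to be set inside the function. On platforms where workers are forked, each worker starts with a copy of the parent's NumPy random state, so unseeded workers can draw the same values. The question "Python multiprocessing pool.map for multiple arguments" provides a way to pass multiple arguments to Pool: one for the repeats and one for a list of seeds. This gives each call its own seed, and is reproducible.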

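import numpy as np
from multiprocessing.pool import Pool

def f(reps, seed):
    # reps is only the repeat index; seed re-seeds NumPy's global RNG for this call
    np.random.seed(seed)
    x = np.random.uniform()
    return x*x

#np.random.seed(1)
if __name__ == "__main__":
    processes = 3
    # the context manager shuts the pool down once the results are back
    with Pool(processes) as p:
        # each (rep, seed) tuple from zip is unpacked into f's two arguments
        print(p.starmap(f, zip(range(6), range(6))))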

The second argument is the vector of seeds. To see its effect, change the line to print(p.starmap(f, zip(range(0,6), np.repeat(1,6)))), which gives every call the same seed and so returns six identical values.
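As a possible variation on the seed-per-call idea above (not what the answer itself uses), NumPy 1.17 and later provide np.random.SeedSequence, which can derive independent child seeds from one root seed, and np.random.default_rng, which builds a generator from such a seed. The sketch below assumes that API is available; the names draw_squared and make_child_seeds and the root seed 12345 are made up for illustration.

```python
import numpy as np
from multiprocessing import Pool


def draw_squared(seed_seq):
    # Build a generator private to this call from its own child seed,
    # instead of touching NumPy's global random state.
    rng = np.random.default_rng(seed_seq)
    x = rng.uniform()
    return x * x


def make_child_seeds(root_seed, n_tasks):
    # spawn() derives n_tasks independent child seeds from one root seed,
    # so the whole run can be reproduced from root_seed alone.
    return np.random.SeedSequence(root_seed).spawn(n_tasks)


if __name__ == "__main__":
    # 12345 is an arbitrary root seed chosen for this example.
    child_seeds = make_child_seeds(12345, 6)
    with Pool(3) as p:
        print(p.map(draw_squared, child_seeds))
```

Because each call receives its own child seed, the six values should differ from one another, while rerunning with the same root seed should give the same six values. Using plain integers 0 to 5 as seeds, as in the answer, is also reproducible; spawning is just one way to avoid choosing the seeds by hand.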
