I implemented a function that uses the numpy random generator to simulate some process. Here is a minimal example of such a function:

def thread_func(cnt, gen):
    s = 0.0
    for _ in range(cnt):
        s += gen.integers(6)
    return s

Now I wrote a function that uses multiprocessing's `Pool.starmap` to call `thread_func`. If I write it like this (passing the same rng reference to all processes):

from multiprocessing import Pool
import numpy as np

def evaluate(total_cnt, thread_cnt):
    gen = np.random.default_rng()
    cnt_per_thread = total_cnt // thread_cnt
    with Pool(thread_cnt) as p:
        vals = p.starmap(thread_func, [(cnt_per_thread, gen) for _ in range(thread_cnt)])
    return vals

the result of evaluate(100000, 5) is a list of 5 identical values, for example:

[49870.0, 49870.0, 49870.0, 49870.0, 49870.0]

However if I pass a different rng to all processes, for example by doing:

vals = p.starmap(thread_func, [(cnt_per_thread,np.random.default_rng()) for _ in range(thread_cnt)])

I get the expected result (5 different values), for example:

[49880.0, 49474.0, 50232.0, 50038.0, 50191.0]

Why does this happen?

  • Please include the imports. I guess `Pool` is `multiprocessing.Pool`, so you have multiple processes not threads? – Michael Szczesny May 20 '22 at 11:47
  • Yes, that is correct. I used processes, not threads. I have corrected the post now to include the imports and not refer to processes as "threads". – evolved_antenna May 20 '22 at 16:53

1 Answer

TL;DR: as pointed out by @MichaelSzczesny, the main problem appears to be that you use processes, which each operate on a copy of the same RNG object with the same initial state.


Random number generator (RNG) objects are initialized with an integer called a seed. The internal state derived from it is updated each time a new number is generated, using an iterative operation (e.g. state = (state * huge_number) % another_huge_number).
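To make this concrete, here is a minimal sketch (assuming NumPy's `default_rng`): two generators starting from the same seed walk through the exact same state sequence, which is precisely what happens when every process receives a copy of the same RNG state.

```python
import numpy as np

# Two generators created with the same seed produce identical streams.
g1 = np.random.default_rng(42)
g2 = np.random.default_rng(42)

print(g1.integers(6, size=5))
print(g2.integers(6, size=5))  # identical to the first line
```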

It is not a good idea to use the same RNG object from multiple threads because operations on it are inherently sequential. In the best case, if two threads access it in a protected way (e.g. using critical sections), the result depends on the ordering of the threads. Performance also suffers, because contention on the shared object causes an effect called cache-line bouncing, which slows down the threads accessing it. In the worst case, the RNG object is unprotected and this causes a race condition: the internal state can end up being the same in multiple threads, and so can the results (which were supposed to be random).

CPython uses a giant mutex called the global interpreter lock (GIL) to protect access to Python objects. It prevents multiple threads from executing Python bytecode at once. Its goal is to protect the interpreter, not the state of your objects. Many NumPy functions release the GIL so that code can scale in parallel, but this means calling them on the same object from multiple threads can cause a race condition. It is your responsibility to use locks to protect shared NumPy objects.
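As an illustration of that responsibility, here is a hedged sketch of protecting one shared generator with an explicit lock (`safe_draw` and `gen_lock` are illustrative names, not part of any API):

```python
import threading
import numpy as np

gen = np.random.default_rng()   # one generator shared by all threads
gen_lock = threading.Lock()     # serializes access to its internal state

def safe_draw(cnt):
    # Each state update happens under the lock, so no two threads
    # mutate the generator concurrently.
    s = 0.0
    for _ in range(cnt):
        with gen_lock:
            s += gen.integers(6)
    return s
```

Note that this serializes every draw, so it avoids the race condition at the cost of losing the parallelism; one generator per worker avoids both the lock and the contention.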

In your case, I cannot reproduce the problem with threads, but I can with processes. Thus, I think you used processes in your example. For processes, you should use:

from multiprocessing import Pool

And for threads you should use:

from multiprocessing.pool import ThreadPool as Pool

Processes behave differently from threads because they do not operate on shared objects (at least not by default). Instead, each process operates on its own copy. The processes produce the same output because the initial state of the copied RNG object is the same in every process.

Put shortly, use a different RNG per worker. A typical solution is to create N workers, each with its own RNG object, and then communicate with them to send work (e.g. using queues). This is called a thread pool. An alternative option is to use thread-local storage.
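As a sketch of that advice applied to the question's code (`fixed_evaluate` is a hypothetical name; this is one possible fix, not the only one), a single `SeedSequence` can spawn one statistically independent child seed per process:

```python
from multiprocessing import Pool
import numpy as np

def thread_func(cnt, gen):
    s = 0.0
    for _ in range(cnt):
        s += gen.integers(6)
    return s

def fixed_evaluate(total_cnt, proc_cnt):
    # Spawn one independent child seed per process; each worker then
    # starts from its own state instead of a copy of the same one.
    seeds = np.random.SeedSequence().spawn(proc_cnt)
    cnt_per_proc = total_cnt // proc_cnt
    with Pool(proc_cnt) as p:
        vals = p.starmap(
            thread_func,
            [(cnt_per_proc, np.random.default_rng(s)) for s in seeds],
        )
    return vals
```

With independent seeds, `fixed_evaluate(100000, 5)` should return 5 different values.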

Note that the NumPy documentation provides an example in the section Multithreaded Generation.

Jérôme Richard
  • I don't understand why this would cause the results of all threads (calling `rng.integers` 20000 times) to be exactly the same. – Michael Szczesny May 20 '22 at 11:43
  • 1
    I'm not sure if the code uses processes or threads and if it would make any difference. – Michael Szczesny May 20 '22 at 11:50
  • I agree, for 20_000 iterations it is suspicious. A race condition can cause a few values to be the same, but not 20_000. I cannot reproduce the problem on my machine with threads, but I can with processes. So the OP certainly confused threads and processes. – Jérôme Richard May 20 '22 at 11:54