TL;DR: as pointed out by @MichaelSzczesny, the main problem appears to be that you use processes that each operate on a copy of the same RNG object, with the same initial state.
Random number generator (RNG) objects are initialized with an integer called a seed. This internal state is then updated every time a new number is generated, using an iterative operation (e.g. (seed * huge_number) % another_huge_number).
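To make that iterative update concrete, here is a toy multiplicative generator of exactly the form (seed * huge_number) % another_huge_number. The class name ToyLehmerRNG is hypothetical and the constants are the classic Park-Miller values, chosen only for illustration; NumPy's default generator (PCG64) uses a different algorithm. What matters is that two generators started from the same seed produce the same sequence:

```python
class ToyLehmerRNG:
    """Toy multiplicative (Lehmer) generator; illustrative only, NOT NumPy's algorithm."""

    MULTIPLIER = 16807      # plays the role of "huge_number" (Park-Miller constant)
    MODULUS = 2**31 - 1     # plays the role of "another_huge_number" (a Mersenne prime)

    def __init__(self, seed):
        # The seed is simply the initial internal state (must be in 1..MODULUS-1)
        self.state = seed

    def next(self):
        # Each call updates the state iteratively and returns the new state
        self.state = (self.state * self.MULTIPLIER) % self.MODULUS
        return self.state


a = ToyLehmerRNG(42)
b = ToyLehmerRNG(42)
# Same seed -> same state updates -> identical "random" sequences
print([a.next() for _ in range(3)] == [b.next() for _ in range(3)])  # True
```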
It is not a good idea to share the same RNG object between multiple threads, since operations on it are inherently sequential. In the best case, if two threads access it in a protected way (e.g. using critical sections), the result depends on the order in which the threads run. Performance is also affected, because this causes an effect called cache-line bouncing that slows down the threads accessing the same object. In the worst case, the RNG object is unprotected and this causes a race condition. Such a race can leave multiple threads with the same seed (state), and thus with the same results, which were supposed to be random.
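A minimal sketch of the "protected" case, assuming one NumPy Generator shared by several threads and guarded by a threading.Lock (the function name draw_with_shared_rng is hypothetical). NumPy's bit generators also hold an internal lock of their own, so the explicit lock here mainly makes the critical section visible. Together the threads always consume exactly the first n_threads * n_draws values of the stream, but which thread gets which value depends on scheduling:

```python
import threading

import numpy as np


def draw_with_shared_rng(n_threads=2, n_draws=5, seed=0):
    shared_rng = np.random.default_rng(seed)  # ONE generator shared by all threads
    lock = threading.Lock()
    results = [[] for _ in range(n_threads)]

    def worker(i):
        for _ in range(n_draws):
            # Critical section: only one thread advances the shared state at a time
            with lock:
                results[i].append(shared_rng.random())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The per-thread lists depend on thread ordering, so they are not reproducible
    return results
```

Every draw is serialized through one shared state, which is exactly why this does not scale and why per-thread results vary between runs.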
CPython uses a giant mutex called the global interpreter lock (GIL) that protects access to Python objects: it prevents multiple threads from executing Python bytecode at once. Its goal is to protect the interpreter, not the state of your objects. Many NumPy functions release the GIL so that code can scale in parallel. The catch is that this can cause race conditions if you call them on the same object from multiple threads. It is your responsibility to use locks to protect NumPy objects.
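As a small sketch of protecting a shared NumPy object yourself (locked_increment is a hypothetical name): the update counts[:] = counts + 1 is two separate NumPy operations, so another thread could run between the read and the write. The GIL does not prevent that; an explicit lock does:

```python
import threading

import numpy as np


def locked_increment(n_threads=4, n_iter=1000):
    counts = np.zeros(4, dtype=np.int64)  # NumPy array shared by all threads
    counts_lock = threading.Lock()

    def worker():
        for _ in range(n_iter):
            # Read (counts + 1) and write (counts[:] = ...) are separate steps;
            # the lock keeps other threads from interleaving between them
            with counts_lock:
                counts[:] = counts + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```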
In your case, I cannot reproduce the problem with threads, but I can with processes. Thus, I think your example uses processes. For processes, you should use:
from multiprocessing import Pool
And for threads you should use:
from multiprocessing.pool import ThreadPool as Pool
Processes behave differently from threads because they do not operate on shared objects (at least not by default). Instead, each process operates on its own copy of the objects. Your processes produce the same output because the copied RNG object starts from the same state in every process.
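The effect can be reproduced without starting any process. Assuming a parent Generator created with np.random.default_rng(42), a pickle round trip approximates the copy a child process receives (the "spawn" start method pickles objects it sends to children, while "fork" copies the parent's memory; either way each child starts from the same state):

```python
import pickle

import numpy as np

rng = np.random.default_rng(42)  # the parent's RNG

# Simulate two child processes, each receiving its own copy of the parent's RNG
copy_a = pickle.loads(pickle.dumps(rng))
copy_b = pickle.loads(pickle.dumps(rng))

draws_a = copy_a.random(3)
draws_b = copy_b.random(3)

# Both copies start from the same state, so they draw identical numbers
print(np.array_equal(draws_a, draws_b))  # True
```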
In short, please use a different RNG per thread. A typical solution is to create N threads, each with its own RNG object, and then communicate with them to send work (e.g. using queues). This is called a thread pool. An alternative is to use thread-local storage.
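A minimal sketch of such a thread pool, assuming tasks of the form (task_id, size) and a hypothetical function run_worker_pool. Each worker owns a Generator seeded from np.random.SeedSequence(seed).spawn(n_workers), which derives independent child seeds from a single root seed, so no locking around the RNG is needed:

```python
import queue
import threading

import numpy as np


def run_worker_pool(n_workers, tasks, seed=2024):
    # One independent child seed per worker, derived from a single root seed
    child_seeds = np.random.SeedSequence(seed).spawn(n_workers)
    task_queue = queue.Queue()
    result_queue = queue.Queue()

    def worker(rng):
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                break
            task_id, size = task
            # rng belongs to this thread only, so no lock is required
            result_queue.put((task_id, rng.random(size)))

    threads = [threading.Thread(target=worker, args=(np.random.default_rng(s),))
               for s in child_seeds]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    results = {}
    while not result_queue.empty():
        task_id, values = result_queue.get()
        results[task_id] = values
    return results
```

Note that which worker picks up which task depends on scheduling, so per-task values can differ between runs; if reproducibility matters, bind each chunk of work to a fixed generator instead.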
Note that the NumPy documentation provides an example in the section "Multithreaded Generation".
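A rough sketch in the spirit of that section (not a copy of it; the function name parallel_standard_normal and the chunking scheme are my own assumptions): one output array is split into contiguous chunks, and each thread fills its chunk in place with its own Generator via the out= argument of Generator.standard_normal:

```python
import concurrent.futures

import numpy as np


def parallel_standard_normal(n, n_threads=4, seed=12345):
    """Fill one array with normal samples in parallel, one Generator per thread (sketch)."""
    seeds = np.random.SeedSequence(seed).spawn(n_threads)
    rngs = [np.random.default_rng(s) for s in seeds]
    out = np.empty(n)
    # Chunk boundaries, e.g. [0, 250, 500, 750, 1000] for n=1000 and 4 threads
    bounds = np.linspace(0, n, n_threads + 1).astype(int)

    def fill(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Chunk i is always filled by rngs[i], writing directly into a view of out
        rngs[i].standard_normal(out=out[lo:hi])

    with concurrent.futures.ThreadPoolExecutor(n_threads) as executor:
        list(executor.map(fill, range(n_threads)))  # list() surfaces any exceptions
    return out
```

Because each chunk is bound to a fixed generator, the result is reproducible for a given seed regardless of how the threads are scheduled.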