
According to this answer, `numpy.random` isn't thread-safe. However, that has not been consistent with what I've observed so far. Consider the following script:

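```python
import numpy as np
from multiprocessing.dummy import Pool  # thread pool, not a process pool
from queue import Queue

SIZE = 1000000
np.random.seed(1)
tPool = Pool(100)
q1 = Queue()

def worker_thread(i):
    # Draw from the global np.random state inside a worker thread
    q1.put(np.random.choice(100, 5))

tPool.map(worker_thread, range(SIZE))
tPool.close()
tPool.join()

# Reference run: same seed, same number of draws, single thread
q2 = Queue()
np.random.seed(1)
for i in range(SIZE):
    q2.put(np.random.choice(100, 5))

# The comparison is element-wise, so n ends up as an array of 5 counts
# (one per position in a draw); each equals SIZE if every draw matched.
n = 0
for i in range(SIZE):
    n += (q1.get() == q2.get())

print(n)
```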

What I'm testing here is whether SIZE calls generate the same sequence in the multi-threaded environment as in the single-threaded one. For me this outputs n=SIZE. Of course this could just be chance, so I ran it a few times and have been getting consistent results. So my question is: are calls to functions in the `numpy.random` package thread-safe?

spurra

1 Answer


I've run your script several times on my machine (Python 3.5.2, numpy 1.13.3) and got arrays of 999995 and 999992 nearly as often as 1000000. So the answer you're referring to is correct: `np.random` may produce different results in a multi-threaded environment.

You can see this yourself by increasing the pool size (say, to 1000) and the sample size (say, to 50). With those settings I got inconsistent results every time, even with a smaller SIZE=100000.
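To make these variations easier to try, here is a minimal sketch that wraps the question's script in a function parameterized by the number of draws, the pool size and the sample size. The function name `run_experiment` and its parameters are hypothetical. Unlike the original script, it compares whole draws with `np.array_equal`, so it returns a single count of fully matching draws instead of a per-position array.

```python
import numpy as np
from multiprocessing.dummy import Pool  # thread-based Pool
from queue import Queue


def run_experiment(size, pool_size, sample_size, seed=1, upper=100):
    """Count how many draws made from worker threads match the
    single-threaded sequence produced from the same seed.
    (Hypothetical helper for illustration.)"""
    threaded = Queue()

    def worker(_):
        # Draw from the global (module-level) generator, then enqueue.
        threaded.put(np.random.choice(upper, sample_size))

    np.random.seed(seed)
    with Pool(pool_size) as pool:
        pool.map(worker, range(size))  # blocks until every task is done

    np.random.seed(seed)
    reference = [np.random.choice(upper, sample_size) for _ in range(size)]

    # Compare whole draws rather than element by element, so the result
    # is a single count in the range [0, size].
    return sum(np.array_equal(threaded.get(), ref) for ref in reference)
```

For example, `run_experiment(100000, pool_size=1000, sample_size=50)` corresponds to the settings suggested above. With `pool_size=1` the single worker processes tasks in order, so every draw should match; with many threads, the count can drop below `size`.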

Maxim
  • Hmm, this is weird. I've tested it on three different machines running Python 3.5.2 and numpy 1.13.3, and I always get consistent results, even after increasing the thread pool to 1000, the sample size to 50, and the sampling range to 10,000. How can this be? – spurra Nov 01 '17 at 08:48
  • Interesting. I'll try it on a different machine too. It looks like the native implementation really matters here. Right now I'm using Linux, x86_64, kernel 4.10.0-37-generic. – Maxim Nov 01 '17 at 09:27
  • @BananaCode I can confirm that `np.random` is indeed stable on my other machine. I tried Python 3.5.2 and 3.6.0 with numpy 1.13.3 and 1.12.1. This makes me think that it depends on the `libc` or even the kernel version. Still, thread safety is not guaranteed. – Maxim Nov 03 '17 at 11:11
  • Thanks for testing, @Maxim. However, because of your first experiment, I have migrated my code to use `np.random.RandomState`, which is supposedly thread-safe. – spurra Nov 03 '17 at 11:57
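The comment doesn't say exactly how `np.random.RandomState` was used. One option, sketched below under that assumption, is to give each task its own `RandomState` seeded from a base seed plus the task index, and to collect results with `pool.map` (which preserves input order) instead of a shared queue. The helper names `draw_for_task`, `threaded_draws` and `serial_draws`, and the `[base_seed, i]` seeding scheme, are illustrative choices, not the original poster's code. On NumPy 1.17 and later, `np.random.SeedSequence(...).spawn(n)` combined with `np.random.default_rng` is another way to derive independent per-task streams.

```python
import numpy as np
from multiprocessing.dummy import Pool


def draw_for_task(args):
    # Each task builds its own RandomState, so no generator state is
    # shared between threads. Seeding from [base_seed, task index] is an
    # illustrative choice, not the only valid scheme.
    base_seed, i = args
    rng = np.random.RandomState([base_seed, i])
    return rng.choice(100, 5)


def threaded_draws(size, pool_size, base_seed=1):
    with Pool(pool_size) as pool:
        # map() returns results in input order, regardless of which
        # thread finished first.
        return pool.map(draw_for_task, [(base_seed, i) for i in range(size)])


def serial_draws(size, base_seed=1):
    # Single-threaded reference using the same per-task seeding.
    return [draw_for_task((base_seed, i)) for i in range(size)]
```

Because each draw depends only on its task index, the threaded and serial results should be identical no matter how the threads are scheduled. The trade-off is the cost of constructing one generator per task; a thread-local generator would be cheaper, but its output would then depend on which thread happened to run which task.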