I have a random walk function that uses numpy.random to take the random step. The function walk, by itself, works just fine. Run in parallel, it works as expected in most cases; in conjunction with multiprocessing, however, it fails. Why does multiprocessing get it wrong?

import numpy as np

def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta,delta,n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n

N = 10

# run N trials, all starting from x=0
pwalk = np.vectorize(walk)
print(pwalk(np.zeros(N)))

# run again, using list comprehension instead of ufunc
print([walk(0) for i in range(N)])

# run again, using multiprocessing's map
import multiprocessing as mp
p = mp.Pool()
print(p.map(walk, [0]*N))

The results are typically something like...

[47 16 72  8 15  4 38 52 12 41]
[7, 45, 25, 13, 16, 19, 12, 30, 23, 4]
[3, 3, 3, 3, 3, 3, 3, 14, 3, 14]

The first two methods obviously show randomness, while the last doesn't. What's going on, so that multiprocessing doesn't get it right?

If you add a sleep so it's a sleepwalk and there's significant delay, multiprocessing still gets it wrong.

However, if you replace the call to np.random.uniform with a non-array method like [(random.random()-.5) for i in range(n)], then it works as expected.

So why don't numpy.random and multiprocessing play nice?

Mike McKerns
  • my guess is that numpy is seeding its random number generator with the same value every time it is initialized, and thus all of the child processes are on the same point on the randomness graph and will generate the same set of random numbers. try calling ``numpy.random.seed`` and passing it some value that will vary between calls to walk non-deterministically (such as the current system time) at the start of your walk. – aruisdante Jun 21 '14 at 20:36
  • maybe, but then I'd expect all the last row to be the same… and that's not exactly the case. – Mike McKerns Jun 21 '14 at 20:37
  • See also [here](https://groups.google.com/forum/#!topic/briansupport/9ErDidIBBFM) – BrenBarn Jun 21 '14 at 20:39
  • Huh… interesting. And seems to have been asked already. Oops. – Mike McKerns Jun 21 '14 at 20:41
  • @aruisdante: you basically answered my question… not what do I do, but why is it doing what it is doing… so I'd give you the check if I could. – Mike McKerns Jun 21 '14 at 20:48
  • @MikeMcKerns No worries. I only write Answers if either I'm a) 100% sure of the solution, or b) Can test the solution to satisfy (a). Since neither was true, I wrote it as a comment. Glad it helped. – aruisdante Jun 21 '14 at 20:51
  • I had a similar issue and realized it's because `numpy.random` uses the same seed in every process, but `random` does not. Since I was using both, my results were always almost the same but not exactly the same. – endolith Sep 07 '19 at 04:13

1 Answer

What's going on, so that multiprocessing doesn't get it right?

You need to reseed in each process to make sure the pseudo-random streams are independent of one another.

I use os.urandom to generate the seeds.
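
As a minimal sketch of that idea (not necessarily the exact code I run), the reseeding can be done once per worker through the pool's initializer. The helper name reseed is hypothetical, and drawing a single 32-bit seed from os.urandom is just one reasonable choice; walk is the function from the question.

```python
import multiprocessing as mp
import os
import struct

import numpy as np


def reseed():
    """Pool initializer (hypothetical helper): runs once in each worker."""
    # Read 4 bytes from the OS entropy source and unpack them as an
    # unsigned 32-bit integer, the range np.random.seed accepts.
    seed = struct.unpack("<I", os.urandom(4))[0]
    np.random.seed(seed)


def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n


if __name__ == "__main__":
    N = 10
    # Each worker calls reseed() on startup, so the workers no longer
    # share the random state copied from the parent process.
    p = mp.Pool(initializer=reseed)
    print(p.map(walk, [0] * N))
    p.close()
    p.join()
```

Seeding in the initializer, rather than at the top of walk, means each worker reads os.urandom once instead of once per task.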

kentavv
Raymond Hettinger