Does the scoping of a numpy ndarray work differently inside a function called by multiprocessing? Here is an example:

Using Python's multiprocessing module, I am calling a function like so:

import multiprocessing as mp
import random
import numpy as np

jobs = []
for core in range(cores):
    #target could be f() or g()
    proc = mp.Process(target=f, args=(core,))
    jobs.append(proc)
for job in jobs:
    job.start()
for job in jobs:
    job.join()

def f(core):
    x = 0
    x += random.randint(0,10)
    print x

def g(core):
    #Assume an array with 4 columns and n rows
    local = np.copy(globalshared_array[:,core])
    shuffled = np.random.permutation(local)

Calling f(core), the x variable is local to the process, i.e. it prints a different random integer as expected. These never exceed 10, indicating that x = 0 in each process. Is that correct?

Calling g(core) and permuting a copy of the array returns 4 identically 'shuffled' arrays. This seems to indicate that the working copy is not local to the child process. Is that correct? If so, other than using shared memory space, is it possible to have an ndarray be local to the child process when it needs to be filled from shared memory space?

EDIT:

Altering g(core) to add a random integer appears to have the desired effect: the arrays show different values. Something must be occurring in permutation that orders the columns (local to each child process) identically in every process...ideas?

def g(core):
    #Assume an array with 4 columns and n rows
    local = np.copy(globalshared_array[:,core])
    local += random.randint(0,10)

EDIT II: np.random.shuffle also exhibits the same behavior. The contents of the array are shuffled, but they are shuffled into the same order on each core.
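
For reference, here is a stripped-down sketch that reproduces what I am seeing (the function name show_permutation is just for illustration, and this assumes the fork start method on Linux):

import multiprocessing as mp
import numpy as np

def show_permutation(core):
    # every forked child inherits the parent's global RNG state,
    # so all four of these prints come out identical
    print(np.random.permutation(10))

if __name__ == '__main__':
    jobs = [mp.Process(target=show_permutation, args=(core,)) for core in range(4)]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()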

Jzl5325
  • I think it may just be that random number generator + threads = possible trouble (i.e. you get lucky once and once not). You may just have to make sure you initialize a random state for each one separately, e.g. with `np.random.RandomState`. – seberg Jan 24 '13 at 15:38
  • @seberg I also tried using `np.random.random_integer`, but the returned arrays are identical across processes. Are you referencing `np.random.mtrand.RandomState`? So instantiate a random class for each child process, because they may be instantiating one class or overwriting each other? – Jzl5325 Jan 24 '13 at 15:48
  • Not sure when they get instantiated, but they could well end up instantiated exactly the same by chance (they probably use the system time), or they may simply copy the same state on forking, and so should be instantiated again. – seberg Jan 24 '13 at 15:50
  • In short, at least call `np.random.seed()` once. – seberg Jan 24 '13 at 16:00
  • `np.random.mtrand.RandomState` exhibits the same issue. The docs look like the class uses the machine clock. Same issue with seed(). Looks like I either need to insert a delay or accept the 'randomness'. – Jzl5325 Jan 24 '13 at 16:01
  • anyway, I may be thinking the wrong way, but maybe, just to be sure, seed it with the process ID or something else that is unique for each process... – seberg Jan 24 '13 at 16:03
  • I tried something similar. I seeded with a randomint between 0 and 1000. It is hard to believe that these processes are spawning at *exactly* the same clock time. – Jzl5325 Jan 24 '13 at 16:05
  • @Jzl5325 They don't have to spawn at same time to have the same seed. It depends on the clock resolution. Also on multi-core cpus it is possible to have processes spawning at the same instant. – Bakuriu Jan 24 '13 at 16:15

2 Answers

Calling g(core) and permuting a copy of the array returns 4 identically 'shuffled' arrays. This seems to indicate that the working copy is not local to the child process.

What it likely indicates is that the random number generator is initialized identically in each child process, producing the same sequence. You need to seed each child's generator (perhaps throwing the child's process id into the mix).
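
A rough sketch of that idea, reusing the question's g(core) (globalshared_array is assumed to exist as in the question, and the pid-based seed is just one convenient choice):

import os
import numpy as np

def g(core):
    # re-seed the global generator inside this child so each process
    # draws its own stream instead of the one inherited from the parent
    np.random.seed(os.getpid())
    local = np.copy(globalshared_array[:, core])
    shuffled = np.random.permutation(local)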

NPE
  • This wandered into the comments of the OP. Seeding with the pid or a `randomint(0,1000)` did not alter the random solution. – Jzl5325 Jan 24 '13 at 16:12
  • 3
    @Jzl5325 seeding with a random integer is pretty pointless when the state is the same... please try seeding with the pid inside the function `np.random.seed(pid)` or such? – seberg Jan 24 '13 at 16:14
  • I want to share numpy random state of a parent process with a child process. I've tried using Manager but still no luck. Could you please take a look at my question [here](https://stackoverflow.com/questions/49372619/how-to-share-numpy-random-state-of-a-parent-process-with-child-processes) and see if you can offer a solution? I can still get different random numbers if I do np.random.seed(None) every time that I generate a random number, but this does not allow me to use the random state of the parent process, which is not what I want. Any help is greatly appreciated. – Amir Mar 20 '18 at 14:14

To seed a random array, this post was most useful. The following g(core) function succeeded in generating a random permutation for each core.

import multiprocessing as mp
import numpy as np

def g(core):
    pid = mp.current_process()._identity[0]   # unique small integer per child
    randst = np.random.mtrand.RandomState(pid)
    randarray = randst.randint(0, 100, size=(1, 100))
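
For completeness, a possible way to drive it, mirroring the Process loop from the question (a sketch, not the exact code from the original post):

if __name__ == '__main__':
    jobs = [mp.Process(target=g, args=(core,)) for core in range(4)]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()

As far as I can tell, `_identity[0]` is a small per-child counter (1, 2, 3, ...), so each RandomState receives a distinct seed and the generated arrays differ per core.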
Jzl5325