
I'm an expert parallel programmer in OpenMP and C++. Now I'm trying to understand parallelism in Python and the multiprocessing library.

In particular, I'm trying to parallelize this simple code, which randomly increments an array 100 times:

from random import randint
import multiprocessing as mp
import numpy as np

def random_add(x):
    x[randint(0,len(x)-1)]  += 1

if __name__ == "__main__":
    print("Serial")
    x = np.zeros(8)
    for i in range(100):
        random_add(x)
    print(x)

    print("Parallel")
    x = np.zeros(8)    
    processes = [mp.Process(target = random_add, args=(x,)) for i in range(100)]
    for p in processes:
        p.start()
    print(x)

However, this is the output I get:

Serial
[  9.  18.  11.  15.  16.   8.  10.  13.]
Parallel
[ 0.  0.  0.  0.  0.  0.  0.  0.]

Why does this happen? Well, I think I have an explanation: since we are using multiprocessing (and not multithreading), each process has its own section of memory, i.e., each spawned process works on its own copy of x, which is destroyed once random_add(x) terminates. As a consequence, the x in the main program is never actually updated.

Is this correct? And if so, how can I solve this problem? In a few words, I need a global reduce operation which sums the results of all the random_add calls, obtaining the same result as the serial version.
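
To make this concrete, the kind of reduction I have in mind would look roughly like the sketch below, based on multiprocessing.Pool. The worker random_increments is only illustrative: unlike random_add, it builds and returns its own local array, and the parent sums the pieces.

from random import randint
import multiprocessing as mp
import numpy as np

def random_increments(args):
    size, n = args
    local = np.zeros(size)                # each worker fills a private array
    for _ in range(n):
        local[randint(0, size - 1)] += 1
    return local                          # shipped back to the parent via pickling

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # 4 workers, 25 increments each; sum() is the "reduce" step
        partials = pool.map(random_increments, [(8, 25)] * 4)
    print(sum(partials))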

justHelloWorld
  • You are correct in your diagnosis. See [some workarounds](https://docs.python.org/3.6/library/multiprocessing.html#sharing-state-between-processes) in the multiprocessing library. – juanpa.arrivillaga Sep 20 '17 at 06:40
  • Sub-processes get a copy of your variable `x`, not the original one. So this is a common question about sharing variables in multiprocessing. Just search for it, as there is a lot of existing information. – Sraw Sep 20 '17 at 06:55
  • Thanks for both comments. Does this also mean that if `x` is potentially big, the copying is going to take forever? Just trying to understand the process :) – justHelloWorld Sep 20 '17 at 07:05

1 Answer


You should use shared memory objects in your case:

from random import randint
import multiprocessing as mp

def random_add(x):
    x[randint(0, len(x) - 1)] += 1

if __name__ == "__main__":
    print("Serial")
    x = [0] * 8
    for i in range(100):
        random_add(x)
    print(x)

    print("Parallel")
    x = mp.Array('i', [0] * 8)   # shared array of 8 ints, initialised to zero
    processes = [mp.Process(target=random_add, args=(x,)) for i in range(100)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()                 # wait for all processes before reading the result
    print(x[:])

I've changed the numpy array to an ordinary list for the sake of code clarity.
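
One caveat: mp.Array creates a lock, but x[i] += 1 is still a separate locked read followed by a separate locked write, so two processes can in principle interleave between the two. A minimal sketch of a safer drop-in replacement for random_add, holding the lock explicitly around the whole update (the default lock is recursive, so the nested acquires inside indexing are fine):

def random_add(x):
    with x.get_lock():                    # make the read-modify-write atomic
        x[randint(0, len(x) - 1)] += 1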

Roman Mindlin
  • Thanks for your answer. What if I have an `Array()` of numpy arrays? So `x[randint(0, len(x)-1)]` is actually a `numpy.array`. I think that this will be safe, but a second opinion is always useful :D – justHelloWorld Sep 20 '17 at 08:19
  • 2
    Read [this answer](https://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing) about numpy arrays and shared memory – Roman Mindlin Sep 20 '17 at 08:22
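
Following that link, a minimal sketch of the idea for a flat array: back a numpy array with the shared buffer of an mp.Array, so every process works on the same memory with no copying. The random_add here is a variation of the original that takes the shared object and reinterprets it as a numpy view.

from random import randint
import multiprocessing as mp
import numpy as np

def random_add(shared):
    x = np.frombuffer(shared.get_obj())   # numpy view over the shared buffer, no copy
    with shared.get_lock():               # guard the read-modify-write
        x[randint(0, len(x) - 1)] += 1

if __name__ == "__main__":
    shared = mp.Array('d', 8)             # 8 doubles, zero-initialised, with a lock
    processes = [mp.Process(target=random_add, args=(shared,)) for _ in range(100)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(np.frombuffer(shared.get_obj()))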