
I am completely new to parallelisation. I would like to parallelise a nested for-loop and store some intermediate results. The results come from a function f that takes some formal parameters and some values from global variables. Following suggestions I found here, I use itertools.product to build a cartesian product, which is equivalent to the nested loop. But it doesn't seem to work: the array where I want to store the intermediate results stays unchanged. A minimal working example is attached below.

OS: Windows 7 64 Bit

Python Distribution: Canopy Enthought

import itertools
import numpy as np
from multiprocessing import Pool

list1 = range(4, 8)
list2 = range(6, 9)
ary = np.zeros( (len(list1), len(list2)) )

#This is the archetypical function f. It DOES NOT have p2 as a parameter! This
#is intended! In my (more complex) program a function f calls somewhere deep
#down another function that gets its values from global variables. Rewriting
#the code to hand down the variables as parameters would turn my code into a mess.
def f(p1):
    return p1*p2

#This is what I want to parallelize: a nested loop, where the result of f is saved
#in an array element corresponding to the indices of p1 and p2.
#for p1 in list1:
#    for p2 in list2:
#        i = list1.index(p1)
#        j = list2.index(p2)
#        ary[i,j]=f(p1)

#Here begins the try to parallelize the nested loop. The function g calls f and
#does the saving of the results. g takes a tuple x, unpacks it, then calculates
#f and saves the result in an array.
def g(x):
    a, b = x
    i = list1.index(a)
    j = list2.index(b)
    global p2
    p2 = b
    ary[i,j] = f(a)

if __name__ == "__main__":
    #Produces a cartesian product. This is equivalent to a nested loop.
    it = itertools.product(list1, list2)
    pool = Pool(processes=2)
    result = pool.map(g, it)
    print ary
    #Result: ary does not change!

2 Answers


When you use Pool, your program is copied once for each worker process, and each copy has its own global variables. When the parallel computation returns, the globals of your master process have not changed. You should instead use the return values of the function you call in parallel and combine the results in the master process, that is, use your variable result from the line

result = pool.map(g, it)

In your case it is just a list of Nones so far, because g does not return anything.

A general hint for parallelization: always use pure computations, that is, don't rely on side effects like global variables.
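
For the example in the question, a minimal sketch of this return-and-combine idea could look like the following (p2 is passed to f explicitly here for simplicity; keeping it global inside g, as in the comments below, works as well):

import itertools
import numpy as np
from multiprocessing import Pool

list1 = range(4, 8)
list2 = range(6, 9)

def f(p1, p2):
    return p1 * p2

#g returns the indices together with the value instead of writing to a global array.
def g(x):
    a, b = x
    i = list1.index(a)
    j = list2.index(b)
    return (i, j, f(a, b))

if __name__ == "__main__":
    ary = np.zeros( (len(list1), len(list2)) )
    pool = Pool(processes=2)
    #The master process collects the (i, j, value) tuples and fills the array.
    for i, j, val in pool.map(g, itertools.product(list1, list2)):
        ary[i, j] = val
    print ary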

spiehr
  • Ok, so you say I should not rely on the "globalness" of ary. Good: in the definition of g() I replaced `ary[i,j] = f(a)` with `return (i, j, f(a))`. Then result no longer contains None and I can construct ary from it. I define a function `def constr(x): i, j, val = x; ary[i,j] = val` and then map constr over result to get ary. It works, thank you! About globalness: can the lines `global p2; p2 = b` cause a race condition, i.e. one process gets a 'wrong' value for p2 because another process changed its (global) value? – RogueDodecahedron Mar 25 '14 at 23:00
  • @David: it would be nice if you accepted my answer. To your comment: I think this does not happen, but again, there is a `global`; try to rewrite that. – spiehr Mar 25 '14 at 23:04
  • I did a test: I set p2 = 0 at the beginning of my code, then the parallel part was executed. ary was constructed correctly and the (global) value of p2 stayed 0. I suppose every process gets its own copy of list1, list2, ary, p2, etc. and can modify it without affecting the original or the other processes' copies. It seems to work, but it's still a little magic to me. I can't rewrite my real code without the 'global' because the functions that need these global values are nested deep down within other functions; rewriting the existing code would be a nightmare that would exceed its benefits. – RogueDodecahedron Mar 25 '14 at 23:21
  • You can move all the imports to the __main__ block. This will make it clearer that each process runs the first part of the script, and only the master one runs the last block. So each process makes its own copy of list1, list2 etc. All it gets from the `pool` is the input to `g`. – hpaulj Mar 25 '14 at 23:36

You need to share information between the processes using some sort of mechanism. Look at multiprocessing.Queue, for instance.

If you want to use shared memory, you need to use threading instead. You may find that, while the GIL does impact threading performance, you may still be able to run numpy commands in parallel.
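
A minimal sketch of the multiprocessing.Queue idea for the question's example might look like this (each worker puts (i, j, value) tuples on the queue and the master drains it to fill ary; f is inlined as p1*p2 here):

import itertools
import numpy as np
from multiprocessing import Process, Queue

list1 = range(4, 8)
list2 = range(6, 9)

#Each worker handles a slice of the parameter pairs and reports results via the queue.
def worker(pairs, q):
    for a, b in pairs:
        q.put((list1.index(a), list2.index(b), a * b))

if __name__ == "__main__":
    ary = np.zeros( (len(list1), len(list2)) )
    pairs = list(itertools.product(list1, list2))
    q = Queue()
    procs = [Process(target=worker, args=(pairs[k::2], q)) for k in range(2)]
    for p in procs:
        p.start()
    #Drain the queue before joining; there is exactly one result per parameter pair.
    for _ in range(len(pairs)):
        i, j, val = q.get()
        ary[i, j] = val
    for p in procs:
        p.join()
    print ary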

Clarus
  • Threading is not needed for shared memory. Shared memory is possible with multiple processes as well. E.g. mmap.mmap from the paging file (fd 0 on Windows or -1 on Linux), mmap from /tmp on Linux, or use SysV shmget or POSIX shm_open. Once you have a buffer, it can be used to back a NumPy array that points to shared memory (numpy.frombuffer). multiprocessing.Array can also be used as a shared memory buffer with NumPy. – Sturla Molden Mar 26 '14 at 13:23
  • Isn't that kinda like killing a fly with a sledgehammer? Threads, Queues, and so on all have synchronization methods, which is what the original question was about. – Clarus Mar 26 '14 at 16:20
  • Using multiprocessing.Array as the buffer for the output array would have solved the issue: ary is not updated because each copy is local to its process, and placing it in shared memory is a possible remedy (see the sketch below). Threading is not usable for parallelization of compute tasks because of the GIL. Python threads are ok for parallelizing I/O operations that block, but not CPU-intensive work. – Sturla Molden Mar 26 '14 at 16:31
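
A minimal sketch of that multiprocessing.Array approach for the question's example; the shared buffer is handed to the workers through the Pool initializer, which is needed on Windows because workers do not inherit the parent's globals there (f is again inlined as p1*p2):

import itertools
import numpy as np
from multiprocessing import Pool, Array

list1 = range(4, 8)
list2 = range(6, 9)

#Runs once per process: wrap the shared buffer in a NumPy view under the global name ary.
def init(shared_buf):
    global ary
    ary = np.frombuffer(shared_buf, dtype=np.float64).reshape(len(list1), len(list2))

def g(x):
    a, b = x
    i = list1.index(a)
    j = list2.index(b)
    ary[i, j] = a * b   #this write lands in shared memory, not in a private copy

if __name__ == "__main__":
    #lock=False is fine here because every task writes to a different element.
    shared_buf = Array('d', len(list1) * len(list2), lock=False)
    init(shared_buf)    #give the master process a view of the shared buffer as well
    pool = Pool(processes=2, initializer=init, initargs=(shared_buf,))
    pool.map(g, itertools.product(list1, list2))
    print ary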