My code has one heavy loop that I need to speed up. How can I implement multiprocessing for code like this? (`a` is typically 2 and `l` is up to 10.)

for x1 in range(a**l):
    for x2 in range(a**l):
        for x3 in range(a**l):
            output[x1,x2,x3] = HeavyComputationThatIsThreadSafe1(x1,x2,x3)
HighwayJohn
  • ShadowRanger's comment on [your other question](http://stackoverflow.com/q/37081288/1461210) still stands - all the threads in the world are not going to make much of a dent if you're committed to calling `HeavyComputationThatIsThreadSafe1` *over a billion times*. How many seconds does a single call to `HeavyComputationThatIsThreadSafe1` take? Take that number, multiply it by 1073741824 and divide by the number of cores you have. That gives you the absolute best-case runtime you could achieve with multiprocessing (a back-of-envelope version of this estimate is sketched after these comments). – ali_m May 08 '16 at 01:24
  • I addressed the performance problems with `HeavyComputationThatIsThreadSafe1` in [the original question](http://stackoverflow.com/a/37100607/392949) you linked to. Even with the data size you mention, it only takes ~8GB of memory and 45s to go over all three nested loops, if you take some reasonable optimization steps. – JoshAdel May 08 '16 at 13:58
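
A quick back-of-envelope version of that estimate, with a hypothetical per-call time `t_call` and core count `n_cores`:

a, l = 2, 10
calls = (a ** l) ** 3            # 1024**3 = 1,073,741,824 total calls
t_call = 1e-6                    # hypothetical: seconds per call
n_cores = 8                      # hypothetical: available cores
print(calls * t_call / n_cores)  # best case: ~134 s under these assumptions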

1 Answer


If the `HeavyComputationThatIsThreadSafe1` function only uses arrays and not Python objects, I would use a `concurrent.futures` `ThreadPoolExecutor` (or the Python 2 backport, the `futures` package) along with Numba (or Cython) with the GIL released; a sketch of that route follows the link below. Otherwise use a `ProcessPoolExecutor`.

See:

http://numba.pydata.org/numba-doc/latest/user/examples.html#multi-threading
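
A minimal sketch of the thread-based route. Here `heavy_computation` is a hypothetical, Numba-compilable stand-in for `HeavyComputationThatIsThreadSafe1`; with `nogil=True` the compiled functions release the GIL, so ordinary threads can run them in parallel:

import numpy as np
from numba import njit
from concurrent.futures import ThreadPoolExecutor

a, l = 2, 10
n = a ** l  # 1024

@njit(nogil=True)
def heavy_computation(x1, x2, x3):
    # hypothetical stand-in for HeavyComputationThatIsThreadSafe1
    return x1 * x2 + x3

@njit(nogil=True)
def fill_slab(output, start, stop, n):
    # each thread fills its own contiguous slab output[start:stop, :, :],
    # so no two threads ever write to the same element
    for x1 in range(start, stop):
        for x2 in range(n):
            for x3 in range(n):
                output[x1, x2, x3] = heavy_computation(x1, x2, x3)

output = np.empty((n, n, n), dtype=np.float64)
n_threads = 4
bounds = [i * n // n_threads for i in range(n_threads + 1)]

with ThreadPoolExecutor(max_workers=n_threads) as ex:
    futures = [ex.submit(fill_slab, output, bounds[i], bounds[i + 1], n)
               for i in range(n_threads)]
    for f in futures:
        f.result()  # re-raises any exception from a worker

Note that with n = 1024 the `output` array alone is ~8 GB of float64, which lines up with the memory estimate in the comments above.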

You'd want to parallelize the calculation at the level of the outermost loop and then fill `output` from the chunks resulting from each thread/process. This assumes the cost of doing so is much cheaper than the computation itself, which should be the case.
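
If the GIL can't be released, a `ProcessPoolExecutor` version of the same chunking strategy might look like the sketch below (again, `heavy_computation` is a hypothetical placeholder; each worker computes one plane of the outermost loop and returns it to the parent):

import numpy as np
from concurrent.futures import ProcessPoolExecutor

a, l = 2, 10
n = a ** l

def heavy_computation(x1, x2, x3):
    # hypothetical stand-in for HeavyComputationThatIsThreadSafe1
    return x1 * x2 + x3

def compute_slab(x1):
    # computes one output[x1, :, :] plane; the result is pickled back
    slab = np.empty((n, n), dtype=np.float64)
    for x2 in range(n):
        for x3 in range(n):
            slab[x2, x3] = heavy_computation(x1, x2, x3)
    return x1, slab

if __name__ == "__main__":
    output = np.empty((n, n, n), dtype=np.float64)
    with ProcessPoolExecutor() as ex:
        for x1, slab in ex.map(compute_slab, range(n), chunksize=16):
            output[x1] = slab

The per-slab pickling is the "cost of doing so" mentioned above: each returned plane is ~8 MB at these sizes, so this only pays off when computing a plane takes far longer than shipping it between processes.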

JoshAdel