My code has one heavy loop that I need to speed up. How can I implement multiprocessing for code like this? (`a` is typically 2 and `l` is up to 10.)

for x1 in range(a**l):
    for x2 in range(a**l):
        for x3 in range(a**l):
            output[x1,x2,x3] = HeavyComputationThatIsThreadSafe1(x1,x2,x3)
HighwayJohn
  • ShadowRanger's comment on [your other question](http://stackoverflow.com/q/37081288/1461210) still stands - all the threads in the world are not going to make much of a dent if you're committed to calling `HeavyComputationThatIsThreadSafe1` *over a billion times*. How many seconds does a single call to `HeavyComputationThatIsThreadSafe1` take? Take that number, multiply it by 1073741824 and divide by the number of cores you have. That gives you the absolute best-case runtime you could achieve with multiprocessing (a back-of-envelope version of this estimate is sketched after these comments). – ali_m May 08 '16 at 01:24
  • I addressed the performance problems with `HeavyComputationThatIsThreadSafe1` in [the original question](http://stackoverflow.com/a/37100607/392949) you linked to. Even with the data size you mention, it only takes ~8GB of memory and 45s to go over all three nested loops, if you take some reasonable optimization steps. – JoshAdel May 08 '16 at 13:58
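
A quick back-of-envelope version of that estimate, with a hypothetical per-call time `t_call` and core count `n_cores`:

a, l = 2, 10
calls = (a ** l) ** 3            # 1024**3 = 1,073,741,824 total calls
t_call = 1e-6                    # hypothetical: seconds per call
n_cores = 8                      # hypothetical: available cores
print(calls * t_call / n_cores)  # best case: ~134 s under these assumptions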

1 Answer


If the `HeavyComputationThatIsThreadSafe1` function only uses arrays and not Python objects, I would use a `concurrent.futures` `ThreadPoolExecutor` (or the Python 2 backport, the `futures` package) along with Numba (or Cython) with the GIL released; a sketch of that route follows the link below. Otherwise use a `ProcessPoolExecutor`.

See:

http://numba.pydata.org/numba-doc/latest/user/examples.html#multi-threading
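
A minimal sketch of the thread-based route. Here `heavy_computation` is a hypothetical, Numba-compilable stand-in for `HeavyComputationThatIsThreadSafe1`; with `nogil=True` the compiled functions release the GIL, so ordinary threads can run them in parallel:

import numpy as np
from numba import njit
from concurrent.futures import ThreadPoolExecutor

a, l = 2, 10
n = a ** l  # 1024

@njit(nogil=True)
def heavy_computation(x1, x2, x3):
    # hypothetical stand-in for HeavyComputationThatIsThreadSafe1
    return x1 * x2 + x3

@njit(nogil=True)
def fill_slab(output, start, stop, n):
    # each thread fills its own contiguous slab output[start:stop, :, :],
    # so no two threads ever write to the same element
    for x1 in range(start, stop):
        for x2 in range(n):
            for x3 in range(n):
                output[x1, x2, x3] = heavy_computation(x1, x2, x3)

output = np.empty((n, n, n), dtype=np.float64)
n_threads = 4
bounds = [i * n // n_threads for i in range(n_threads + 1)]

with ThreadPoolExecutor(max_workers=n_threads) as ex:
    futures = [ex.submit(fill_slab, output, bounds[i], bounds[i + 1], n)
               for i in range(n_threads)]
    for f in futures:
        f.result()  # re-raises any exception from a worker

Note that with n = 1024 the `output` array alone is ~8 GB of float64, which lines up with the memory estimate in the comments above.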

You'd want to parallelize the calculation at the level of the outermost loop and then fill `output` from the chunks resulting from each thread/process. This assumes the cost of doing so is much cheaper than the computation itself, which should be the case.
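
If the GIL can't be released, a `ProcessPoolExecutor` version of the same chunking strategy might look like the sketch below (again, `heavy_computation` is a hypothetical placeholder; each worker computes one plane of the outermost loop and returns it to the parent):

import numpy as np
from concurrent.futures import ProcessPoolExecutor

a, l = 2, 10
n = a ** l

def heavy_computation(x1, x2, x3):
    # hypothetical stand-in for HeavyComputationThatIsThreadSafe1
    return x1 * x2 + x3

def compute_slab(x1):
    # computes one output[x1, :, :] plane; the result is pickled back
    slab = np.empty((n, n), dtype=np.float64)
    for x2 in range(n):
        for x3 in range(n):
            slab[x2, x3] = heavy_computation(x1, x2, x3)
    return x1, slab

if __name__ == "__main__":
    output = np.empty((n, n, n), dtype=np.float64)
    with ProcessPoolExecutor() as ex:
        for x1, slab in ex.map(compute_slab, range(n), chunksize=16):
            output[x1] = slab

The per-slab pickling is the "cost of doing so" mentioned above: each returned plane is ~8 MB at these sizes, so this only pays off when computing a plane takes far longer than shipping it between processes.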

JoshAdel