24

I am using the multiprocessor.Pool() module to speed up an "embarrassingly parallel" loop. I actually have a nested loop, and am using multiprocessor.Pool to speed up the inner loop. For example, without parallelizing the loop, my code would be as follows:

outer_array=[random_array1]
inner_array=[random_array2]
output=[empty_array]    

for i in outer_array:
    for j in inner_array:
        output[j][i]=full_func(j,i)

With parallelizing:

import multiprocessing
from functools import partial

outer_array=[random_array1]
inner_array=[random_array2]
output=[empty_array]    

for i in outer_array:
    partial_func=partial(full_func,arg=i)     
    pool=multiprocessing.Pool() 
    output[:][i]=pool.map(partial_func,inner_array)
    pool.close()

My main question is if this is the correct, and I should be including the multiprocessing.Pool() inside the loop, or if instead I should create the pool outside loop, i.e.:

pool=multiprocessing.Pool() 
for i in outer_array:
     partial_func=partial(full_func,arg=i)     
     output[:][i]=pool.map(partial_func,inner_array)

Also, I am not sure if I should include the line "pool.close()" at the end of each loop in the second example above; what would be the benefits of doing so?

Thanks!

shadowprice
  • 617
  • 2
  • 7
  • 14

3 Answers3

55

Ideally, you should call the Pool() constructor exactly once - not over & over again. There are substantial overheads when creating worker processes, and you pay those costs every time you invoke Pool(). The processes created by a single Pool() call stay around! When they finish the work you've given to them in one part of the program, they stick around, waiting for more work to do.

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.

Have fun :-)

Tim Peters
  • 67,464
  • 13
  • 126
  • 132
  • Thanks for your answer! Where in the code should I place the Pool.join() line? – shadowprice Dec 05 '13 at 01:01
  • 11
    After `Pool.close()`. If your main program has nothing better to do after closing the Pool, `Pool.join()` can be (and often is!) the statement immediately following `Pool.close()`. By the way, note that `Pool.map()` **blocks** until the result is ready: the code you've shown will work, but nothing will run in parallel. Look at the docs for `Pool.imap()` and `Pool.imap_unordered()` instead. Open a new question if you get stuck. – Tim Peters Dec 05 '13 at 01:10
1
import itertools
import multiprocessing as mp

def job(params):
    a = params[0]
    b = params[1]
    return a*b

def multicore():
    a = range(1000)
    b = range(2000)
    paramlist = list(itertools.product(a,b))
    print(paramlist[0])
    pool = mp.Pool(processes = 4)
    res=pool.map(job, paramlist)
    for i in res:
        print(i)

if __name__=='__main__':
    multicore()

how about this?

  • Please supply some example input and the resulting output of your programme and explain how it achieves the result desired by the question. – jwpfox Mar 16 '18 at 03:11
-1
import time
from pathos.parallel import stats
from pathos.parallel import ParallelPool as Pool


def work(x, y):
    return x * y


pool = Pool(5)
pool.ncpus = 4
pool.servers = ('localhost:5654',)
t1 = time.time()
results = pool.imap(work, range(1, 2), range(1, 11))
print("INFO: List is: %s" % list(results))
print(stats())
t2 = time.time()
print("TIMER: Function completed time is: %.5f" % (t2 - t1))
Yagmur SAHIN
  • 277
  • 3
  • 3