
I am using `multiprocessing.Pool.imap_unordered` as follows:

from multiprocessing import Pool
pool = Pool()
for mapped_result in pool.imap_unordered(mapping_func, args_iter):
    ...  # do some additional processing on mapped_result

Do I need to call `pool.close()` or `pool.join()` after the for loop?

  • I generally call `pool.join()` then `pool.close()` once I have started all of the pool threads, but I haven't tried using `pool.imap_unordered()` as an iterable. – Bamcclur Jul 08 '16 at 16:40
  • 13
    what's the point of calling join or close? I didn't call them and my code seems to be working fine. However, I'm concerned that not calling those would result in zombie processes or other subtle things. – hch Jul 08 '16 at 16:48

2 Answers


No, you don't, but it's probably a good idea if you aren't going to use the pool anymore.

The reasons for calling `pool.close()` and `pool.join()` are well explained by Tim Peters in this SO post:

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.

Bamcclur
  • 12
    is it better to call one before the other? – RSHAP Aug 10 '17 at 20:47
  • 9
    It seems that people like to call `pool.close()` first and `pool.join()` second. This allows for you to add work between the `pool.close()` and `pool.join()` that doesn't need to wait for the pool to finish executing. – Bamcclur Aug 10 '17 at 21:27
  • 54
    Just to add to @Bamcclur's comment - it's not just a good idea to call `pool.close()` first, it's actually mandatory. From [the docs](https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool) : One must call `close()` or `terminate()` before using `join()`. – Bogd Oct 08 '17 at 12:24
  • 5
    @Bogd But *why* is it mandatory? Could you answer [this](https://stackoverflow.com/questions/59618300/if-i-want-to-give-more-work-to-my-process-pool-can-i-call-pool-join-before-po?) question, please? – agdhruv Jan 06 '20 at 20:30

I had the same memory issue described in "Memory usage keep growing with Python's multiprocessing.pool" when I didn't call `pool.close()` and `pool.join()` after using `pool.map()` with a function that calculated Levenshtein distance. The function worked fine, but it wasn't garbage collected properly on a Win7 64 machine: memory usage kept growing out of control every time the function was called, until it took the whole operating system down. Here's the code that fixed the leak:

from multiprocessing import Pool

# pair the search string with every candidate string
stringList = []
for possible_string in stringArray:
    stringList.append((searchString, possible_string))

pool = Pool(5)
results = pool.map(myLevenshteinFunction, stringList)
pool.close()  # no more work will be submitted to this pool
pool.join()   # wait for the worker processes to terminate

After closing and joining the pool the memory leak went away.

Delimitry
  • 1
    i was getting `ERROR: Terminated with signal 15` before i added the cleanup code `pool.close();pool.join();` but after adding that cleanup code I don't get the console messages. so i suspect at least on my version, python 2.7 from C7, that the pool was maybe somehow not cleaning up exactly. – Trevor Boyd Smith Oct 14 '19 at 12:03