So, I'm working on an application that has to check ~50 GB of data against a list of hashes every time it starts up. Obviously this needs to be parallelized, and I don't want the application hanging on a "LOADING..." screen for a minute and a half.

I'm using multiprocessing.Pool's map_async to handle this; the main thread calls map_async(checkfiles, path_hash_pairs, callback) and provides a callback that tells it to throw up a warning if mismatches are found.

Trouble is... nothing happens. Looking at the Python processes with my task manager shows they spawn and then immediately terminate without doing any work. They never print anything and certainly never finish and call the callback.

This minified example also exhibits the same problem:

import multiprocessing
import time

def printme(x):
    time.sleep(1)
    print(x)
    return x**2

if __name__ == "__main__":
    l = list(range(0,512))

    def print_result(res):
        print(res)

    with multiprocessing.Pool() as p:
        p.map_async(printme, l, callback=print_result)
    p.join()
    time.sleep(10)

Run it, and... nothing happens. Swapping map_async for map works exactly as expected.

Am I just making a stupid mistake or what?

Schilcote
  • How did you apply context manager to the Pool? Is it something new I am not aware of? If I run your code I get: `AttributeError: __exit__`, however if I manage pool manually (map_async, close) everything works just fine. – taras Oct 20 '17 at 05:57

1 Answer

Let's see what happens:

You are using a context manager to automatically "close" the Pool. The important detail is that if you check the source code of Pool.__exit__, you will find:

def __exit__(self, exc_type, exc_val, exc_tb):
    self.terminate()

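It just calls terminate instead of close. terminate stops the worker processes immediately without completing outstanding work, and because map_async returns right away, the with block exits before any task has had a chance to run. So you still need to explicitly close the Pool and then join it.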

with multiprocessing.Pool() as p:
    p.map_async(printme, l, callback=print_result)
    p.close()
    p.join()

But in this case the context manager is pointless, so just use the plain form:

p = multiprocessing.Pool()
p.map_async(printme, l, callback=print_result)
p.close()
p.join()
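
If you would rather keep the with block, another option is to hold on to the AsyncResult that map_async returns and wait on it before the block exits, so terminate only runs once the work is done. A minimal sketch of that idea; slow_square is a stand-in for printme, and the helper name squares_with_context_manager and the 60-second timeout are assumptions made for illustration:

```python
import multiprocessing
import time


def slow_square(x):
    # Stand-in for printme from the question.
    time.sleep(0.1)
    return x ** 2


def squares_with_context_manager(values):
    # Hypothetical helper, not part of multiprocessing.
    with multiprocessing.Pool() as p:
        result = p.map_async(slow_square, values)
        # get() blocks until every task has finished, so the terminate()
        # call made by __exit__ cannot cut the workers off early.
        # The timeout value is an arbitrary assumption.
        return result.get(timeout=60)


if __name__ == "__main__":
    print(squares_with_context_manager(range(8)))
```

Note that this waits inside the with block, so it behaves much like plain map; it is only worth it if you want the automatic cleanup.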

And why does it work with map? Because map blocks until all the work is finished.
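
This does not have to make the whole thing synchronous. Only join (or get) blocks; close just tells the pool that no more tasks are coming, so the main thread can keep doing other things between close and join. A rough sketch, assuming a hypothetical helper run_without_blocking, with slow_square standing in for printme and a short sleep standing in for whatever the application does while it waits:

```python
import multiprocessing
import time


def slow_square(x):
    # Stand-in for printme from the question.
    time.sleep(0.1)
    return x ** 2


def run_without_blocking(values):
    # Hypothetical helper, not part of multiprocessing.
    collected = []  # filled by the callback, which runs in the parent process

    pool = multiprocessing.Pool()
    async_result = pool.map_async(slow_square, values, callback=collected.extend)
    pool.close()  # no more tasks will be submitted; workers keep running

    # The main thread is free here, so poll instead of blocking.
    while not async_result.ready():
        time.sleep(0.05)  # placeholder for UI updates or other startup work

    pool.join()  # the work is already done, so this only reaps the workers
    return async_result.get(), collected


if __name__ == "__main__":
    results, from_callback = run_without_blocking(range(8))
    print(results)
    print(from_callback)
```

In a real application you would keep a reference to the pool and the AsyncResult somewhere long-lived and check ready() from your main loop, rather than spinning in a while loop like this.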

Sraw
  • thank you!! This was driving me nuts. Now I'll try to figure out from your last comment why it _doesn't_ work with `apply_async`... – Stefano Apr 11 '22 at 17:44
  • But this will turn the async to a dummy sync execution...i.e from a non-blocking to blocking code ? – jossefaz Sep 29 '22 at 09:54