
I have the following code:

import requests
from multiprocessing import Pool

def process_url(url):
    print '111'
    r = requests.get(url)
    print '222' # <-- never even gets here
    return


urls_to_download = [list_or_urls]
PARALLEL_WORKERS = 4

pool = Pool(PARALLEL_WORKERS)
pool.map_async(process_url, urls_to_download)
pool.close()
pool.join()

Every time I do this, it runs the first four items and then just hangs. I don't think it's a timeout issue, as it is extremely fast to download the four URLs. It is only after fetching those first four that it hangs indefinitely.

What do I need to do to remedy this?

  • What happens if you simply call your process_url() function from a single process/script or from within the interactive Python shell? Seems pretty obvious that the problem lies somewhere within requests.get() and likely that it's blocking on some connection or input (read/fetch) operation. What happens if you use a known accessible URL? For example https://www.google.com/?#q=current+time – Jim Dennis Sep 20 '14 at 00:08
  • Yea, it works running in a single thread. – David542 Sep 20 '14 at 00:11
  • Try adding an explicit return. Also, I personally would recommend against using any print or other console I/O from the child processes. The better programming model is to have the parent process act as a dispatcher and console I/O controller and have the children/workers return their results (or, possibly, have them all write to some DB or other coherent data store); a minimal sketch of that model follows these comments. – Jim Dennis Sep 20 '14 at 00:11
  • Adding an explicit return didn't seem to help here. – David542 Sep 20 '14 at 00:14
  • I don't see anything particularly wrong with the code that would do as you describe besides not having a mutex around the prints. What platform? Windows? *nix? – Michael Petch Sep 20 '14 at 00:20
  • You say "multithreading"; is this the multiprocessing library? If so, try changing map_async to just map. If you use map_async you need to take its returned result object and either wait on it or get results from it. Since you don't wait, you close the pool before it's had time to do its work. – tdelaney Sep 20 '14 at 00:31
  • @tdelaney The tasks will still run until completion, even if `close`/`join` are called immediately after `map_async`. `close` just means "No more tasks can be submitted", and `join` means "Wait until all pending tasks are completed". – dano Sep 20 '14 at 00:37
  • @user1436531 I can't seem to reproduce this on Windows or Linux. Does this happen with any set of URLs you give it? – dano Sep 20 '14 at 00:47
  • 3
    @David542 Has this problem been solved? I am facing the same issue. – user2552108 Feb 28 '18 at 04:55
  • It’s an old issue, but this may be helpful for future readers. I ran into the same problem when using PyCharm on Mac with Anaconda Python 3.7: if “Debug” is pressed to execute, it gets stuck at the requests.get(url) call, but if “Run” is pressed it runs without any issues. PyCharm has a bug. – smm Jan 29 '21 at 19:03
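
To illustrate the dispatcher/worker model Jim Dennis describes above, here is a minimal sketch (the URLs and the fetch helper are made up for illustration): the workers only download and return data, and the parent process does all of the printing once pool.map has collected the results.

from multiprocessing import Pool
import requests

def fetch(url):
    # Worker: download and return data; no console I/O here
    r = requests.get(url)
    return url, r.status_code

if __name__ == "__main__":
    urls = ['https://www.google.com/'] * 4  # placeholder URLs
    with Pool(4) as pool:
        # map blocks until every worker has returned its result
        for url, status in pool.map(fetch, urls):
            print(url, status)  # parent handles all console output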

1 Answer

The problem

Even though this question uses Python 2, you can still reproduce this "error" in Python 3. It happens because pool.map_async returns an object of class AsyncResult. To receive the result of the map_async call (or the traceback, in case of an error), you need to call get() on it. Joining the pool will not work here, since the job has already been completed; the result is an AsyncResult, which acts similarly to a Promise.

So, what's the solution?

Simply add a call to wait for the result to be received:

from multiprocessing import Pool
import requests

def process_url(url):
    print('111')
    r = requests.get(url)
    print('222') # <-- never even gets here (not anymore!)
    return


if __name__ == "__main__":
    urls_to_download = ['https://google.com'] * 4
    PARALLEL_WORKERS = 4

    pool = Pool(PARALLEL_WORKERS)
    a = pool.map_async(process_url, urls_to_download)
    
    # Add call here
    a.get()

    pool.close()
    pool.join()

Output

111
111
111
111
222
222
222
222
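
As a follow-up (not part of the original answer, just a sketch of the same idea): AsyncResult.get() also re-raises any exception that occurred in a worker, and it accepts a timeout so you don't block forever if something really does hang. A variation using the same placeholder URLs:

from multiprocessing import Pool, TimeoutError
import requests

def process_url(url):
    r = requests.get(url)
    return url, r.status_code

if __name__ == "__main__":
    urls_to_download = ['https://google.com'] * 4
    with Pool(4) as pool:
        result = pool.map_async(process_url, urls_to_download)
        try:
            # get() re-raises worker exceptions and gives up after 30 seconds
            print(result.get(timeout=30))
        except TimeoutError:
            print('workers did not finish within 30 seconds')
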
Charchit Agarwal