Which Python multiprocessing inputs resulted in a timeout?

Question

I'm trying to understand multiprocessing in Python with timeouts.

import multiprocessing
import time
import random

def do_something(i):
    # Sleep for a random 1-3 seconds to simulate work of varying length.
    x = random.randint(1, 3)
    time.sleep(x)
    return (i, x)

pool = multiprocessing.Pool(processes=4)
results = pool.imap_unordered(do_something, range(50))

while True:
    try:
        # Wait at most 1 second for the next finished result.
        result = results.next(timeout=1)
        print(result)

    except StopIteration:
        break

    except multiprocessing.TimeoutError:
        print("Timeout")

Questions:

1) How can I set a timeout on each task, such that I can know which inputs resulted in a timeout? In the example above it seems the result is printed even if the task times out, so I don't know which task is slow.

2) Why don't all tasks that sleep longer than 1 second time out? I only get a handful of timeouts, even though roughly 2/3 of the random x values are greater than 1.

Tim Peters · Accepted Answer · 2016-07-15T02:12:32.417

I don't really understand the first question. When TimeoutError is raised, no remaining work item finished in time - but that's obvious ;-)

About "it seems the result is printed even if the task times out": The tasks never time out - it's only waiting for a result that can time out. If no result is ready within a second, you loop around and try again. Eventually every work item does complete, so of course a result is eventually printed for every work item. You could change timeout=1 to, e.g., timeout=0.01, and nothing about that would change (except you'd see Timeout printed more often). TimeoutError doesn't mean a task timed out - it only means that the main program timed out waiting for a task to finish. The tasks keep running.
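If the goal is only to learn which inputs have not produced a result within some waiting period, one possible sketch is to submit each input with `apply_async` and keep the returned handle next to its input. This is not something the code in the question does; the helper name `inputs_not_ready`, the explicit `sleep_seconds` argument, and the single shared deadline are assumptions made for illustration, and the code is written for Python 3. As described above, a `TimeoutError` here still only means the main program stopped waiting, not that the task was stopped.

```python
import multiprocessing
import time

def do_something(i, sleep_seconds):
    # Stand-in for real work; the sleep time is passed in so runs are repeatable.
    time.sleep(sleep_seconds)
    return (i, sleep_seconds)

def inputs_not_ready(sleep_times, wait_seconds, processes=4):
    """Return the inputs whose results were not ready within wait_seconds."""
    not_ready = []
    # Leaving the with-block terminates the pool, including unfinished workers.
    with multiprocessing.Pool(processes=processes) as pool:
        handles = [(i, pool.apply_async(do_something, (i, s)))
                   for i, s in enumerate(sleep_times)]
        deadline = time.monotonic() + wait_seconds
        for i, handle in handles:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                handle.get(timeout=remaining)
            except multiprocessing.TimeoutError:
                # The wait timed out; the task itself may still be running or queued.
                not_ready.append(i)
    return not_ready

if __name__ == "__main__":
    print(inputs_not_ready([0.1, 3.0, 0.1, 3.0], wait_seconds=1.0))
```

Note that an input can also show up as not ready simply because it was still queued behind other work when the deadline passed, so with more inputs than processes this reports "no result yet" rather than "this task is slow".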

For the second question, think about it more. Suppose, e.g., you start with 4 processes with sleep times of 1, 1, 3, and 3.

You wait a second and the first result is ready just before the 1-second timeout expires. The sleep times remaining in the other 3 processes decrease to 0, 2, and 2. While you're printing the first result, perhaps the first process starts a new work item with a sleep time of 1. So now the remaining wait times across all processes are 1, 0, 2, 2. In fact the 2nd process is already working on another new item, so the wait times remaining are 1, n, 2, 2 for some value of n.

The loop goes around and picks up the result from the 2nd process immediately. The wait times across processes are now a little less than 1, n, 2, 2.

So waiting a second again picks up the result from the first process before the timeout, and the sleep times on the 3rd and 4th processes simultaneously fall below a second each.

And so on. Waiting a second for a result takes a second off each process's remaining sleep time simultaneously, because they're running concurrently.

I bet you'd see the behavior you expected if you changed the Pool constructor to say processes=1. Then you'll see at least one timeout every time a process picks a sleep time of 2, and you'll see at least two timeouts every time a process picks a sleep time of 3. But when you're running 4 processes simultaneously, their remaining sleep times all decrease simultaneously.
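The walk-through above can be checked with a small simulation that uses no real processes. It is an idealized model, not a measurement: it assumes workers pick up the next task the instant they are free, that printing takes no time, and that a result arriving exactly when the timeout expires counts as arriving in time (as in the example above). The names `completion_times` and `count_timeouts` are made up for this sketch.

```python
import heapq

def completion_times(sleep_times, processes):
    """Finish time of every task when `processes` workers take tasks in order."""
    free_at = [0.0] * processes  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    finished = []
    for sleep in sleep_times:
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + sleep
        heapq.heappush(free_at, end)
        finished.append(end)
    return sorted(finished)

def count_timeouts(sleep_times, processes, timeout=1.0):
    """Count how often a wait of `timeout` seconds ends with no result ready."""
    now = 0.0
    timeouts = 0
    for done_at in completion_times(sleep_times, processes):
        # Each fruitless wait advances the clock by one full timeout.
        while done_at > now + timeout:
            now += timeout
            timeouts += 1
        now = max(now, done_at)  # the result is picked up as soon as it exists
    return timeouts

tasks = [1, 1, 3, 3, 1, 1, 1, 1]
print(count_timeouts(tasks, processes=4))  # 0
print(count_timeouts(tasks, processes=1))  # 4
```

With four workers the two 3-second tasks never cause a timeout in this model, because some other result is always ready within each 1-second wait. With one worker each 3-second task costs two timeouts, matching the reasoning above.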

Clearer?

Great answer! Thanks! I realize I was asking the wrong question. I actually want to have a task time out if it takes too long. Is there a way to achieve that? - Bogdan Vasilescu, Jul 15 '16 at 04:21
`Pool` has nothing like that built in. You can roll your own out of a lower-level `multiprocessing.Process()` call, followed by a `.join()` with a timeout, followed by a `.terminate()`. Or you can create a thread in the worker process to sleep and, e.g., do an `os._exit()` when it wakes up. Or ... there is truly no _graceful_ way to force code to stop. Several approaches are fleshed out here: http://stackoverflow.com/questions/492519/timeout-on-a-function-call - Tim Peters, Jul 15 '16 at 04:29
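A minimal sketch of the first suggestion in the comment above (`Process`, then `.join()` with a timeout, then `.terminate()`) might look like the following. The helper names `run_with_timeout` and `timed_out_inputs` and the explicit `sleep_seconds` argument are assumptions for illustration, the code targets Python 3, and it runs one task at a time rather than four in parallel. It also discards return values; getting results back would need something extra such as a `multiprocessing.Queue` or `Pipe`.

```python
import multiprocessing
import time

def do_something(i, sleep_seconds):
    # Stand-in for real work; the sleep time is passed in so runs are repeatable.
    time.sleep(sleep_seconds)

def run_with_timeout(i, sleep_seconds, timeout):
    """Run one task in its own process; return False if it had to be killed."""
    proc = multiprocessing.Process(target=do_something, args=(i, sleep_seconds))
    proc.start()
    proc.join(timeout)       # wait at most `timeout` seconds for the task
    if proc.is_alive():
        proc.terminate()     # not graceful: the task gets no chance to clean up
        proc.join()          # reap the terminated process
        return False
    return True

def timed_out_inputs(sleep_times, timeout):
    """Return the inputs whose task was terminated for exceeding the timeout."""
    return [i for i, s in enumerate(sleep_times)
            if not run_with_timeout(i, s, timeout)]

if __name__ == "__main__":
    print(timed_out_inputs([0.05, 10.0, 0.05], timeout=1.5))
```

Because `terminate()` stops the process abruptly, anything the task was holding (open files, locks, partially written output) is left as it was at that moment, which is the "no graceful way" caveat from the comment.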
