11

It is unclear how to properly timeout workers of joblib's Parallel in Python. Others have asked similar questions here, here, here and here.

In my example I am utilizing a pool of 50 joblib workers with the threading backend.

Parallel Call (threading):

from joblib import Parallel, delayed

output = Parallel(n_jobs=50, backend='threading')(
    delayed(get_output)(INPUT) for INPUT in input_list
)

Here, Parallel hangs without errors as soon as len(input_list) <= n_jobs, but only when n_jobs = -1.

In order to circumvent this issue, people suggest building a timeout decorator for the function that Parallel executes (get_output(INPUT) in the example above) using multiprocessing:

Main function (decorated):

@with_timeout(10)          # timeout via multiprocessing (see decorator below)
def get_output(INPUT):     # called from joblib's threading workers
    output = do_stuff(INPUT)
    return output

Multiprocessing Decorator:

import functools
import multiprocessing.pool

def with_timeout(timeout):
    def decorator(decorated):
        @functools.wraps(decorated)
        def inner(*args, **kwargs):
            # run the decorated function on a one-thread pool and wait
            # at most `timeout` seconds for its result
            pool = multiprocessing.pool.ThreadPool(1)
            async_result = pool.apply_async(decorated, args, kwargs)
            try:
                return async_result.get(timeout)
            except multiprocessing.TimeoutError:
                return
        return inner
    return decorator

Adding the decorator to the otherwise working code results in a memory leak after roughly twice the length of the timeout, plus a crash of Eclipse.

Where is this leak in the decorator?

How can I timeout threads during multiprocessing in Python?

sudonym
  • I am the original OP. My inner function employs Selenium. For a Selenium context, I have found a way to time out the inner function directly. Depending on your context this may or may not be applicable – please let me know and I will answer directly – sudonym Feb 23 '18 at 08:33
  • Answered under my post. – noxdafox Feb 23 '18 at 10:00

1 Answer

10

It is not possible to kill a Thread in Python without a hack.

The memory leak you are experiencing is due to the accumulation of threads you believe have been killed. To prove it, inspect the number of threads your application is running; you will see it slowly grow.
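
As a minimal sketch of how to observe this (the monitor_threads helper is hypothetical, not part of the original post), you can log threading.active_count() while the Parallel job runs; with the leaky decorator, the count keeps climbing:

import threading
import time

def monitor_threads(interval=5):
    # print the number of live threads every few seconds; with the leaky
    # decorator this number grows as timed-out workers pile up
    while True:
        print('live threads:', threading.active_count())
        time.sleep(interval)

threading.Thread(target=monitor_threads, daemon=True).start()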

Under the hood, the ThreadPool's thread is not terminated when the timeout expires; it keeps running your function until it finishes.

The reason a Thread cannot be killed is that threads share memory with the parent process. It is therefore very hard to kill a thread while ensuring the memory integrity of your application.

Java developers figured it out long ago.

If you can run your function in a separate process, then you can easily rely on timeout logic in which the process itself is killed once the timeout is reached.
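
A minimal sketch of that idea, assuming the target function and its arguments are picklable (run_with_timeout and _call are hypothetical helpers, not a specific library API):

import multiprocessing
import queue

def _call(result_queue, func, args, kwargs):
    # runs in the child process and ships the result back to the parent
    result_queue.put(func(*args, **kwargs))

def run_with_timeout(func, args=(), kwargs=None, timeout=10):
    result_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(
        target=_call, args=(result_queue, func, args, kwargs or {}))
    proc.start()
    try:
        # wait at most `timeout` seconds for a result
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        return None
    finally:
        if proc.is_alive():
            proc.terminate()  # a process, unlike a thread, can be killed
        proc.join()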

The Pebble library already offers decorators with a timeout.
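
Based on Pebble's documented concurrent.process decorator, usage looks roughly like this (do_stuff stands in for your own logic, as in the question):

from pebble import concurrent
from concurrent.futures import TimeoutError

@concurrent.process(timeout=10)
def get_output(INPUT):
    return do_stuff(INPUT)

future = get_output(INPUT)     # returns a future immediately
try:
    output = future.result()   # raises TimeoutError once the 10 s expire;
                               # Pebble terminates the worker process for you
except TimeoutError:
    output = None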

noxdafox
  • Thanks for your input. I have tried to use Pebble and various other decorators to time out this function (i.e. timeout-decorator 0.4.0). In summary, all of them produce the memory leak. In contrast to your hypothesis, this is not connected to the number of timed-out threads, since memory usage increases dramatically within a time where I don't even see timed-out threads. Another solution would be using SIGTERM and SIGALRM, but this won't work on Windows. My solution for now is to just restart the whole code every n minutes, making sure that all eventually hung threads get restarted, too. – sudonym Feb 02 '18 at 06:30
  • Sorry for the late reply. If your program leaks memory, what you should do is identify the source of the leak. You can have a look at [this](http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/) post for that. If you cannot prevent the leak, I'd suggest running your logic in a separate process and setting a memory limit via the [resource](https://docs.python.org/3/library/resource.html) facilities. That, in combination with a `timeout`, should make your service robust enough. – noxdafox Feb 23 '18 at 10:00