Basically, I have a lot of tasks (in batches of about 1000), and their execution times vary widely (from less than a second to 10 minutes). I know that if a task runs for more than a minute, I can kill it. These tasks are steps in the optimization of a data mining model (but are independent of each other) and spend most of their time inside a C extension function, so they would not cooperate if I tried to kill them gracefully.

Is there a distributed task queue that fits this scheme? AFAIK, Celery only allows aborting tasks that are willing to cooperate, but I might be wrong.

I recently asked a similar question about killing hanging functions in pure Python: "Kill hanging function in Python in multithreaded environment".

I guess I could subclass the Celery task so that it spawns a new process, executes its payload there, and aborts it if it takes too long, but then the overhead of initializing a new interpreter would kill me.

jb.

3 Answers

Celery supports time limits, which you can use to kill long-running tasks. Besides the hard limit, which kills the task, there is a soft limit, which raises a SoftTimeLimitExceeded exception inside the task so that it can clean up and terminate cleanly.

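from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded

# Limits are in seconds; the values here are illustrative.
@shared_task(soft_time_limit=60, time_limit=70)
def mytask():
    try:
        do_work()  # placeholder for the actual payload
    except SoftTimeLimitExceeded:
        clean_up_in_a_hurry()  # placeholder cleanup before the hard limit hits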
mher

When you revoke a Celery task, you can pass the optional terminate=True keyword.

task.revoke(terminate=True)

It doesn't exactly fit your requirements, since the kill is not initiated by the task's own process, but you should be able to either extend the task class so that it terminates itself, or have a recurring cleanup task or process that kills tasks that have not completed on time.
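The "separate process kills tasks that run too long" idea can be sketched with the standard library alone. This is only a rough illustration, not part of Celery: run_with_timeout, _child and uncooperative_task are hypothetical names, and the sketch assumes a POSIX system where the "fork" start method is available, so the child is a copy of the parent rather than a freshly initialized interpreter.

```python
import multiprocessing as mp
import time


def _child(conn, func, args):
    # Runs in the child process: compute the result and send it back.
    conn.send(func(*args))
    conn.close()


def run_with_timeout(func, args=(), timeout=60.0):
    """Run func(*args) in a forked child; kill the child after `timeout` seconds.

    Returns (finished, result); result is None if the child was killed
    or died without sending anything.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_end, func, args))
    proc.start()
    send_end.close()  # the parent keeps only the receiving end
    finished, result = False, None
    try:
        if recv_end.poll(timeout):  # wait for a result, at most `timeout` seconds
            result = recv_end.recv()
            finished = True
    except EOFError:
        pass  # the child exited without sending a result
    if not finished:
        proc.kill()  # SIGKILL: works even if the child is stuck in C code
    proc.join()
    recv_end.close()
    return finished, result


def uncooperative_task(seconds):
    # Stand-in for a long call into a C extension.
    time.sleep(seconds)
    return "done"


if __name__ == "__main__":
    print(run_with_timeout(uncooperative_task, (0.1,), timeout=2.0))
    print(run_with_timeout(uncooperative_task, (5,), timeout=1.0))
```

Because the child is killed from the outside, it does not need to cooperate. The cost is one fork per task, plus pickling of the result on its way back through the pipe.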

ilmarinen

Pistil allows multiple process management, including killing uncooperative tasks.

But:

  • it's beta software, even though it powers gunicorn, which is reliable
  • I don't know how it handles 1000 processes
  • Communication between processes is not included yet, so you'll have to set up your own, using for example ZeroMQ

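Another possibility is to use the timer signal so that it raises an exception after a given number of seconds. But Python-level signal handlers only run once the interpreter regains control, so they are not triggered while a C extension call is still running, which your C code might be.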
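For reference, a minimal sketch of the timer-signal approach using the standard signal module. TaskTimeout and call_with_alarm are hypothetical names, and the sketch assumes a Unix system and that it is called from the main thread, since that is where Python allows signal handlers to be installed.

```python
import signal
import time


class TaskTimeout(Exception):
    """Raised in the main thread when the timer expires."""


def _raise_timeout(signum, frame):
    raise TaskTimeout()


def call_with_alarm(func, seconds):
    """Call func(), raising TaskTimeout if it runs longer than `seconds`.

    The handler only runs when the interpreter gets control back, so a long
    call inside a C extension is not interrupted until it returns.
    """
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot real-time timer
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending timer
        signal.signal(signal.SIGALRM, old_handler)  # restore the old handler


if __name__ == "__main__":
    try:
        call_with_alarm(lambda: time.sleep(5), 0.5)
    except TaskTimeout:
        print("timed out")
```

This works for code that spends its time in the interpreter or in interruptible system calls such as time.sleep. For a payload that stays inside C code, killing a separate process from the outside is the more reliable option.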

Bite code