
How do you prevent Celery from executing a periodic task before the previous execution has completed?

I have a cluster of servers, linked to a common database server, executing Celery tasks, and I'm finding that a single server may occasionally run the same task twice concurrently, and that different servers may also run that same task at the same time. This is causing a lot of race conditions that corrupt my data in painfully subtle ways.

I've been reading through Celery's docs, but I can't find any option that explicitly prevents this. I found a similar question, but the suggested fix seems like a hack, as it relies on Django's caching framework, and therefore might not be shared by all servers in a cluster, allowing multiple servers to still execute the same task at the same time.

Is there any option in Celery to record which tasks are currently running in the database, and not run a task again until its database record is cleared?
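To illustrate, here is a hypothetical sketch of the kind of database-backed mutex I have in mind: the shared database itself enforces the lock through a unique constraint, so every server in the cluster sees the same state. (Plain `sqlite3` is used here only so the example is self-contained; the table and function names are made up.)

```python
# Hypothetical sketch: use the common database as a cluster-wide mutex.
# A PRIMARY KEY on task_name makes lock acquisition atomic -- a second
# INSERT for the same task fails, no matter which server issues it.
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the shared DB server
conn.execute("CREATE TABLE task_lock (task_name TEXT PRIMARY KEY)")

def try_acquire(task_name):
    """Return True if we got the lock, False if another server holds it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO task_lock (task_name) VALUES (?)", (task_name,)
            )
        return True
    except sqlite3.IntegrityError:
        # Row already exists: some other worker is running this task.
        return False

def release(task_name):
    """Clear the record so the task may run again."""
    with conn:
        conn.execute("DELETE FROM task_lock WHERE task_name = ?", (task_name,))
```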

I'm using the Django-Celery module, and even though it provides pages /admin/djcelery/taskstate/ and /admin/djcelery/workerstate/, I've never seen any long-running tasks or workers show up there.

Cerin
2 Answers


The standard way is to use a shared lock via Django's standard cache mechanism. See this recipe in the official documentation.
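The recipe boils down to using the cache's atomic "add only if absent" operation as a mutex. A minimal sketch of that pattern follows; the real recipe calls `django.core.cache.cache.add()`, but a small thread-safe stand-in with the same `add`/`delete` semantics is used here so the example runs on its own, and all names are illustrative.

```python
# Sketch of the "ensure a task only runs one at a time" lock pattern.
# AtomicCache mimics the two cache operations the recipe relies on.
import threading
import time

class AtomicCache:
    """Stand-in with the same add/delete semantics as Django's cache."""
    def __init__(self):
        self._mutex = threading.Lock()
        self._expiry = {}

    def add(self, key, value, timeout):
        # Atomic "set only if absent" -- the property that makes this a lock.
        with self._mutex:
            now = time.monotonic()
            expires = self._expiry.get(key)
            if expires is not None and expires > now:
                return False  # key still held by someone else
            self._expiry[key] = now + timeout
            return True

    def delete(self, key):
        with self._mutex:
            self._expiry.pop(key, None)

cache = AtomicCache()
LOCK_EXPIRE = 60 * 5  # safety net: a crashed worker can't wedge the lock forever

def run_exclusively(lock_id, body):
    """Run body() only if no other worker currently holds lock_id."""
    if cache.add(lock_id, "locked", LOCK_EXPIRE):
        try:
            return body()
        finally:
            cache.delete(lock_id)
    return None  # lock held elsewhere; skip this run
```

Note the expiry on the lock: without it, a worker that dies mid-task would block the task permanently.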

Alexander Lebedev
  • And like I mentioned, that's not a robust mechanism in a cluster setting... Why is there no option that uses the database? – Cerin Mar 28 '12 at 22:48
  • Use memcached backend and you'll get cluster functionality – Alexander Lebedev Mar 28 '12 at 22:50
  • 1
    @AlexLebedev, that's a good point, but *if and only if* the machines in the cluster share the backend. For example, it isn't unthinkable to run memcached locally and use a localhost memcached backend on each box. Logically obvious but I just wanted to point it out lest anyone think "oh, I'm using memcached, problem solved." – mrooney Apr 02 '14 at 23:51

If I were you I'd set up a special queue for any jobs that can't be executed simultaneously. Then you can simply start up a separate worker just for that queue.
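A rough sketch of what that could look like with django-celery-era settings: route the non-reentrant task to its own queue, then run a single dedicated worker on it with a concurrency of one, so two runs can never overlap. The module, task, and queue names here are placeholders.

```python
# settings.py fragment (illustrative names).
# Route the sensitive task to its own queue, away from the default one.
CELERY_ROUTES = {
    "proj.tasks.rebuild_report": {"queue": "serial"},  # hypothetical task
}

# Run one dedicated worker process for that queue, e.g.:
#   python manage.py celeryd -Q serial --concurrency=1
# With concurrency=1 and a single worker, at most one instance of the
# task executes at any time, cluster-wide.
CELERYD_CONCURRENCY = 1  # only relevant for the dedicated worker
```

The trade-off is operational rather than code-level: you must make sure exactly one such worker runs across the whole cluster.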

BenH