I want to create the following flow using celery configuration\api:
- Send TaskA(argB) Only if celery queue has no TaskA(argB) already pending
Is it possible? how?
You can make your job aware of other tasks by some sort of memoization. If you use a cache control key (redis, memcached, /tmp, whatever is handy), you can make execution depend on that key. I'm using redis as an example.
from redis import Redis
@app.task
def run_only_one_instance(params):
try:
sentinel = Redis().incr("run_only_one_instance_sentinel")
if sentinel == 1:
#I am the legitimate running task
perform_task()
else:
#Do you want to do something else on task duplicate?
pass
Redis().decr("run_only_one_instance_sentinel")
except Exception as e:
Redis().decr("run_only_one_instance_sentinel")
# potentially log error with Sentry?
# decrement the counter to insure tasks can run
# or: raise e
I cannot think of a way but to
Retrieve all executing and scheduled tasks via celery inspect
Iterate through them to see if your task is there.
check this SO question to see how the first point is done.
good luck
I don't know it's gonna help you more than the other answers, but there goes my approach, following the same idea given by srj. I needed a way to block my server to launch tasks with same id to queue. So I made a general function to help me.
def is_task_active_or_registered(app, task_id):
i = app.control.inspect()
active_dict = i.active()
scheduled_dict = i.scheduled()
keys_set = set(active_dict.keys() + scheduled_dict.keys())
tasks_ids_set = set()
for _dict in [active_dict, scheduled_dict]:
for k in keys_set:
for task in _dict[k]:
tasks_ids_set.add(task['id'])
if task_id in tasks_ids_set:
return True
else:
return False
So, I use it like this:
In the context where my celery-app object is available, I define:
def check_task_can_not_run(task_id):
return is_task_active_or_registered(app=celery, task_id=task_id)
And so, from my client request, I call this check_task_can_not_run(...)
and block task from being launched in case of True
.
My answer is on assuming
bind=True
to the task so that we can get task uuid as self.request.id
from redis import Redis
@app.task(bind=True)
def your_task(self, *args, **kwargs):
task_id = self.request.id
# check if already executed
redis_conn = Redis.from_url(redis_url, charset="utf-8", decode_responses=True)
if redis_conn.get(task_id):
logger.info(f"{task_id} is a duplicate job")
raise Exception('Duplicate job') # Handle this accordingly
# expire in 1 day
ex_1_day = 24*60*60
redis_conn.set(task_id, 1, ex=ex_1_day)
# From here run your job now
# ...
Thus this way, it will ensure that your job never runs parallely neither run sequencially duplicately
I was facing similar problem. The Beat was making duplicates in my queue. I wanted to use expires
but this feature isn't working properly https://github.com/celery/celery/issues/4300.
So here is scheduler which checks if task has been already enqueued (based on task name).
# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
import json
from heapq import heappop, heappush
from celery.beat import event_t
from celery.schedules import schedstate
from django_celery_beat.schedulers import DatabaseScheduler
from typing import List, Optional
from typing import TYPE_CHECKING
from your_project import celery_app
if TYPE_CHECKING:
from celery.beat import ScheduleEntry
def is_task_in_queue(task, queue_name=None):
# type: (str, Optional[str]) -> bool
queues = [queue_name] if queue_name else celery_app.amqp.queues.keys()
for queue in queues:
if task in get_celery_queue_tasks(queue):
return True
return False
def get_celery_queue_tasks(queue_name):
# type: (str) -> List[str]
with celery_app.pool.acquire(block=True) as conn:
tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
decoded_tasks = []
for task in tasks:
j = json.loads(task)
task = j['headers']['task']
if task not in decoded_tasks:
decoded_tasks.append(task)
return decoded_tasks
class SmartScheduler(DatabaseScheduler):
"""
Smart means that prevents duplicating of tasks in queues.
"""
def is_due(self, entry):
# type: (ScheduleEntry) -> schedstate
is_due, next_time_to_run = entry.is_due()
if (
not is_due or # duplicate wouldn't be created
not is_task_in_queue(entry.task) # not in queue so let it run
):
return schedstate(is_due, next_time_to_run)
# Task should be run (is_due) and it is present in queue (is_task_in_queue)
H = self._heap
if not H:
return schedstate(False, self.max_interval)
event = H[0]
verify = heappop(H)
if verify is event:
next_entry = self.reserve(entry)
heappush(H, event_t(self._when(next_entry, next_time_to_run), event[1], next_entry))
else:
heappush(H, verify)
next_time_to_run = min(verify[0], next_time_to_run)
return schedstate(False, min(next_time_to_run, self.max_interval))