I've looked through multiple SO questions about this but still haven't found a solution, and I've been working on it for several days now, so any help would be appreciated!

I'm using Celery 4.4.0 with SQS as the broker and Django 2.2. Most of my tasks are pretty long (1 to 4 hours).
This is the command I use to start the workers:

celery worker -A config.celeryconfig:app -Ofair --prefetch-multiplier=1 --max-tasks-per-child=2

And this is the configuration in my Django config file:

CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_ACKS_LATE = True
task_acks_late = True  # I wasn't sure which name the acks-late setting uses, so I set both.

BROKER_TRANSPORT_OPTIONS = {
    'polling_interval': 3,
    'region': 'us-east-1',
    'visibility_timeout': 3600,  # 1 hour
}
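
For reference, here is a sketch of the same settings under the lowercase names that Celery 4 introduced (worker_prefetch_multiplier, task_acks_late, broker_transport_options). It assumes the app loads a plain module through config_from_object with no namespace. If the Django settings are loaded with namespace='CELERY', the names would instead be CELERY_WORKER_PREFETCH_MULTIPLIER, CELERY_TASK_ACKS_LATE and CELERY_BROKER_TRANSPORT_OPTIONS. Which case applies depends on config.celeryconfig, which is not shown here.

```python
# Sketch only: Celery 4 lowercase setting names, values copied from the question.
# Assumption: loaded with app.config_from_object(...) and no namespace argument.

worker_prefetch_multiplier = 1  # old name: CELERYD_PREFETCH_MULTIPLIER
task_acks_late = True           # old name: CELERY_ACKS_LATE

broker_transport_options = {    # old name: BROKER_TRANSPORT_OPTIONS
    "polling_interval": 3,
    "region": "us-east-1",
    "visibility_timeout": 3600,  # seconds (1 hour)
}
```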

Scenario: let's say we have 2 workers (1 and 2), each with one subprocess, and three tasks, a (longer than 1 hour), b and c, all dispatched around the same time:
worker 1 picked up task a
worker 1 picked up task c (but is not executing it)
worker 2 picked up task b
worker 2 finished task b
worker 2 picked up task c (because of the visibility_timeout)
worker 2 finished task c
worker 1 finished task a
worker 1 started task c
worker 1 finished task c

So:

  1. Workers are still prefetching tasks.
  2. (much worse) The same task can be executed twice (or even more times, if there are more workers and the execution time is longer than the visibility_timeout). This is discussed in detail at https://github.com/celery/celery/issues/4400.
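
The timeline above can be reproduced with a toy model of a visibility timeout. This is a minimal sketch, not the SQS or Celery API: ToyQueue and run_timeline are hypothetical names, and it assumes a message is acknowledged (deleted) at the moment its task starts executing. Under that assumption a prefetched but unstarted message becomes visible again once the timeout passes. With late acknowledgement, task a itself would also reappear after an hour, which this sketch does not model.

```python
class ToyQueue:
    """Toy queue with a visibility timeout (illustrative, not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = []          # undeleted messages, in arrival order
        self._invisible_until = {}   # message -> time at which it reappears

    def send(self, message):
        self._messages.append(message)
        self._invisible_until[message] = 0

    def receive(self, now):
        """Return the first visible message and hide it, or None."""
        for message in self._messages:
            if self._invisible_until[message] <= now:
                self._invisible_until[message] = now + self.visibility_timeout
                return message
        return None

    def delete(self, message):
        """Acknowledge a message; a second delete is a no-op."""
        if message in self._messages:
            self._messages.remove(message)
            del self._invisible_until[message]


def run_timeline():
    queue = ToyQueue(visibility_timeout=3600)
    for task in ("a", "b", "c"):
        queue.send(task)
    executions = []

    def start(worker, task):
        # Assumption: the message is deleted when execution starts.
        queue.delete(task)
        executions.append((worker, task))

    start("worker 1", queue.receive(now=0))   # a starts on worker 1
    start("worker 2", queue.receive(now=0))   # b starts on worker 2
    prefetched = queue.receive(now=0)         # worker 1 holds c without starting it

    redelivered = queue.receive(now=3600)     # timeout passed, c is visible again
    if redelivered is not None:
        start("worker 2", redelivered)        # worker 2 is idle and runs c

    start("worker 1", prefetched)             # a is done, worker 1 runs its copy of c
    return executions


print(run_timeline())
```

In this model task c appears twice in the returned list, once per worker, which matches problem 2.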

I've realized that the solution to the above problems would be to disable the prefetching behavior, but so far I haven't been able to achieve that.

I'm so frustrated, so if you have any idea how to solve this, please let me know!

A few notes:

  • I saw this in my production env as well, where we have more than 1 subprocess per worker.
  • I used to dispatch tasks using countdown (which resulted in ETA tasks), but I have disabled that and the issue still persists.
  • I'm not using a result_backend; I'm handling everything on a model in the DB that I've created.
  • I don't want to set CONCURRENCY=1 because I want to have one worker per machine, with as many subprocesses as the machine has CPU cores.
  • I've tried many combinations of the Celery configurations mentioned above, but none worked (the one listed above is based on this comment: https://stackoverflow.com/a/58958823).
  • Another possible solution, which I'd prefer not to use, is increasing the visibility_timeout to a very big number. But this could result in one worker holding all the long tasks and running them one after the other, with no distribution between the workers.
  • Not sure if it's related: I'm deploying Celery using ElasticBeanstalk on an EC2 machine.
  • Another possible solution that I'm currently considering is checking the status of the task (we use Pending/In Progress, etc. statuses) and continuing only if it's pending. This might not cover every case because of a possible race condition, but it should solve most of them.
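
The status check in the last note could be made a single conditional update, which may narrow the race condition mentioned there. Below is a minimal sketch using the standard-library sqlite3 module. The table name task_status, its columns, and the function claim_task are hypothetical and stand in for the status model, which is not shown in the question.

```python
import sqlite3


def claim_task(conn, task_id):
    """Flip a row from 'pending' to 'in_progress' in one statement.

    Returns True only for the caller whose UPDATE actually changed the row.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE task_status SET status = 'in_progress' "
            "WHERE id = ? AND status = 'pending'",
            (task_id,),
        )
    return cursor.rowcount == 1


# Hypothetical schema standing in for the real status model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_status (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO task_status VALUES ('c', 'pending')")
conn.commit()

first_claim = claim_task(conn, "c")    # first delivery of task c
second_claim = claim_task(conn, "c")   # duplicate delivery of task c
print(first_claim, second_claim)
```

A task body would call something like claim_task first and return early when it gets False. In Django the same idea can be expressed with a filtered QuerySet.update(), which returns the number of rows it matched.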
lmaayanl

1 Answer

Well, what I understood from the Celery docs is that "CELERYD_PREFETCH_MULTIPLIER" only changes the behaviour to prefetch one task at a time; it doesn't disable prefetching. So if you have concurrency > 1, the worker will still prefetch 1 task for each process. Try setting --concurrency=1.

The docs describe the prefetch multiplier as: "How many messages to prefetch at a time multiplied by the number of concurrent processes. The default is 4 (four messages for each process)."
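
As a worked example of the arithmetic in the sentence above, here is a small sketch. The function name prefetch_limit is hypothetical; it only restates "multiplier times concurrent processes" and is not part of Celery.

```python
def prefetch_limit(multiplier, concurrency):
    """Messages a worker may reserve at once: multiplier x pool processes."""
    return multiplier * concurrency


default_limit = prefetch_limit(multiplier=4, concurrency=2)   # default multiplier
question_limit = prefetch_limit(multiplier=1, concurrency=2)  # settings from the question
print(default_limit, question_limit)
```

With a multiplier of 1 and two processes the worker can still reserve two messages, one per process.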

Yazan
  • Thanks for the answer! To my understanding, concurrency=1 means limiting the worker to 1 subprocess, so instead of running, for example, two tasks in the same worker using 2 different subprocesses, it will only run 1. But since my machine has 2 cores, it can run 2 tasks, and that would be a waste of resources, so I prefer not to do that. – lmaayanl Oct 29 '20 at 07:20