I'm evaluating Celery for my task queuing workflow. My requirements are somewhat unusual. The system has the notion of projects, and each project will have a (potentially large) number of tasks associated with it. I would like the queuing system to dispatch these tasks fairly, so that one project with very many tasks to process does not starve other projects.

For example, say that ProjectA has 100 tasks associated with it and all of those tasks are submitted simultaneously. The first 5 tasks are pulled off and submitted to the 5 workers. While the first 5 tasks are processing, ProjectB is submitted with another 100 tasks. ProjectB should not have to wait for ProjectA to complete all 100 of its tasks in order to get some processing time. Instead, once a worker becomes free, it should process a ProjectB task. Then the next worker to become free should process a ProjectA task, and so forth in a round-robin fashion.

My thought was that I could dynamically create a new queue for each project and have all the workers pull from all the queues, as described in this SO post. However, according to this answer, Celery workers will actually process tasks in the order they were submitted, regardless of the queue they are in (which seems a little peculiar to me). This does not work for me because it would starve projects submitted after the one currently being processed.

Can Celery be used to implement my requirements? If not, is there a recommended best practice to implement my requirements?

Eddie

1 Answer

From my testing, Celery can be used to implement your requirements, as the queues are processed round-robin style. See my answer in the other SO post you referenced.
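To make "round-robin style" concrete, here is a toy model in plain Python. It is not Celery's code and does not talk to a broker; the function name `round_robin_dispatch` and the task labels are made up for illustration. It only shows the dispatch order you would expect when a consumer takes one task from each non-empty per-project queue in turn.

```python
from collections import deque


def round_robin_dispatch(queues):
    """Yield (project, task) pairs, one task per non-empty queue per pass."""
    pending = {name: deque(tasks) for name, tasks in queues.items()}
    while any(pending.values()):
        for name, queue in pending.items():
            if queue:
                yield name, queue.popleft()


# Hypothetical backlog: ProjectA has more queued work than ProjectB.
queues = {
    "ProjectA": [f"A{i}" for i in range(5)],
    "ProjectB": [f"B{i}" for i in range(3)],
}

order = [task for _, task in round_robin_dispatch(queues)]
print(order)
```

In this model ProjectB's tasks alternate with ProjectA's until ProjectB's queue is empty, after which the remaining ProjectA tasks run back to back.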

Depending on how quickly the ProjectB tasks need to start, you may want to adjust the prefetch multiplier (CELERYD_PREFETCH_MULTIPLIER, renamed worker_prefetch_multiplier in Celery 4). I believe it defaults to 4, which, from my understanding, means that each Celery worker process reserves up to 4 items from the queue at a time (see this SO post for more info). So, if you have a lot of workers, many of the items in the ProjectA queue may already be "reserved" even though they aren't being processed yet, and your ProjectB tasks will be in line behind all of those reserved items.
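As a rough sketch, the settings below are what I would try in order to keep workers from reserving a backlog of ProjectA tasks. The module name `celeryconfig` is an assumption (any module loaded with `app.config_from_object` would do), and the setting names are the lowercase ones used by Celery 4 and later.

```python
# celeryconfig.py (hypothetical settings module, loaded with
# app.config_from_object("celeryconfig"))

# Reserve only one message per worker process instead of the default four.
worker_prefetch_multiplier = 1

# Acknowledge a task after it finishes rather than when it is received,
# so a busy process does not also hold an extra reserved message.
task_acks_late = True
```

Note that with late acknowledgement a task may be delivered again if a worker dies partway through, so this only suits tasks that are safe to run more than once.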

Troy