
I'm using Celery to process multiple data-mining tasks. One of these tasks connects to a remote service that allows a maximum of 10 simultaneous connections per user (in other words, the total CAN exceed 10 connections globally, but it CANNOT exceed 10 connections for any individual job).

I THINK Token Bucket (rate limiting) is what I'm looking for, but I can't seem to find any implementation of it.

NFicano

3 Answers


Celery features rate limiting, and contains a generic token bucket implementation.

Set rate limits for tasks: http://docs.celeryproject.org/en/latest/userguide/tasks.html#Task.rate_limit

Or at runtime:

http://docs.celeryproject.org/en/latest/userguide/workers.html#rate-limits

The generic token bucket implementation lives in Kombu.
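For reference, the token-bucket idea can be sketched in plain Python. This is a standalone illustration of the algorithm, not Kombu's actual code:

```python
import time


class TokenBucket:
    """Simple token bucket: allows bursts of up to `capacity` operations,
    refilling at `fill_rate` tokens per second."""

    def __init__(self, fill_rate, capacity):
        self.fill_rate = float(fill_rate)
        self.capacity = float(capacity)
        self._tokens = float(capacity)
        self._last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.fill_rate)
        self._last = now

    def can_consume(self, tokens=1):
        """Take `tokens` from the bucket if available; return True on success."""
        self._refill()
        if tokens <= self._tokens:
            self._tokens -= tokens
            return True
        return False


# Roughly 10 operations per second, with bursts of up to 10.
bucket = TokenBucket(fill_rate=10, capacity=10)
```

Celery applies its bucket per worker consumer, so a `rate_limit` on a task throttles how fast each worker pulls that task off the queue.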

orokusaki
asksol
  • Alas, this doesn't work properly because it's per queue. I came up with a better solution here: https://stackoverflow.com/a/66161773/64911 – mlissner Feb 11 '21 at 19:55

Although it might be bad practice, you could use a dedicated queue and cap that worker's concurrency, like:

    # ./manage.py celery worker -Q another_queue -c 10
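To make only the connection-bound task land on that queue, you would also route it there. A configuration sketch for a modern Celery (4+) setup, with illustrative task and queue names:

```python
from celery import Celery

app = Celery("miner", broker="redis://localhost:6379/0")  # broker URL is illustrative

# Route only the connection-bound task to the dedicated queue;
# everything else stays on the default queue.
app.conf.task_routes = {
    "tasks.fetch_remote": {"queue": "another_queue"},
}
```

A worker started with `-Q another_queue -c 10` then consumes only that queue with at most 10 concurrent processes, which caps simultaneous connections at 10 for tasks handled by that worker.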
André B.
  • I don't think it's bad practice. It's a queue with a different usage and consuming pattern so... to me it makes sense if it's a different queue. – jjmontes Mar 05 '21 at 11:47

After much research I found that Celery does not explicitly provide a way to limit the number of concurrent task instances like this, and that doing so would generally be considered bad practice.

The better solution is to download concurrently within a single task, and use Redis or Memcached to store the results and distribute them to other tasks for processing.
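If you do need to cap concurrent connections across several tasks, the usual pattern with Redis is a counting semaphore built on atomic `INCR`/`DECR`. A minimal sketch, where `client` is assumed to be a `redis.Redis` instance (or anything exposing `incr`/`decr`/`expire`), and the key scheme is illustrative:

```python
MAX_CONNECTIONS = 10
SLOT_TTL = 60  # seconds; safety net so a crashed worker can't hold a slot forever


def try_acquire_slot(client, job_id, limit=MAX_CONNECTIONS):
    """Atomically claim one of the per-job connection slots.

    Returns True if a slot was claimed, False if the job is at its limit.
    """
    key = f"connections:{job_id}"
    count = client.incr(key)        # INCR is atomic in Redis
    client.expire(key, SLOT_TTL)    # refresh the expiry on every claim
    if count > limit:
        client.decr(key)            # over the limit: give the slot back
        return False
    return True


def release_slot(client, job_id):
    """Return a slot once the connection is closed."""
    client.decr(f"connections:{job_id}")
```

A task that fails to acquire a slot can retry itself with a delay rather than block the worker. Note the TTL is a blunt safeguard: if a slow task outlives it, the counter can drift, so a production version would want per-slot keys or a Lua script.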

NFicano