I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER
(docs). The default is 4, but (I believe) I want the prefetching off or as low as possible. I set it to 1 now, which is close enough to what I'm looking for, but there's still some things I don't understand:
Why is this prefetching a good idea? I don't really see a reason for it, unless there's a lot of latency between the message queue and the workers (in my case, they are currently running on the same host and at worst might eventually run on different hosts in the same data center). The documentation only mentions the disadvantages, but fails to explain what the advantages are.
Many people seem to set this to 0, expecting to be able to turn off prefetching that way (a reasonable assumption in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching, doesn't that entirely eliminate the concurrency/asynchronicity you introduced a task queue for in the first place?
Why can prefetching not be turned off? It might not be a good idea for performance to turn it off in most cases, but is there a technical reason for this not to be possible? Or is it just not implemented?
Sometimes, this option is connected to
CELERY_ACKS_LATE
. For example. Roger Hu writes «[…] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements […]» I don't understand how these two options are connected and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why the two options are connected?