I have a celery task which uses subprocess.Popen()
to call out to an executable that does some CPU-intensive data crunching. It works well, but it does not take full advantage of the celery worker concurrency.
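For reference, the task is essentially the following sketch (the app name, broker URL, and executable path are placeholders, not my real code):

    from celery import Celery
    import subprocess

    app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker

    @app.task
    def crunch(input_path):
        # Launch the external CPU-bound executable and block until it finishes;
        # the heavy lifting happens in the spawned process, not in Python.
        proc = subprocess.Popen(['/usr/local/bin/cruncher', input_path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError('cruncher failed: %s' % err)
        return out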
If I start up celeryd with --concurrency 8 -P prefork, I can confirm with ps aux | grep celeryd that 8 child processes have been spawned. OK.
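(As an aside, I believe the same thing can be set on the app object instead of the command line; the old-style setting names below are my assumption for this celeryd / Celery 3.x version, reusing the app object from the snippet above:)

    # Assumed equivalent of the command-line flags above (old-style Celery 3.x names):
    app.conf.CELERYD_CONCURRENCY = 8     # --concurrency 8
    app.conf.CELERYD_POOL = 'prefork'    # -P prefork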
Now when I start e.g. 3 tasks in parallel, I see each task picked up by a different child worker:
[2014-05-08 13:41:23,839: WARNING/Worker-2] running task a...
[2014-05-08 13:41:23,839: WARNING/Worker-4] running task b...
[2014-05-08 13:41:24,661: WARNING/Worker-7] running task c...
... and they run for several minutes before completing successfully. However, when I observe the CPU usage during that time, it's clear that all three tasks are sharing the same CPU, even though other cores are idle.
If I add two more tasks, each subprocess takes ~20% of that one CPU, etc.
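To dig into whether this is a CPU-affinity issue, one thing I can check is which cores each worker process is actually allowed to run on (Linux-only, Python 3.3+; the PIDs below are placeholders copied from the ps output):

    import os

    # PIDs of the celeryd child processes, taken from `ps aux | grep celeryd`
    # (placeholder values).
    worker_pids = [12345, 12346, 12347]

    for pid in worker_pids:
        # sched_getaffinity returns the set of cores the process may be scheduled on;
        # if every worker reports only {0}, they are all pinned to the same core.
        print(pid, os.sched_getaffinity(pid))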
I would expect each child celery process (created via multiprocessing.Pool
by the prefork pool) to operate independently and not be constrained to a single core. If that is not the case, how can I take full advantage of multiple CPU cores with a CPU-bound celery task?
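To illustrate what I mean by "operate independently", here is the behaviour I would expect, written as a bare multiprocessing.Pool sketch outside of Celery (the executable path is again a placeholder):

    import multiprocessing
    import subprocess

    def crunch(n):
        # Same pattern as the task body: block on an external CPU-bound process.
        subprocess.call(['/usr/local/bin/cruncher', str(n)])

    if __name__ == '__main__':
        # Each pool worker is a separate OS process, so I would expect the three
        # spawned executables to be scheduled on different cores.
        pool = multiprocessing.Pool(processes=8)
        pool.map(crunch, range(3))
        pool.close()
        pool.join()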