Celery chain performances

Question

I wonder why celery chain is so slow comparing to an ad hoc solution.

In the ad hoc solution I forward the task manually, the drawback is I cannot wait for the end of the chain.

In the following code, the canvas solution takes 16 seconds and the ad hoc takes 3 seconds. Wonder if other canvas stuff are also slow comparing to naive solutions.

import sys
from celery import Celery, chain
from celery.task import task
from datetime import datetime

broker = "amqp://admin:admin@172.16.1.30:5672/tasks"
backend = 'redis://:redis@172.16.1.30:6379/1'

app = Celery(
    "celery-bench",
    broker=broker,
    backend=backend
)

app.conf.accept_content = ['json']
app.conf.task_serializer = 'json'
app.conf.result_serializer = 'json'

@task(name="result", queue="bench-results")
def result(result):
    return result

@task(name="simple-task-auto-chain", queue="bench-tasks")
def simple_task_auto_chain(date, arg):
    if arg >= 0:
        simple_task_auto_chain.delay(date, arg-1)
        return arg
    else:
        return result.delay(
            "AutoChain %s"%(str(datetime.now() - datetime.fromisoformat(date)))
        )

@task(name="simple-task", queue="bench-tasks")
def simple_task(args):
    date, arg = args
    if arg >= 0:
        return (date, arg - 1)
    else:
        return result.s(
            "CanvasChain %s"%(str(datetime.now() - datetime.fromisoformat(date)))
        ).delay()

def bench_auto_chain(n=1000):
    now = datetime.now()
    simple_task_auto_chain.delay(now, n)

def bench_canvas_chain(n=1000):
    now = datetime.now()
    chain(
        simple_task.s((now, n)),
        *[simple_task.s()] * (n + 1),
    ).delay()

# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-results
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-tasks
# ./benchs-chain.py auto (~3s)
# ./benchs-chain.py canvas (~16s)
if __name__=='__main__':
    if len(sys.argv) > 1:
        if 'canvas' in sys.argv:
            bench_canvas_chain()
        if 'auto' in sys.argv:
            bench_auto_chain()

Edit: I think we got something like this, this is why canvas chain has bad performances.

Synchronisation. Result of a task in the chain becomes the first argument to the next task in the chain, etc... This means you can't execute them like Group for an example. — DejanLekic, Feb 07 '20 at 13:24

score 5 · Answer 1 · answered Feb 12 '20 at 21:41

Yes, you are right. Your method will be faster for this case.

Quote from Celery documentation:

The synchronization step is costly, so you should avoid using chords as much as possible. Still, the chord is a powerful primitive to have in your toolbox as synchronization is a required step for many parallel algorithms.

Chain also has a lot more functionality than auto-chain, like:

collecting results of each task
allows even to build a graph of calls
encapsulation of sub-task managing outside the task itself

As you could see half of the time it takes to create the chain (~18 sec).
Under the hood chain uses chord. And they both consume more memory and have many preparation steps to run as you described in question.

When you call the next task, from parent task - you create a single task which doesn't know what will be next, at the end or few steps before. Another thing that for longer tasks you'll not feel that time difference. And finally you loose a lot of information, which probably you don't need in this simple scenario.

Where did you get that information that chain uses chord under the hood? - I always thought it is vice-versa: chord = chain(group, final-task) — DejanLekic, Feb 13 '20 at 10:49
@DejanLekic you can find it here in `prepare_steps` method of `_chain` class in `celery/canvas.py` file: https://github.com/celery/celery/blob/a537c2d290f42e112b16937055dca0e90f8341c3/celery/canvas.py#L739 — wowkin2, Feb 14 '20 at 09:09

Celery chain performances

1 Answers1