
I wonder why Celery's chain is so slow compared to an ad hoc solution.

In the ad hoc solution I forward the task manually; the drawback is that I cannot wait for the end of the chain.

In the following code, the canvas solution takes 16 seconds and the ad hoc one takes 3 seconds. I wonder whether other canvas primitives are also slow compared to naive solutions.

import sys
from datetime import datetime

from celery import Celery, chain

broker = "amqp://admin:admin@172.16.1.30:5672/tasks"
backend = 'redis://:redis@172.16.1.30:6379/1'

app = Celery(
    "celery-bench",
    broker=broker,
    backend=backend
)

app.conf.accept_content = ['json']
app.conf.task_serializer = 'json'
app.conf.result_serializer = 'json'

@app.task(name="result", queue="bench-results")
def result(result):
    return result

@app.task(name="simple-task-auto-chain", queue="bench-tasks")
def simple_task_auto_chain(date, arg):
    # Ad hoc chain: each task schedules its successor itself.
    if arg >= 0:
        simple_task_auto_chain.delay(date, arg - 1)
        return arg
    else:
        return result.delay(
            "AutoChain %s" % (datetime.now() - datetime.fromisoformat(date))
        )

@app.task(name="simple-task", queue="bench-tasks")
def simple_task(args):
    # Canvas chain: the return value becomes the argument of the next task.
    date, arg = args
    if arg >= 0:
        return (date, arg - 1)
    else:
        return result.s(
            "CanvasChain %s" % (datetime.now() - datetime.fromisoformat(date))
        ).delay()

def bench_auto_chain(n=1000):
    # isoformat() so the date survives the JSON serializer
    now = datetime.now().isoformat()
    simple_task_auto_chain.delay(now, n)

def bench_canvas_chain(n=1000):
    now = datetime.now().isoformat()
    chain(
        simple_task.s((now, n)),
        *[simple_task.s()] * (n + 1),
    ).delay()

# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-results
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-tasks
# ./benchs-chain.py auto   (~3s)
# ./benchs-chain.py canvas (~16s)
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if 'canvas' in sys.argv:
            bench_canvas_chain()
        if 'auto' in sys.argv:
            bench_auto_chain()

Edit: I think we end up with something like the following, which is why the canvas chain performs so badly. [diagram of nested chain messages, image not included]

  • Synchronisation. The result of a task in the chain becomes the first argument of the next task in the chain, and so on. This means you can't execute them like a group, for example. – DejanLekic Feb 07 '20 at 13:24
  • In my test, I also forward data from task to task – ptitpoulpe Feb 07 '20 at 15:41

1 Answer


Yes, you are right. Your method will be faster for this case.

Quote from Celery documentation:

The synchronization step is costly, so you should avoid using chords as much as possible. Still, the chord is a powerful primitive to have in your toolbox as synchronization is a required step for many parallel algorithms.

A chain also has a lot more functionality than the ad hoc auto-chain, for example:

  • collecting the result of each task
  • building a graph of calls
  • encapsulating sub-task management outside the tasks themselves
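For instance, calling a chain returns a result object for the last task, and in Celery each `AsyncResult` links to the previous task's result via `.parent`, so you can walk backwards and collect every intermediate result. A rough pure-Python model of that traversal (illustrative classes only, not Celery's actual API):

```python
class Result:
    """Toy stand-in for AsyncResult: holds a value and a link to its parent."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent

def run_chain(funcs, arg):
    # Run funcs in order, feeding each result to the next task,
    # and link the results the way a Celery chain does.
    res = None
    for f in funcs:
        arg = f(arg)
        res = Result(arg, parent=res)
    return res  # result of the *last* task, with .parent links behind it

r = run_chain([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], 10)

# Walk back through the chain, newest result first.
values = []
node = r
while node:
    values.append(node.value)
    node = node.parent
# values now holds every intermediate result, last task first
```

The ad hoc auto-chain throws this information away: once a task has forwarded to its successor, nothing links the results together.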

As you can see, about half of the time is spent just creating the chain (~18 sec).
Under the hood, chain uses chord, and both consume more memory and require many preparation steps to run, as you described in the question.

When you call the next task from the parent task, you create a single task that doesn't know what comes next, whether it is at the end of the chain or a few steps before it. Also, for longer-running tasks you won't feel the time difference. Finally, you lose a lot of information, which you probably don't need in this simple scenario.
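One way to see the cost (a toy model, not Celery's actual wire format): each canvas-chain message carries the signatures of all tasks still to run, so the per-message payload shrinks hop by hop, but the total bytes serialized grow roughly quadratically with the chain length, while the ad hoc version sends a constant-size message per hop.

```python
import json

# Toy stand-in roughly the size of a small task signature.
SIG = {"task": "simple-task", "args": [], "options": {}}

def canvas_message_sizes(n):
    # Toy model of a canvas chain: message k still embeds the
    # n - k - 1 signatures of the tasks that remain to be run.
    sizes = []
    for k in range(n):
        msg = {"task": "simple-task",
               "args": [["2020-02-07", k]],
               "chain": [SIG] * (n - k - 1)}
        sizes.append(len(json.dumps(msg)))
    return sizes

def adhoc_message_sizes(n):
    # The ad hoc version sends a constant-size message per hop.
    return [len(json.dumps({"task": "simple-task-auto-chain",
                            "args": ["2020-02-07", k]}))
            for k in range(n)]

# Total serialized bytes: roughly quadratic in n for the canvas
# chain, linear for the ad hoc version.
total_canvas = sum(canvas_message_sizes(200))
total_adhoc = sum(adhoc_message_sizes(200))
```

This matches the picture above: serializing and re-publishing the remaining chain on every hop is where the extra time goes.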

  • Where did you get the information that chain uses chord under the hood? I always thought it is vice versa: chord = chain(group, final-task) – DejanLekic Feb 13 '20 at 10:49
  • @DejanLekic you can find it here, in the `prepare_steps` method of the `_chain` class in `celery/canvas.py`: https://github.com/celery/celery/blob/a537c2d290f42e112b16937055dca0e90f8341c3/celery/canvas.py#L739 – wowkin2 Feb 14 '20 at 09:09
  • Thanks a lot for this enlightenment! :) – DejanLekic Feb 14 '20 at 13:05