I am working on a small but computationally intensive Python app. The computationally intensive work can be broken into several pieces that can be executed concurrently, and I am trying to identify a suitable stack to accomplish this.

Currently I am planning to use a Flask app on Apache2+WSGI with Celery for the task queue.

In the following, will a_long_process(), another_long_process() and yet_another_long_process() execute concurrently if there are 3 or more workers available? Will the Flask app be blocked while the processes are executing?

from the Flask app:

@myapp.route('/foo')
def bar():
    task_1 = a_long_process.delay(x, y)
    task_1_result = task_1.get(timeout=1)
    task_2 = another_long_process.delay(x, y)
    task_2_result = task_2.get(timeout=1)
    task_3 = yet_another_long_process.delay(x, y)
    task_3_result = task_3.get(timeout=1)
    return task_1_result + task_2_result + task_3_result

tasks.py:

from celery import Celery

celery = Celery('tasks', broker="amqp://guest@localhost//", backend="amqp://")

@celery.task
def a_long_process(x, y):
    return something

@celery.task
def another_long_process(x, y):
    return something_else

@celery.task
def yet_another_long_process(x, y):
    return a_third_thing
gavinmh

3 Answers

You should change your code so the workers can work in parallel:

from celery.exceptions import TimeoutError

@myapp.route('/foo')
def bar():
    # start all three tasks first so the workers can run them in parallel
    task_1 = a_long_process.delay(x, y)
    task_2 = another_long_process.delay(x, y)
    task_3 = yet_another_long_process.delay(x, y)
    # fetch results
    try:
        task_1_result = task_1.get(timeout=1)
        task_2_result = task_2.get(timeout=1)
        task_3_result = task_3.get(timeout=1)
    except TimeoutError:
        # Handle this or don't specify a timeout.
        raise
    # combine results (a Flask view must return a string or Response)
    return str(task_1_result + task_2_result + task_3_result)

This code will block until all results are available (or the timeout is reached).

Will the Flask app be blocked while the processes are executing?

This code will only block one worker of your WSGI container. Whether the entire site becomes unresponsive depends on the WSGI container you are using (e.g. Apache + mod_wsgi, uWSGI, gunicorn, etc.). Most WSGI containers spawn multiple workers, so only one worker is blocked while your code waits for the task results.

For this kind of application I would recommend using gevent, which spawns a separate greenlet for every request and is very lightweight.
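For example, a minimal sketch of serving the app with gevent's WSGI server (assuming the Flask object is the myapp from the question; the module name here is hypothetical):

from gevent import monkey
monkey.patch_all()  # patch blocking I/O so a request waiting on results yields to other greenlets

from gevent.pywsgi import WSGIServer

from myapp_module import myapp  # hypothetical module holding the Flask app

WSGIServer(('0.0.0.0', 5000), myapp).serve_forever()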

bikeshedder
  • setting a timeout of 1 second seems really hacky, but it seems like the only way to avoid blocking? Right now my browser seems to wait forever until the background task completes. +1 – KJW Apr 08 '13 at 20:11

According to the documentation for result.get(), it waits until the result is ready before returning, so it is normally blocking. However, since you pass timeout=1, get() will raise a TimeoutError if the task takes longer than 1 second to complete.
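If you want the request itself to return immediately instead of waiting, a common pattern is to hand the task id back to the client and let it poll for the result from a second route. A sketch (the /start and /result routes are made up for illustration; x and y are the question's placeholders):

from flask import jsonify

@myapp.route('/start')
def start():
    task = a_long_process.delay(x, y)
    return jsonify(task_id=task.id)  # client polls with this id

@myapp.route('/result/<task_id>')
def get_result(task_id):
    result = a_long_process.AsyncResult(task_id)
    if result.ready():  # non-blocking check
        return jsonify(value=result.get())  # safe now, will not block
    return jsonify(value=None)  # not finished yet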

By default, Celery workers start with a concurrency level equal to the number of CPUs available. With the default prefork pool, the concurrency level is the number of worker processes available to execute tasks, so with a concurrency level >= 3 the worker can process the three tasks concurrently. It can also be set explicitly with the -c/--concurrency option.
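For example, a sketch of pinning the concurrency in configuration (the old-style setting name matches the Celery 3.x era suggested by the question's amqp backend):

# in tasks.py: make sure at least 3 tasks can run at the same time.
# Equivalent to starting the worker with: celery -A tasks worker --concurrency=3
celery.conf.CELERYD_CONCURRENCY = 3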

voithos

Use the `group` feature of Celery canvas:

The group primitive is a signature that takes a list of tasks that should be applied in parallel.

Here is the example provided in the documentation:

from celery import group
from proj.tasks import add

g = group(add.s(2, 2), add.s(4, 4))
res = g()
res.get()

Which outputs [4, 8].
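Applied to the tasks in the question, a sketch could look like this (reusing the task names and the x, y placeholders from the question):

from celery import group
from tasks import a_long_process, another_long_process, yet_another_long_process

# start all three tasks in parallel
job = group(
    a_long_process.s(x, y),
    another_long_process.s(x, y),
    yet_another_long_process.s(x, y),
)
# get() blocks, but by then the tasks have run concurrently on the workers;
# results come back in the same order as the signatures
result_1, result_2, result_3 = job().get()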

Jon
  • Why use Celery if you also call get()? In that case I guess it is the same as calling the function without Celery. What am I missing? – Gustavo Vargas Oct 23 '14 at 20:31
  • I don't fully understand your question, but here goes: the documentation states that `A group is lazy so you must call it to take action and evaluate the group.`. That's what `res = g()` is doing - it calls the group so that the contained tasks run (in parallel). `res` is a `GroupResult` and `get` returns the results of each contained task (the docs have nice examples of this). – Jon Oct 25 '14 at 12:29
  • Maybe I'm wrong, but when you call get() it will block until it has a response to return. Is that right? If so, why not simply call the functions without Celery? What's the advantage of using Celery in this case? – Gustavo Vargas Oct 26 '14 at 16:33
  • The advantage is that the group will run all the contained tasks in parallel. – Jon Oct 27 '14 at 07:19