7

I have a desktop app that I'm in the process of porting to a Django webapp. The app has some quite computationally intensive parts (using numpy, scipy and pandas, among other libraries). Obviously importing the computationally intensive code into the webapp and running it isn't a great idea, as this will force the client to wait for a response.

Therefore, you'd have to farm these tasks out to a background process that notifies the client (via AJAX, I guess) and/or stores the results in the database when it's complete.

You also don't want all these tasks running simultaneously when there are multiple concurrent users, since that is a great way to bring your server to its knees even with a small number of concurrent requests. Ideally, you want each instance of your webapp to put its tasks into a job queue, which then automagically runs them in an optimal way (based on the number of cores, available memory, etc.).

Are there any good Python libraries to help resolve this sort of an issue? Are there general strategies that people use in these kinds of situations? Or is this just a matter of choosing a good batch scheduler and spawning a new Python interpreter for each process?
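For context, the last option can be done by hand with the standard library: a minimal sketch of a job queue that caps concurrency at the number of cores, using `concurrent.futures` (the task function here is an illustrative stand-in, not real code from the app):

```python
import os
from concurrent.futures import ProcessPoolExecutor

def heavy_task(n):
    """Stand-in for a CPU-bound numpy/scipy/pandas computation."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Cap the number of worker processes at the core count so extra
    # jobs queue up instead of overwhelming the server.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        futures = [pool.submit(heavy_task, n) for n in (10, 100, 1000)]
        results = [f.result() for f in futures]
    print(results)
```

Each `submit()` returns immediately with a future, which is roughly the notify-the-client-later model described above; a real setup would persist results rather than block on `f.result()`.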

Chinmay Kanchi
  • 62,729
  • 22
  • 87
  • 114
  • 1
    This is a _recommend or find a book, tool, software library, tutorial or other off-site resource_ question, but take a look at [celery](http://www.celeryproject.org/) – Maciej Gol Sep 09 '14 at 17:50

1 Answer

10

We developed a Django web app that performs heavy computation (each job takes 11 to 88 hours to complete, even on high-end servers).

Celery: Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Celery offers:

  • Asynchronous task execution.
  • Distributed execution of expensive processes.
  • Periodic and/or scheduled tasks.
  • Automatic retries when something goes wrong.

This is just the tip of the iceberg; Celery offers many more features. Take a look at the documentation & FAQ.

You also need to design a good canvas for your workflow. For example, you don't want all tasks running simultaneously when there are multiple concurrent users, since that would exhaust server resources. You might also want to schedule tasks based on which users are currently online.
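One way to keep tasks from all running at once is through worker configuration. The settings and task name below are illustrative, not prescriptive; tune them for your own hardware:

```python
# celeryconfig.py -- illustrative settings for throttling long-running work
worker_concurrency = 4          # at most 4 tasks in parallel per worker
worker_prefetch_multiplier = 1  # don't reserve extra long-running tasks ahead of time
task_acks_late = True           # re-queue a task if its worker dies mid-run
task_annotations = {
    "tasks.heavy_simulation": {"rate_limit": "10/h"},  # hypothetical task name
}
```

With `worker_concurrency` capped, additional jobs simply wait in the queue instead of competing for cores and memory.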

You will also need a good database design, efficient algorithms, and so on.

Chillar Anand
  • 27,936
  • 9
  • 119
  • 136