
Possible Duplicate:
Asynchronous HTTP calls in Python

I have a Django view which needs to retrieve search results from multiple web services, blend the results together, and render them. I've never done any multithreading in Django before. What is a modern, efficient, safe way of doing this?

I don't know anything about it yet, but gevent seems like a reasonable option. Should I use that? Does it play well with Django? Should I look elsewhere?

Hilton Campbell
  • This isn't an answer (it's a comment!), but it might be worth trying to move that work onto the client side. This way you won't tie up server resources just waiting for responses, (the rest of) your page loads more quickly, and in the horrible case where one of the services is broken your page still works. – dokkaebi Sep 27 '12 at 01:19
  • If you are not very certain that this process will take less than a couple of seconds, I recommend using a task queue to do the work completely outside of the view. Then direct your users to a simple page that checks periodically via javascript until the task is shown to be complete. An example of a task queue system would be Celery/RabbitMQ – Andrew Gorcester Sep 27 '12 at 01:26
  • Great points. In my particular case, the web services are internal services (to my network) which cannot be accessed directly by the user, and which should have very low latency. – Hilton Campbell Sep 27 '12 at 01:43

3 Answers

2

Not sure about gevent. The simplest way is to use threads[*]. Here's a simple example of how to use threads in Python:

# std lib modules. "Batteries included" FTW.
import threading
import time

thread_result = -1

def ThreadWork():
  global thread_result
  thread_result = 1 + 1
  time.sleep(5)  # phew, I'm tired after all that addition!

my_thread = threading.Thread(target=ThreadWork)
my_thread.start()  # This will call ThreadWork in the background.
                   # In the meantime, you can do other stuff
y = 2 * 5  # Completely independent calculation.
my_thread.join()  # Wait for the thread to finish doing its thing.
                  # This should take about 5 seconds,
                  # due to time.sleep being called
print "thread_result * y =", thread_result * y

You can start multiple threads, have each make different web service calls, and join on all of those threads. Once all those join calls have returned, the results are in, and you'll be able to blend them.
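That fan-out/join pattern can be sketched as follows; `fake_service_call` and the service names are made-up stand-ins for real HTTP requests:

```python
import threading
import time

def fake_service_call(name, delay, out, lock):
    time.sleep(delay)          # simulate network latency
    with lock:                 # guard the shared results dict
        out[name] = ['%s-result' % name]

results = {}
results_lock = threading.Lock()
threads = [
    threading.Thread(target=fake_service_call,
                     args=(name, 0.1, results, results_lock))
    for name in ('search_a', 'search_b', 'search_c')
]
for t in threads:
    t.start()
for t in threads:              # all calls run concurrently, so the total wait
    t.join()                   # is roughly the slowest call, not the sum

blended = sorted(sum(results.values(), []))
```

Because every thread runs concurrently, three 0.1-second "calls" take about 0.1 seconds total rather than 0.3.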

More advanced tips: You should call join with a timeout; otherwise, your users might wait indefinitely for your app to send them a response. Even better would be to make those web service calls before the request arrives at your app; otherwise, the responsiveness of your app is at the mercy of the services you rely on.
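A minimal sketch of such a bounded join, with a hypothetical `slow_call` standing in for a hanging service:

```python
import threading
import time

def slow_call(out):
    time.sleep(10)             # pretend this service is hanging
    out.append('late result')

partial_results = []
t = threading.Thread(target=slow_call, args=(partial_results,))
t.daemon = True                # don't let a stuck call keep the process alive
t.start()
t.join(timeout=0.2)            # give up after 0.2 s instead of waiting forever

if t.is_alive():
    # Deadline passed: render whatever we have instead of blocking the user.
    result = partial_results or ['fallback']
else:
    result = partial_results
```

Note that join(timeout) does not kill the thread; it only stops waiting for it, which is why marking the worker as a daemon matters.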

A caveat about threading in general: Be careful with data that can be accessed by two (or more) threads. Access to shared data needs to be synchronized. The most popular synchronization device is a lock (threading.Lock implements one), but there are plenty of others. If you're not careful about synchronization, you're likely to write a race condition into your app. Such bugs are notoriously difficult to debug, because they cannot be reliably reproduced.
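A small sketch of threading.Lock guarding shared state; the counter here is just an illustration, and in a view it would be whatever structure your threads write results into:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:     # serialize the read-modify-write on counter
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, counter is always exactly 40000; without it, the
# unsynchronized `counter += 1` can lose updates under contention.
```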

In my simple example, thread_result was shared between my_thread and the main thread. I didn't need any locks, because the main thread did not access thread_result until my_thread terminated. If I hadn't called my_thread.join, the result would sometimes be -10 instead of 20. Go ahead and try it yourself.

[*] CPython doesn't have true parallelism: because of the Global Interpreter Lock, threads never execute Python bytecode simultaneously, even if you have idle cores. However, you still get concurrent execution; while one thread is blocked (e.g. waiting on I/O), other threads can run.

allyourcode
    Thanks! I ended up using a Queue.Queue to synchronize the results of the threads. Each thread retrieves results and puts the results on the end of the shared queue. The main thread makes a blocking `get` on the queue for each thread to collect the results. – Hilton Campbell Sep 27 '12 at 13:40
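That queue-based pattern can be sketched like this, with made-up worker names; the queue module (Queue on Python 2) handles the synchronization internally, so no explicit lock is needed:

```python
import threading
try:
    import queue               # Python 3
except ImportError:
    import Queue as queue      # Python 2, as in the comment above

def worker(name, out_q):
    # put() is thread-safe; each worker drops its results on the shared queue.
    out_q.put((name, ['%s-hit' % name]))

result_q = queue.Queue()
names = ['svc1', 'svc2', 'svc3']
workers = [threading.Thread(target=worker, args=(n, result_q)) for n in names]
for w in workers:
    w.start()

merged = {}
for _ in workers:              # one blocking get per thread collects everything
    name, hits = result_q.get()
    merged[name] = hits
```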
2

I just nicely solved this problem using concurrent.futures, in the standard library since Python 3.2 and backported to earlier versions, including 2.x, as the futures package.

In my case I was retrieving results from an internal service and collating them:

def _getInfo(request, key):
    # Call an internal view over HTTP, bounded by a timeout.
    return urllib2.urlopen(
        'http://{0[SERVER_NAME]}:{0[SERVER_PORT]}'.format(request.META) +
        reverse('my.internal.view', args=(key,)),
        timeout=30)

…

    with futures.ThreadPoolExecutor(max_workers=os.sysconf('SC_NPROCESSORS_ONLN')) as executor:
        futureCalls = dict(
            (key, executor.submit(_getInfo, request, key))
            for key in myListOfItems)
        for key, curInfo in futureCalls.items():
            if curInfo.exception() is not None:
                pass  # e.g. log "exception calling for info: {0}".format(curInfo.exception())
            else:
                pass  # handle curInfo.result()…
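The same pattern can be run self-contained with the stdlib concurrent.futures (the futures backport exposes the same API); `fetch` here is a stand-in for the urlopen call above:

```python
from concurrent import futures

def fetch(key):
    # Stand-in for a real network call.
    return 'payload-for-%s' % key

keys = ['a', 'b', 'c']
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    future_calls = dict((key, executor.submit(fetch, key)) for key in keys)

# Leaving the `with` block waits for all submitted calls to finish.
collated = {}
for key, fut in future_calls.items():
    if fut.exception() is not None:
        collated[key] = None   # record the failure instead of raising
    else:
        collated[key] = fut.result()
```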
MikeyB
1

gevent will not make the task itself finish faster. It is just more efficient than threads in terms of resource footprint. When running gevent with Django (usually via gunicorn), your web app will be able to handle more concurrent connections than a normal Django WSGI app.

But I think this has nothing to do with your problem. What you want to do is handle a huge task in one Django view, which is usually not a good idea. I personally advise against using threads or gevent's greenlets for this in Django. I see the point for standalone Python scripts, daemons, or other tools, but not for the web; it mostly results in instability and a larger resource footprint. Instead, I agree with the comments of dokkaebi and Andrew Gorcester. The two comments differ somewhat, though, since the right choice really depends on what your task is.

  1. If you can split your task into many smaller tasks, you could create multiple views handling these subtasks. These views could return something like JSON and be consumed via AJAX from your frontend. That way you can build the content of the page as it "comes in", and the user does not need to wait until the whole page has loaded.

  2. If your task is one huge chunk, you are better off with a task queue handler. Celery comes to mind. If Celery is overkill, you can use zeroMQ. This basically works as Andrew mentioned above: you schedule the task for processing and poll the backend from your frontend page until the task is finished (usually also via AJAX). You could also use something like long polling here.

Torsten Engelbrecht
  • You're right, gevent was a red herring. Thanks! I really do want to combine results for multiple service calls in one view though. I recognize that typically you should not, but for my use case I'm convinced this is the right route. – Hilton Campbell Sep 27 '12 at 13:42