11

I need to develop a real-time recent activity feed in Django (with AJAX long-polling), and I'm wondering what the best server-side strategy is.

Pseudocode:

def recent_activity_post_save():
    notify_view()

[in the view]
while not new_activity():
    sleep(1)
return HttpResponse(new_activity())

The first thing that comes to mind is querying the DB every second, which isn't feasible. Other options:

  1. using the cache as a notification service
  2. using a specialized tool, like Celery (I'd rather not do it, because it seems like overkill)

What's the best way to go here?

Gabi Purcaru

8 Answers

5

I would suggest keeping it simple...

Create a database table to store your events, insert into that table when appropriate, then implement a simple AJAX polling technique on the client side that hits the server every x seconds.

I have concerns about the other solutions that use a push-notification approach or a NoSQL data store. They're a whole lot more complicated than a traditional pull-notification system built with the tools that come with the Django framework, and except in very rare cases they're overkill. Unless you specifically require a strict real-time solution, keep it simple and use the tools that already exist in the framework. To people with objections based on database or network performance, all I have to say is that premature optimization is the root of all evil.

Build a model that contains recent activity data specific to your application. Then, whenever your application does something that should log new activity, you can just insert into this table.

Your view would simply be like any other view, pulling the top x rows from this RecentActivity table (optionally based on query parameters and whatever).
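To make the table-plus-query idea concrete, here is a minimal sketch. It uses the standard library's sqlite3 as a stand-in for a Django model and view, so the table name `recent_activity`, the helper names, and the `since_id` parameter are all assumptions for illustration rather than anything from a real project:

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table; in Django this would be a RecentActivity model.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recent_activity ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "message TEXT NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def log_activity(conn, message):
    # Call this wherever the application does something worth showing.
    cur = conn.execute(
        "INSERT INTO recent_activity (message) VALUES (?)", (message,)
    )
    conn.commit()
    return cur.lastrowid


def recent_activity(conn, limit=10, since_id=0):
    # Newest rows first; since_id lets the client ask only for rows
    # it has not seen yet.
    return conn.execute(
        "SELECT id, message FROM recent_activity "
        "WHERE id > ? ORDER BY id DESC LIMIT ?",
        (since_id, limit),
    ).fetchall()
```

The client would remember the highest id it has received and send it back as `since_id` on the next poll, so each request only returns new rows.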

Then, on the client side, you'd just have a simple AJAX poller hitting your view every x seconds. There is no shortage of complicated plugins and technologies you can use, but writing your own isn't that complicated either:

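function simplePoll() {
  // "your-url", queryParameters and delay are placeholders to fill in
  $.get("your-url", queryParameters, function (data) {
    // do stuff with the data, replacing a div or updating JSON or whatever
    // schedule the next poll only after this one has succeeded
    setTimeout(simplePoll, delay);
  });
}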

My opinion is that performance issues aren't really issues until your site is successful enough for them to be an issue. A traditional relational database can scale up fairly well until you start reaching the level of success like Twitter, Google, etc. Most of us aren't at that level :)

bbak
DMac the Destroyer
2

Have you considered using Signals? You could send a signal in recent_activity_post_save() and there could be a listener which stores the information in the cache.

The view would just refer to the cache to see if there are new notifications. Of course you don't need Signals, but IMHO it would be a bit cleaner that way, as you could add more "notification handlers".

This seems optimal because you don't need to poll the DB (artificial load), and the notifications are "visible" almost immediately (only after the time required to process signals and interact with the cache).

So the pseudocode would look like this:

# model
def recent_activity_post_save():
    post_save_signal.send()

# listener
def my_handler( ... ):
    cache.set( 'notification', .... )

post_save_signal.connect( my_handler )

# view
def my_view( request ):
    new_notification = None
    while not new_notification:
        sleep(1)
        new_notification = cache.get( 'notification' )
    return HttpResponse(...)
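
The loop above waits forever if nothing arrives. Here is a minimal sketch of the same wait with a deadline. It is framework-agnostic, and the function name `wait_for_notification` and its parameters are assumptions for illustration, not part of Django:

```python
import time


def wait_for_notification(get, key="notification", timeout=30.0,
                          interval=1.0, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll get(key) until it returns a value or the timeout elapses.

    Returns the value, or None if nothing arrived in time.
    """
    deadline = clock() + timeout
    while True:
        value = get(key)
        if value is not None:
            return value
        if clock() >= deadline:
            return None  # caller should send back an empty result
        sleep(interval)
```

In a Django view you would presumably pass `cache.get` as `get` and return an empty response when the result is None, so the client can simply issue the next long-poll request. Note that each waiting request still occupies a worker for up to `timeout` seconds.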
kgr
  • this is exactly what I'm using right now; it seems optimal to me too, but I was looking for other opinions on this subject. +1 – Gabi Purcaru Sep 28 '11 at 17:27
  • I liked this solution all the way until I saw the `while not new_notification` line... couldn't this theoretically hang indefinitely on a request -- presumably an ajax polling request of some sort -- while it waits for a new notification to come in? Wouldn't it be better to just return an empty data set from the view if the cache was empty? – DMac the Destroyer Sep 30 '11 at 05:42
  • @DMactheDestroyer that was some sort of pseudocode, of course it just returns an empty result after 30 seconds or so if there's no new activity. – Gabi Purcaru Sep 30 '11 at 19:09
  • Can you explain more the advantage of `cache.set( 'notification', .... )` in the signal handler (as you have) vs. directly in `recent_activity_post_save` (without needing signals)? – dkamins Sep 30 '11 at 22:04
  • @dkamins The main benefit (IMHO) of having a signal is that the code is decoupled, i.e. the recent_activity_post_save() code doesn't have to (and some might say it shouldn't) "know" about notification code. Its task is just to save the post; it may let others know about this event, though, by sending a signal (without knowing who listens). So the cleaner, decoupled code would be one thing. – kgr Jan 10 '12 at 13:12
  • @dkamins Another thing is perhaps of less importance, that is if you'd like to add new functionality, you just add another signal listener (handler). This firstly is simpler than tinkering with code in recent_activity_post_save() and secondly each function or method performs a simple task and that task only, which is good from code quality point of view and a lot simpler to maintain once you get lots of code. – kgr Jan 10 '12 at 13:15
  • @DMactheDestroyer - you have a point. It was just simple code, as the OP pointed out, but I guess the optimal approach would be to either develop a hybrid solution (i.e. check a few times and then return an empty dataset) or use your approach. The hybrid approach would have the advantage that it could be a bit lighter on the server (i.e. fewer requests), but it could result in some requests being returned after the user has already left (e.g. in the ajax case you pointed out). So it's a matter of choice for the person who implements it, but thanks for pointing out the issue! :) – kgr Jan 10 '12 at 13:20
1

You could use a comet solution, like the Ape project. This kind of project is designed to send real-time data to the browser, and can make use of the WebSocket support in modern browsers.

Thibault J
0

You could use a trigger (fired whenever a new post is made). This trigger could write, for example, a new file in a polling directory with the necessary data in it (say the primary key). Your Python code could then just watch that folder for new files without having to touch the database until a new file appears.
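
A minimal sketch of the folder-watching half, assuming the trigger drops one file per new post into a directory. The function name `new_files` and the idea of tracking already-seen names in a set are assumptions for illustration:

```python
import os


def new_files(directory, seen):
    """Return the names in directory that are not yet in seen.

    The seen set is updated in place, so each file is reported once.
    """
    current = set(os.listdir(directory))
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh
```

The caller would invoke this in a loop with a short sleep and only query the database when the returned list is non-empty.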

ed.
0

If you're after a comet solution, you could use Orbited. Let me warn you, though: because it's a rather niche solution, it's very hard to find good documentation on how to deploy and use Orbited in production environments.

patrys
0

Here's a similar discussion, answering from the server-side perspective: Making moves w/ websockets and python / django ( / twisted? ) , the most important answer being this one.

There's also this answer, pointing to a very solid looking alternative to attempting this from Django.

If you really want this served from your existing Django application, don't do this server side. Holding that HTTP socket hostage to a single browser's connection is a fast way to break your application. Two reasonable alternatives are: explore the various web socket options (like the one above that uses Pyramid to host the service), or look at having the browser send a polling request periodically to the server looking for updates.

Community
chipchilders
  • This is generally good advice, but the question did specifically say "in django (with AJAX long-polling)". – dkamins Sep 30 '11 at 22:05
0

You should decide whether you would rather go with a "pull" or a "push" architecture for delivering your messages; see this post on Quora! If you want a solution that "pushes" the notifications to their receivers, caching/NoSQL-based systems are preferable, as they don't produce such a high load under a lot of write actions.

Redis, for instance, with its sorted set/list data structures, offers you a lot of options. See e.g. this post (though it's not Python) to get an idea. You could also look into "real" message queues like RabbitMQ, for example!
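
As a rough sketch of the list-based variant, assuming a redis-py style client that exposes `lpush`, `ltrim` and `lrange`: the key format `activity:<user_id>` and both helper names are assumptions made up for this example.

```python
def push_activity(client, user_id, payload, max_items=100):
    """Prepend an item to a user's feed and cap the feed length."""
    key = f"activity:{user_id}"  # hypothetical key naming scheme
    client.lpush(key, payload)
    # Keep only the newest max_items entries.
    client.ltrim(key, 0, max_items - 1)


def latest_activity(client, user_id, count=10):
    """Return up to count of the newest items, newest first."""
    return client.lrange(f"activity:{user_id}", 0, count - 1)
```

Capping the list on every write keeps reads cheap, since fetching the feed is a single range lookup with no sorting.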

For the client connection, the other posts here should already have given you some ideas on how to use Twisted and similar frameworks.

And Celery can always be a good tool; you could e.g. do all the writing to the users' activity streams in an asynchronous job!

Bernhard Vallant
  • thanks for the insight; I was looking for a _lightweight_ solution though, because there's no point in installing a NoSQL database for a single page in a site, right? – Gabi Purcaru Oct 01 '11 at 04:04
0

I don't see a need to limit yourself to long-polling if it is not really necessary. There are libraries written to take advantage of the best option available (be that short-polling, long-polling, WebSockets, or even a tiny Flash plugin if none of the previous options is available). Node.js has one of the best libraries out there for such a job, called Socket.IO, but luckily there are also two Python implementations available, gevent-socketio and tornadio. The latter is built on top of the Tornado framework, so it is possibly out of the question.

If that suits you, you can combine them with one of the NoSQL databases, which are often faster and more lightweight than relational databases for this kind of workload. There are many options, including CouchDB, MongoDB, Redis, ... In my experience, the combination of Socket.IO and a NoSQL store is fast, lightweight and reliable.

Although I've seen in the comments that you've already considered NoSQL, my personal opinion is that if you need a fast and easy solution, and the options above suit you, this is your best bet.

usoban