Long Polling/HTTP Streaming General Questions

Question

I'm trying to build a theoretical web chat application with PHP and jQuery. I've read about long polling and HTTP streaming, and I managed to apply most of the principles introduced in the articles. However, there are two main things I still can't get my head around.

With Long Polling

How will the server know when an update has been sent? Will it need to query the database continually, or is there a better way?

With HTTP Streaming

How do I check for results while the Ajax connection is still active? I'm aware of jQuery's success function for Ajax calls, but how do I check the data while the connection is still ongoing?

I'll appreciate any and all answers, thanks in advance.

Colleagues of mine work on a project that uses long polling requests (lasting several hours). After trying several techniques, they ended up using Orbited. I don't know the details, I just know the application is now very stable and they are happy with it. — Jacco, Aug 27 '11 at 09:23
Except I'm using PHP, although I would love to learn about it as well. — Madara's Ghost, Aug 27 '11 at 09:25
They decided to use Orbited as an intermediate service. The main part of the application is PHP, with a JavaScript (HTML, etc.) frontend. — Jacco, Aug 28 '11 at 06:39

coffeesnake · Accepted Answer · 2011-09-12T15:33:27.757

Yeah, Comet-like techniques usually blow your mind in the beginning, because they make you think in a different way. Another problem is that there aren't many resources available for PHP, because everyone is doing their Comet in node.js, Python, Java, etc.

I'll try to answer your questions, and I hope this sheds some light on the topic for other people too.

How will the server know when an update has been sent? Will it need to query the database continually, or is there a better way?

The answer is: in the most general case you should use a message queue (MQ). RabbitMQ or the Pub/Sub functionality built into the Redis store may be good choices, though there are many competing solutions available, such as ZeroMQ, Beanstalkd, etc.

So instead of continuously querying your database, you subscribe to an MQ event and simply block until someone publishes a message on the channel you subscribed to. At that point the MQ wakes you up and delivers the message. A chat app is a very good use case for understanding this functionality.

I should also mention that if you search for Comet chat implementations in other languages, you may notice simple ones that don't use an MQ. So how do they exchange information? Such solutions are usually implemented as standalone, single-threaded asynchronous servers, so they can store all connections in a thread-local array (or something similar), handle many connections in a single loop, and pick the right ones to notify when needed. Asynchronous servers like this are a modern approach that fits the Comet technique really well. However, you're most likely implementing your Comet on top of mod_php or FastCGI, where each request runs in its own process or thread. In that case this simple approach is not an option and you should use an MQ.

It can still be very useful to understand how a standalone asynchronous Comet server handles many connections in a single thread. Recent versions of PHP support Libevent and Socket Streams, so it is possible to implement this kind of server in PHP as well. There's also an example available in the PHP documentation.

How do I check for results while the Ajax connection is still active? I'm aware of jQuery's success function for Ajax calls, but how do I check the data while the connection is still ongoing?

If you're doing your long-running polls with a usual Ajax technique such as plain XHR, jQuery Ajax, etc., you don't have an easy way to transmit several responses in a single Ajax request. As you mentioned, you only have the 'success' handler, which deals with the response as a whole and not with its parts. As a workaround, people send only a single response per request, process it in the 'success' handler, and then open a new long-poll request. This is simply how the HTTP protocol works.
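The "one response per request, then re-open" loop can be sketched roughly like this. It is only an illustration: `startLongPolling`, `sendRequest`, `cursor` and `fakeServer` are made-up names, and `fakeServer` stands in for a real XHR or jQuery.ajax call so the snippet runs without a browser.

```javascript
// Long-poll client loop: each response is handled as a whole, then a new
// request is opened immediately. `sendRequest(cursor, onSuccess)` is a
// hypothetical transport; in a browser it would wrap XHR or jQuery.ajax.
function startLongPolling(sendRequest, onMessage) {
  let cursor = 0;      // position of the last message we have seen
  let stopped = false;

  function poll() {
    if (stopped) return;
    sendRequest(cursor, (response) => {
      if (stopped) return;          // ignore answers that arrive after stop()
      cursor = response.cursor;
      onMessage(response.message);  // plays the role of the 'success' handler
      poll();                       // re-open the long poll
    });
  }

  poll();
  return () => { stopped = true; };
}

// Fake server: answers at once while it has a backlog, otherwise it keeps
// the request hanging, the way a long-poll endpoint would.
const backlog = ['hi', 'how are you?', 'bye'];
let requestsMade = 0;
let pending = null;
function fakeServer(cursor, onSuccess) {
  requestsMade += 1;
  if (cursor < backlog.length) {
    onSuccess({ cursor: cursor + 1, message: backlog[cursor] });
  } else {
    pending = onSuccess; // request stays open until a new message exists
  }
}

const received = [];
const stopPolling = startLongPolling(fakeServer, (m) => received.push(m));
console.log(received, requestsMade);
```

With this fake server, three requests are answered and a fourth one stays open, which is the state a real long-poll client sits in most of the time.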

It should also be mentioned that there are workarounds that implement streaming-like functionality, such as an infinitely long page in a hidden IFRAME or multipart HTTP responses. Both methods have drawbacks: the former is considered unreliable and can produce unwanted browser behavior such as an infinite loading indicator, and the latter lacks consistent and straightforward cross-browser support. Certain applications are nevertheless known to rely on that mechanism successfully, falling back to long polling when the browser can't properly handle multipart responses.

If you'd like to handle multiple responses per request/connection in a reliable way, you should consider a more advanced technology such as WebSocket, which is supported by most current browsers and can be used on any platform that supports raw sockets (such as Flash, or a native mobile app).

Could you please elaborate more on message queues?

Message queue is a term that describes a standalone (or built-in) implementation of the Observer pattern (also known as 'Publish/Subscribe' or simply PubSub). If you develop a big application, having one is very useful: it lets you decouple different parts of your system, implement an event-driven asynchronous design, and make your life much easier, especially in heterogeneous systems. It has many real-world applications. I'll mention just a couple of them:

Task queues. Let's say we're writing our own YouTube and need to convert users' video files in the background. We would obviously have a webapp with a UI for uploading a movie and some fixed number of worker processes that convert the video files (maybe we would even need a number of dedicated servers where only our workers live). We would also probably write our workers in C for better performance. All we have to do is set up a message queue server to collect video-conversion tasks from the webapp and deliver them to our workers. When a worker starts, it connects to the MQ and goes idle, waiting for new tasks. When someone uploads a video file, the webapp connects to the MQ and publishes a message with a new job. Powerful MQs such as RabbitMQ can distribute tasks evenly among the connected workers, keep track of which tasks have been completed, ensure nothing gets lost, and provide fail-over and even an admin UI for browsing pending tasks and stats.
Asynchronous behavior. Our Comet chat is a good example. Obviously we don't want to poll our database periodically (what would be the use of Comet then? It would be little different from periodic Ajax requests). We would rather have someone notify us when a new chat message appears, and a message queue is that someone. Let's say we're using the Redis key/value store, a really great tool that provides a PubSub implementation among its data store features. The simplest scenario may look like the following:
1. After someone enters the chat room, a new Ajax long-poll request is made.
2. The request handler on the server side issues a command to Redis to subscribe to a 'newmessage' channel.
3. Once someone enters a message into the chat, the server-side handler publishes a message to the 'newmessage' channel in Redis.
4. Once a message is published, Redis immediately notifies all pending handlers that previously subscribed to that channel.
5. Upon notification, the PHP code that keeps the long-poll request open can complete the request with the new chat message, so all users are notified. They can read new messages from the database at that moment, or the messages may be transmitted directly inside the message payload.
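To make the flow of steps 1 to 5 more concrete, here is a small in-process sketch. It does not talk to Redis at all: `Broker` is a made-up stand-in for the Pub/Sub server, and `handleLongPoll` and `handlePost` are hypothetical names for the two server-side request handlers. In the PHP setup described above, each handler would run in its own process and the broker would be the real Redis server.

```javascript
// In-process stand-in for a Pub/Sub broker such as Redis (illustration only).
class Broker {
  constructor() {
    this.subscribers = new Map(); // channel name -> Set of callbacks
  }

  // Register a callback for a channel; returns a function that unsubscribes it.
  subscribe(channel, callback) {
    if (!this.subscribers.has(channel)) {
      this.subscribers.set(channel, new Set());
    }
    this.subscribers.get(channel).add(callback);
    return () => this.subscribers.get(channel).delete(callback);
  }

  // Deliver a message to every current subscriber; returns how many were notified.
  publish(channel, message) {
    const subs = this.subscribers.get(channel);
    if (!subs) return 0;
    const snapshot = [...subs]; // copy, because callbacks unsubscribe themselves
    for (const callback of snapshot) callback(message);
    return snapshot.length;
  }
}

// Steps 1-2: a long-poll request arrives and parks itself on the channel.
function handleLongPoll(broker, respond) {
  const unsubscribe = broker.subscribe('newmessage', (message) => {
    unsubscribe();     // one response per long-poll request
    respond(message);  // step 5: complete the held request
  });
}

// Step 3: someone posts a chat message; step 4 happens inside publish().
function handlePost(broker, text) {
  return broker.publish('newmessage', text);
}

const broker = new Broker();
const delivered = [];
handleLongPoll(broker, (m) => delivered.push(['alice', m]));
handleLongPoll(broker, (m) => delivered.push(['bob', m]));
const notified = handlePost(broker, 'hello');
// Nobody has re-opened a long poll yet, so this message reaches no one.
const notifiedAgain = handlePost(broker, 'second');
console.log(delivered, notified, notifiedAgain);
```

The second post illustrates why clients re-open their long poll right after each response, and why reading missed messages from the database (as mentioned in step 5) can still be needed.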

I hope my illustration is easy to understand. Message queues are a very broad topic, though, so refer to the resources mentioned above for further reading.

+1 for the sheer effort :) Could you please elaborate more on message queues? It's the first time I hear about it. — Madara's Ghost, Aug 27 '11 at 12:52
np. MQs are a bit different story actually (and a long one), although I've added another paragraph with a very basic introduction to this topic. — coffeesnake, Aug 27 '11 at 15:03
I think this is the beginning of a new great community member. I'll check it out when I'm next to my computer, as now I'm on mobile. welcome to stack overflow, and keep up the great work. — Madara's Ghost, Aug 27 '11 at 17:31
Well, I'm currently learning Java to better understand the concept, I'll accept your answer, thanks a lot :) — Madara's Ghost, Aug 29 '11 at 12:39
@coffeesnake - any chance you could correct your answer based on the information I've provided below? I'll remove my answer if you do. — leggetter, Sep 08 '11 at 12:17
Down voting until "you don't have a way to transmit several responses in a single Ajax request" is corrected. Will vote up again afterwards. — leggetter, Sep 09 '11 at 17:35
@leggetter Yeah, you sure, I should rather say that there are no "reliable" ways to implement streaming-like functionality. I've added another paragraph on existing techniques, however I strongly believe that people should rather focus on modern standards such as WebSockets instead of keeping those ugly techniques alive, although this is just my personal opinion. — coffeesnake, Sep 11 '11 at 15:44
@coffeesnake HTTP Streaming can be implemented in a reliable cross-browser way, and I know that many software companies offer this within their enterprise software solutions. These solutions are in use at large financial organisations. I'm not saying it isn't tricky. So, your statement "Basically you can't. If you're doing your long-running polls with a usual Ajax technique such as plain XHR, jQuery Ajax, etc. you don't have a way to transmit several responses in a single Ajax request." is incorrect. That's why I'd like your otherwise very good answer updated and then I'll upvote. — leggetter, Sep 12 '11 at 09:43

leggetter · Answer 2 · 2011-09-08T16:40:44.657

How do I check for results while the Ajax connection is still active? I'm aware of jQuery's success function for Ajax calls, but how do I check the data while the connection is still ongoing?

Actually, you can. I've provided a revised answer for the above but I don't know if it's still pending or has been ignored. Providing an update here so that the correct information is available.

If you keep the connection between the client and the server open, it is possible to push updates through it, and they are appended to the response. As each update comes in, the XMLHttpRequest.onreadystatechange event is fired and the value of XMLHttpRequest.readyState is 3. This means that XMLHttpRequest.responseText keeps growing, so it always contains everything received so far, not just the latest update.

You can see an example of this here: http://www.leggetter.co.uk/stackoverflow/7213549/

To see the JS code simply view source. The PHP code is:

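<?php
// Number of updates to stream; falls back to 100 when ?updates= is missing or invalid.
$updates = isset($_GET['updates']) ? (int) $_GET['updates'] : 0;
if ($updates <= 0) {
  $updates = 100;
}

header('Content-type: text/plain');
echo str_pad('PADDING', 2048, '|PADDING'); // initial buffer required

$sleep_time = 1; // seconds to wait between updates
$count = 0;
$update_suffix = 'Just keep streaming, streaming, streaming. Just keep streaming.';
while ($count < $updates) { // honour the requested number of updates
  $message = $count . ' >> ' . $update_suffix;
  echo($message);
  flush(); // push this update to the client right away
  $count = $count + 1;
  sleep($sleep_time);
}
?>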
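On the client side, since responseText keeps growing, the handler has to remember how much it has already processed. A rough sketch of that bookkeeping follows. This is not the code from the linked example: `createStreamReader` and `fakeXhr` are made-up names, and `fakeXhr` only imitates the two XMLHttpRequest properties used here so the snippet runs outside a browser.

```javascript
// Returns a handler that reports only the part of responseText that has
// arrived since the previous call.
function createStreamReader(onChunk) {
  let offset = 0; // number of characters already handed to onChunk

  return function read(xhr) {
    // readyState 3 = LOADING (partial data available), 4 = DONE
    if (xhr.readyState !== 3 && xhr.readyState !== 4) return;
    const text = xhr.responseText;
    if (text.length > offset) {
      onChunk(text.slice(offset));
      offset = text.length;
    }
  };
}

// In a browser the wiring would look roughly like this:
//   const xhr = new XMLHttpRequest();
//   const read = createStreamReader((chunk) => console.log(chunk));
//   xhr.onreadystatechange = () => read(xhr);
//   xhr.open('GET', 'stream.php');
//   xhr.send();

// Simulation with a plain object standing in for the XHR.
const chunks = [];
const read = createStreamReader((chunk) => chunks.push(chunk));
const fakeXhr = { readyState: 3, responseText: '' };

fakeXhr.responseText += '0 >> first';
read(fakeXhr);
read(fakeXhr); // event fired again without new data: nothing is reported
fakeXhr.responseText += '1 >> second';
read(fakeXhr);
fakeXhr.readyState = 4;
read(fakeXhr); // request finished, no extra data
console.log(chunks);
```

Note that the chunks follow however the data happened to arrive, so a real protocol would still need a delimiter (a newline, for instance) between updates to split them reliably.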

In Gecko-based browsers such as Firefox it's possible to completely replace the responseText by using multipart/x-mixed-replace. I've not provided an example of this.

It doesn't look like it's possible to achieve the same sort of functionality using jQuery.ajax. The success callback does not fire whenever the onreadystatechange event is fired. This is surprising since the documentation states:

No onreadystatechange mechanism is provided, however, since success, error, complete and statusCode cover all conceivable requirements.

So the documentation is potentially wrong unless I'm misinterpreting it?

You can see an example that tries to use jQuery here: http://www.leggetter.co.uk/stackoverflow/7213549/jquery.html

If you take a look at the network tab in either Firebug or Chrome Developer Tools you'll see the file size of stream.php growing, but the success callback still isn't fired.

Hmm, that's very peculiar, I didn't know it happens that way (I actually don't think it happens that way), could you prepare an example? (it should work in latest Firefox and Chrome). — Madara's Ghost, Sep 08 '11 at 12:19
Ok, updated my response with a link to an example. And it would appear that jQuery.ajax doesn't have an event for onreadystatechange. — leggetter, Sep 08 '11 at 16:41
As I mentioned above this way is not reliable and I would strongly encourage using something that was designed for implementation of such functionality such as WebSockets. Btw you can receive XHR object upon opening a request and check its readystate by timer to workaround that jQuery limitation. — coffeesnake, Sep 11 '11 at 15:46
As stated above, I disagree that HTTP Streaming is unreliable. It can be developed in a reliable way for the reasons stated above. Polling the XHR object (`setTimeout`/`setInterval`) is an interesting approach. I think I'd stick with using a native XHR object without jQuery. I completely agree with you that WebSockets are the way to go but people should still be aware of the correct information about the alternative solutions. — leggetter, Sep 12 '11 at 09:46
your answer is broken, it does output everything only at the end of the script. Could you fix it ? — genesis, Sep 29 '12 at 20:09
@genesis It does output all the content when the script finishes, yes. But that's not the expected functionality based on the jQuery docs. It should output the content for each `flush();` call. Because of this it doesn't look like you can do HTTP streaming using the jQuery library (this example used jQuery 1.6). — leggetter, Oct 01 '12 at 11:52
@leggetter I mean the first example which should grow (http://www.leggetter.co.uk/stackoverflow/7213549/) but doesn't — genesis, Oct 05 '12 at 17:00
