Twitter's method for streaming API on w3c

Question

I am interested in building a read-only streaming API similar to the one Twitter has built. Data will flow in only one direction, from server to client. Clients need not be web browsers; they can be anything that can keep a persistent HTTP connection open. I'm fairly certain that Twitter's streaming API uses neither WebSockets nor Comet. I was wondering whether the technology or strategy they deployed has a W3C specification that one can study. I don't see any links to their strategy on the W3C site, so it might be something custom. Any pointers toward the buzzwords and protocols involved in building server-side HTTP streaming support would be great.

Did you get your question answered? – wanghq Oct 19 '13 at 19:28

score 3 · Answer 1 · answered Oct 13 '13 at 15:24

Twitter's implementation uses a custom protocol, but it is similar in spirit to the W3C-standard Server-Sent Events. Server-sent events are much simpler than WebSockets, but they only allow communication in one direction. There is a Python implementation of the server side of the protocol in this pull request for Tornado.
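
To show how little the SSE wire format involves, here is a minimal sketch using only Python's standard library. It is not the Tornado code from the pull request and not Twitter's protocol. The handler, its three demo events and the `run` helper are hypothetical. The response is an ordinary HTTP response with `Content-Type: text/event-stream` whose body is a series of `field: value` lines, with each event ended by a blank line.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_sse(data, event=None, event_id=None):
    """Encode one Server-Sent Events message as UTF-8 bytes.

    Each field is a "name: value" line and a blank line ends the event;
    multi-line data becomes several "data:" lines.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return ("\n".join(lines) + "\n\n").encode("utf-8")


class SSEHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that sends three demo events, then closes.

    A real service would keep writing until the client disconnects.
    """

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for i in range(3):
            payload = json.dumps({"seq": i, "text": f"tweet {i}"})
            self.wfile.write(format_sse(payload, event="tweet", event_id=i))
            self.wfile.flush()  # send each event immediately instead of buffering
            time.sleep(0.1)     # stand-in for waiting on real data


def run(host="127.0.0.1", port=8000):
    # BaseHTTPRequestHandler speaks HTTP/1.0 by default, so the stream ends
    # when do_GET returns and the server closes the connection.
    ThreadingHTTPServer((host, port), SSEHandler).serve_forever()
```

Calling `run()` and opening `http://127.0.0.1:8000/` with curl, or with `new EventSource(...)` in a browser, should show the three events arriving one at a time.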

Ben Darnell

I also decided to use server-sent events for our streaming APIs, such as market data streams and push notifications, where everything flows from server to client. – Karl Anthony Baluyot Jan 30 '20 at 02:26

score 1 · Answer 2 · answered Oct 13 '13 at 06:36

Based on this slide, Twitter's streaming API runs on the Jetty server. So would plain blocking I/O work? The client makes a request saying which tweets it is interested in, and the server responds but does not close the response. Every time new tweets come in, the server is notified and writes (and flushes) the data to the client, again without closing the response.
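
The blocking approach just described can be sketched in Python's standard library, though only loosely. Hosebird itself was Scala on Jetty, and this is not its design. The sketch makes several assumptions. Each connection gets its own thread, which blocks on a queue. A hypothetical `publish()` function plays the role of "the server gets notified". Tweets go out as newline-delimited JSON inside HTTP/1.1 chunked transfer encoding, so the response never needs a `Content-Length` and can stay open indefinitely. The `track` query parameter is an illustrative filter, not Twitter's actual API.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

_subscribers = []  # one queue per open client connection
_lock = threading.Lock()


def publish(tweet):
    """Hand a new tweet to every open connection; None ends all streams."""
    with _lock:
        targets = list(_subscribers)
    for q in targets:
        q.put(tweet)


class StreamHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding needs HTTP/1.1

    def _write_chunk(self, data):
        # Chunk framing: payload length in hex, CRLF, payload, CRLF.
        self.wfile.write(b"%x\r\n%s\r\n" % (len(data), data))
        self.wfile.flush()

    def do_GET(self):
        # Hypothetical "track" parameter: only tweets whose text contains it are sent.
        track = parse_qs(urlparse(self.path).query).get("track", [""])[0]
        q = queue.Queue()
        with _lock:
            _subscribers.append(q)
        try:
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Transfer-Encoding", "chunked")
            self.end_headers()
            while True:
                tweet = q.get()  # this thread blocks until publish() is called
                if tweet is None:
                    break
                if track in tweet["text"]:
                    # One JSON object per line, sent as its own chunk.
                    self._write_chunk(json.dumps(tweet).encode("utf-8") + b"\r\n")
            self._write_chunk(b"")  # zero-length chunk ends the response
        except (BrokenPipeError, ConnectionResetError):
            self.close_connection = True  # client went away; stop writing
        finally:
            with _lock:
                _subscribers.remove(q)


def run(host="127.0.0.1", port=8000):
    ThreadingHTTPServer((host, port), StreamHandler).serve_forever()
```

One thread per open connection is the simplest design, but it becomes expensive with many thousands of concurrent clients. That cost is one reason servers such as Jetty also offer non-blocking I/O.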

From the notes on page 20:

How do the servers work internally? Hosebird runs on the JVM. It's written in Scala. And uses an embedded Jetty webserver to handle the front end issues. We feed each process 8 cores and about 12 gigs of memory. And they each can send a lot of data to many, many clients.

Disclaimer: I am not familiar with this topic, so I might be totally wrong. What I wrote is based on intuition. It's an interesting topic.

score 0 · Answer 3 · answered Oct 08 '13 at 02:08

You may be looking for a publish/subscribe service. A good overview is at http://en.wikipedia.org/wiki/Publish/subscribe. You can make the service read-only and discard messages from clients that don't connect to valid channels.
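
A read-only pub/sub hub of this kind can be sketched in a few lines. This is an illustrative in-process stand-in, not the Redis or RabbitMQ API. The class name `ReadOnlyBroker` and its methods are hypothetical. Clients may only subscribe, and only to channels on an allow-list. Requests for any other channel are discarded. Only server-side code calls `publish()`, which returns the number of receivers, much as Redis's `PUBLISH` reply does.

```python
import queue
import threading
from collections import defaultdict


class ReadOnlyBroker:
    """Hypothetical in-process publish/subscribe hub.

    Clients can only subscribe (to allowed channels); only the server publishes.
    """

    def __init__(self, channels):
        self._allowed = frozenset(channels)
        self._subs = defaultdict(list)  # channel -> list of subscriber queues
        self._lock = threading.Lock()

    def subscribe(self, channel):
        """Return a queue fed with messages for channel, or None if not allowed."""
        if channel not in self._allowed:
            return None  # discard requests for unknown channels
        q = queue.Queue()
        with self._lock:
            self._subs[channel].append(q)
        return q

    def unsubscribe(self, channel, q):
        with self._lock:
            subs = self._subs.get(channel, [])
            if q in subs:
                subs.remove(q)

    def publish(self, channel, message):
        """Deliver message to every subscriber of channel; return how many got it."""
        with self._lock:
            targets = list(self._subs.get(channel, ()))
        for q in targets:
            q.put(message)
        return len(targets)
```

In a streaming server, each long-lived HTTP response would drain the queue returned by `subscribe()` and write every message to its client.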

An implementation could be built with Redis (http://redis.io/topics/pubsub) and a small application that subscribes to the proper channels.

Other implementations could use RabbitMQ: http://www.rabbitmq.com/tutorials/tutorial-three-python.html. I am sure there are other implementations, but I am not familiar with them.

Here is a W3C link on pub/sub: http://www.w3.org/community/pubsub/

mrfunyon

The HTTP transport part of this is what I'm interested in more information on. – randombits Oct 08 '13 at 17:43
The httppart is just the type of connection you are keeping alive. There isn't much to it. You might want to look into the websock implementation http://en.m.wikipedia.org/wiki/WebSocket – mrfunyon Oct 09 '13 at 01:31
Please re-read the question. I wrote that I'm fairly certain Twitter is not using WebSockets. – randombits Oct 09 '13 at 01:47
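
On the HTTP transport question raised in these comments, a non-browser client for a Twitter-style stream needs no WebSocket upgrade. It sends one ordinary GET request and keeps reading the response body as lines arrive. The sketch below makes three assumptions: the stream is newline-delimited JSON, blank lines are keep-alives, and the `consume` and `iter_messages` helpers are hypothetical. It relies on Python's `http.client`, which removes chunked framing transparently. Iterating the response yields each line once its newline has been received.

```python
import http.client
import json


def iter_messages(lines):
    """Yield decoded JSON objects from an iterable of byte lines.

    Blank lines are treated as keep-alives and skipped (an assumed stream
    format, not part of any standard).
    """
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)


def consume(host, port, path, handle):
    """Hold one long-lived GET request open and call handle() per message."""
    # The 90 s read timeout assumes the server sends keep-alives more often than that.
    conn = http.client.HTTPConnection(host, port, timeout=90)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status}")
        # Iterating the response reads it line by line; http.client strips
        # any chunked transfer-encoding framing.
        for message in iter_messages(resp):
            handle(message)
    finally:
        conn.close()
```

Usage would look like `consume("127.0.0.1", 8000, "/stream?track=jetty", print)`, where the host, port and path are placeholders. A production client would also reconnect with backoff when the connection drops.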