
Is there a way to maintain/work with a persistent connection for a POST command in rails?

I'd like to create an API where my app accepts what amounts to a stream of data from an external service (I'm writing this external service, so I can be flexible in my design here). Speed is critical: I need to get the information from the external source at a rate of 1000+ points per second. Talking with some fellow computer scientists, one came up with the idea of using a persistent connection so that the expensive TCP handshake would only have to be performed once. Using a library within the external service, I would then create multiple POST requests that are pushed into my rails app and processed one by one.

My understanding of the rails paradigm is that each request (POST, GET, PUT, etc) takes one TCP connection. Is there a way I could utilize one TCP connection to get multiple POSTs?

I'm currently using the following:

  • Rails 3.2
  • Ruby 1.9.3 (Could switch to 2.0 if necessary)

EDIT

To help clarify what my goal is:

I have an external system that collects 1,000 data points a second (3 floating point numbers, a timestamp, and 2 integers). I'd like to push that data to my Ruby on Rails server. I'm hoping with a properly configured system I could just use the HTTP stack in real time (as a data point is collected, I push it to my rails server). I could also slow this rate of transmission down and group data points together to send them. I've looked at using messaging queues, but I'd like to see if I could write a more "standard" HTTP API before going to a specialized queue API.
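To make the payload concrete, one data point from above (3 floats, a timestamp, and 2 integers) could be serialized like this; the field names here are my own assumptions, not anything from the question:

```ruby
require 'json'

# A sketch of one data point as JSON. The field names (x/y/z, ts,
# sensor, seq) are assumptions about what the 3 floats, timestamp,
# and 2 integers might represent.
def point_to_json(x, y, z, timestamp, sensor_id, sequence)
  JSON.generate(
    x: x, y: y, z: z,
    ts: timestamp,
    sensor: sensor_id,
    seq: sequence
  )
end
```

At 1,000 points per second, even this small JSON object adds up to roughly 60-80 KB/s of payload before HTTP overhead, which is well within reach of a single connection.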

Tyler DeWitt
  • It would be interesting to know what exactly you are transmitting via `POST`. If it can be fitted into a multipart MIME message, you might be able to stream the request, trimming down the overhead even further. – DaSourcerer Dec 18 '13 at 23:18
  • @DaSourcerer I updated my question, hopefully that gives you an idea of what I'm trying to accomplish. I've never heard of "streaming" a request. What can I google to learn about that? – Tyler DeWitt Dec 19 '13 at 14:22
  • Thanks. I must say, this is a most interesting problem. I've got some ideas that might help you. I'll update my answer as time allows it. – DaSourcerer Dec 19 '13 at 14:24
  • I've added my streaming ideas to my answer. Sorry it took so long: Quite possibly I'm not the first one to have that idea, but I couldn't find any resources for this. I'm quite confident this solution is in line with all relevant RFCs. – DaSourcerer Dec 22 '13 at 00:05

2 Answers


I think the Net::HTTP::Persistent library is what you are looking for. There's also this library, which goes one step further by implementing connection pools on top of persistent connections. But since it sounds like you have just one API endpoint, that might be overkill.
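For what it's worth, even the plain stdlib Net::HTTP reuses one keep-alive TCP connection for every request issued inside a single `start` block. A minimal sketch (the host and the `/api/points` path are assumptions, not from the question):

```ruby
require 'net/http'
require 'json'

# Build one POST per data point. The /api/points path is a
# hypothetical endpoint for illustration.
def build_post(point)
  req = Net::HTTP::Post.new('/api/points', 'Content-Type' => 'application/json')
  req.body = JSON.generate(point)
  req
end

# All requests inside one #start block share a single socket
# (HTTP/1.1 keep-alive), so the TCP handshake happens only once:
#
# Net::HTTP.start('example.com', 80) do |http|
#   points.each { |point| http.request(build_post(point)) }
# end
```

Net::HTTP::Persistent adds automatic reconnection and per-thread connection reuse on top of this, which matters once the connection has to stay up for hours.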

Some additional thoughts: if you are really after raw speed, it might be worth sending a single multipart POST request to reduce the overhead even further. This would come down to implementing a reverse server push.

For this to work, your rails app would need to accept a chunk-encoded request. This is important, as we are continuously streaming data to the server without knowing how long the resulting message body will ultimately be. HTTP/1.1 requires all messages (that is, responses and requests) to either be chunk-encoded or have their body size specified by a Content-Length header (cf. RFC 2616, section 4.4). However, most clients prefer the latter option, which results in some webservers not handling chunk-encoded requests well (e.g. nginx did not implement this before v1.3.9).
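On the client side, Ruby's stdlib Net::HTTP sends a chunk-encoded request when it is handed a body stream and no Content-Length. A minimal sketch (the endpoint path is assumed):

```ruby
require 'net/http'
require 'stringio'

# With a body stream and an explicit chunked Transfer-Encoding,
# Net::HTTP streams the body chunk-encoded. Any IO works here;
# in practice this could be a pipe fed by the data collector.
req = Net::HTTP::Post.new('/api/endpoint')
req['Transfer-Encoding'] = 'chunked'
req.body_stream = StringIO.new('{"x":1.0}')
```

Sending it then works exactly like any other request inside a `Net::HTTP.start` block; the server only sees the end of the body when the stream is closed.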

As a serialization format, I can safely recommend JSON, which is really fast to generate and widely accepted. An implementation for RoR can be found here. You might want to have a look at this implementation as well, as it works natively with streams and might thus be better suited. If you find that JSON doesn't suit your needs, give MessagePack a try.

If you hit network saturation, it could be worth investigating the possibilities for request compression.

Everything put together, your request could look like this (compression and chunk-encoding stripped for the sake of legibility):

POST /api/endpoint HTTP/1.1
Host: example.com
Content-Type: multipart/mixed; boundary="-boundary-"
Transfer-Encoding: chunked
Content-Encoding: deflate

---boundary-
Content-Type: application/json

{...}
---boundary-
Content-Type: application/json

{...}
---boundary---

The MIME type is multipart/mixed, as I felt it was the most appropriate one. It actually implies the message parts are of different content types, but as far as I can see this is not enforced anywhere, so multipart/mixed is safe to use here. deflate is chosen over gzip as the compression method as it doesn't need to generate a CRC32 checksum. This allows for a speed boost (and saves a few bytes).
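Assembling such a multipart body in Ruby could look like the sketch below (chunked transfer and deflate are left to the HTTP client). Per RFC 2046, each part delimiter is the boundary parameter prefixed with `--`, and the final delimiter gets an extra trailing `--`:

```ruby
require 'json'

BOUNDARY = '-boundary-'

# Assemble a multipart/mixed body from an array of data points,
# one application/json part per point.
def multipart_body(points)
  body = points.map do |point|
    "--#{BOUNDARY}\r\n" \
    "Content-Type: application/json\r\n\r\n" \
    "#{JSON.generate(point)}\r\n"
  end.join
  body + "--#{BOUNDARY}--\r\n"  # closing delimiter
end
```

In the streaming setup above you would of course emit each part as it is produced rather than building the whole string up front; the helper just shows the wire format.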

DaSourcerer
  • Thanks for the very detailed response. I'm not that familiar with the HTTP stack. Is it the client that requests a persistent (chunked) connection or is there some configuration I need to do on the server as well? – Tyler DeWitt Dec 23 '13 at 15:53
  • It's the client requesting it and it's the server needing to support it. Sorry for the short answer, I'm in a rural area and got no stable connection. – DaSourcerer Dec 24 '13 at 09:57
  • Do you know if the configuration is going to be just in the webserver (I use nginx) or are there some settings I need to check in Rails as well? Thanks for all the help so far, btw. It's been super useful. – Tyler DeWitt Dec 24 '13 at 19:40
  • Request compression will have to be configured in the server (if it supports it. I'm not that sure nginx does so, tbh). Chunked-encoding requires a version of nginx that is recent enough (see above). If nginx doesn't happen to support request compression, you could try to compress the individual message parts and let RoR handle the decompression. Should be fine with both, nginx and all relevant RFCs. – DaSourcerer Dec 25 '13 at 09:50

I know you want an HTTP solution, but honestly, if speed is critical I would take HTTP out of the equation. WebSockets seem much better suited to this problem.

See an example app from Heroku: https://devcenter.heroku.com/articles/ruby-websockets

And in general, see the Twitter streaming API for inspiration: https://dev.twitter.com/docs/streaming-apis

On top of that, you could transfer binary data instead of text to speed up the transfer further, and then have workers ingest and save the data.
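To illustrate the binary option: the data point described in the question (3 floats, a timestamp, and 2 integers) packs into a fixed 40-byte frame with `Array#pack`. The field order and the choice of doubles for the floats are my assumptions:

```ruby
# Pack one data point into a fixed 40-byte binary frame:
# four little-endian doubles (x, y, z, timestamp) followed by
# two little-endian signed 32-bit integers.
def pack_point(x, y, z, timestamp, sensor_id, sequence)
  [x, y, z, timestamp, sensor_id, sequence].pack('EEEEl<l<')
end

def unpack_point(bytes)
  bytes.unpack('EEEEl<l<')
end
```

Forty bytes per point at 1,000 points per second is only about 40 KB/s on the wire, and there is no parsing cost on either end beyond the fixed-layout unpack.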

Just my 2 cents.

aledalgrande
  • I think I'm looking for something like Twitter's streaming API, but isn't that handled over HTTP? – Tyler DeWitt Dec 24 '13 at 14:53
  • Actually yes, but the persistent connection has to be configured by who starts it. So if you want other people to open a connection to inject data into your server, they have to set up a persistent connection. With sockets you just have to worry about processing data. – aledalgrande Dec 24 '13 at 19:20
  • I think I'm close to understanding what you are suggesting. Correct me if I'm wrong: I would create a WebSocket enabled service that people would connect to. They would then push information to this WebSocket service, and this WebSocket service would push information to my backend? – Tyler DeWitt Dec 24 '13 at 19:38
  • Yes, exactly. The backend should process asynchronously. – aledalgrande Dec 24 '13 at 23:28