
I am trying to understand the best way to read a huge number of database rows (50K-100K) and send them to the client.

  1. Should I simply read all the rows from the database at once on the backend and then send them all as JSON? This isn't very responsive, since the user just waits for a long time, but it is faster for a small number of rows.

  2. Should I stream the rows from the database and call socket.emit() for each row as it is read? This causes too many socket emits, and while it feels more responsive, it is slow overall...

I am using node.js and socket.io.

amulllb
  • Why do you need to send so many at once? Have you looked at server-side pagination? – Ben Fortune May 15 '15 at 10:08
  • Once you figure out how to display 100k rows of data in a "responsive" manner, you could send paginated data as @BenFortune suggested, for example select first 50 results and send them to the client and when they want to view more, repeat for next 50 results. – Marko Gresak May 15 '15 at 10:27
  • I need all the data at once because I am giving the user that option. I have looked into server-side pagination, but it won't work in my case, as I need to process 100K rows (display them in a grouped manner) at the frontend – amulllb May 15 '15 at 16:37

1 Answer


Rethink the Interface

First off, a user interface that shows 50-100k rows on the client is probably not the best design in the first place. Not only is that a large amount of data to send down to the client and for the client to manage, and perhaps impractical on some mobile devices, but it's obviously far more rows than any single user is actually going to read in a given interaction with the page. So, the first order of business might be to rethink the user interface and create a more demand-driven design (paged, virtual scroll, keyed by letter, etc...). There are lots of possibilities for a different (and hopefully better) user interface that lessens the amount of data transferred. Which design is best depends entirely upon the data and the user's likely usage patterns.

Send Data in Chunks

That said, if you are going to transfer that much data to the client, then you will probably want to send it in chunks (groups of rows at a time). The idea is that you send a consumable amount of data in one chunk so the client can parse it, process it, show the results, and then be ready for the next chunk. The client can stay responsive the whole time since it has cycles available between chunks to process other user events. Sending the data in chunks also reduces the overhead of sending a separate message for every single row, and if your server is using compression, chunking gives the compressor a better chance at efficiency too. How big a chunk should be (i.e. how many rows it should contain) depends upon a bunch of factors and is best determined through experimentation with likely clients or with the lowest-power client you expect. For example, you might want to send 100 rows per message.
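As a rough sketch of the idea (assuming the `mysql` driver's streaming query interface; the `rowsChunk` and `rowsDone` event names are made up for illustration), the server could buffer rows as they arrive and emit them 100 at a time:

    // Sketch only: assumes a mysql connection and a connected socket.io socket
    const CHUNK_SIZE = 100;

    function streamRowsToClient(socket, connection) {
      let buffer = [];

      const query = connection.query(
        'SELECT firstname, lastname, state, country FROM people'
      );

      query
        .on('result', (row) => {
          buffer.push(row);
          if (buffer.length >= CHUNK_SIZE) {
            // one socket message per 100 rows instead of one per row
            socket.emit('rowsChunk', buffer);
            buffer = [];
          }
        })
        .on('end', () => {
          if (buffer.length) {
            socket.emit('rowsChunk', buffer);   // flush the remainder
          }
          socket.emit('rowsDone');              // tell the client we're finished
        });
    }

On the client you would listen for `rowsChunk`, render or accumulate each batch as it arrives, and stop when `rowsDone` comes in, so the page stays interactive while data keeps streaming.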

Use an Efficient Transfer Format for the Data

And, if you're using socket.io to transfer large amounts of data, you may want to revisit how you use the JSON format. For example, sending 100,000 objects that all repeat exactly the same property names is not very efficient. You can often invent your own optimization that avoids repeating property names that are identical in every object. For example, rather than sending 100,000 of these:

 {"firstname": "John", "lastname": "Bundy", "state": "Az", "country": "US"}

if every single object has the exact same properties, then you can either hard-code the property names on both ends or send the property names once, and then just send the values as an array that the receiving code can turn back into an object with the appropriate property names:

 ["John", "Bundy", "Az", "US"]

Data size can sometimes be reduced by 2-3x by simply removing redundant information.
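For instance (the helper names and field list here are purely illustrative), the packing and unpacking could look like this:

    // Field names are sent or agreed upon once, not repeated per row
    const FIELDS = ['firstname', 'lastname', 'state', 'country'];

    // Server side: convert each row object into an array of values
    function packRows(rows) {
      return rows.map((row) => FIELDS.map((f) => row[f]));
    }

    // Client side: rebuild objects from the shared field list
    function unpackRows(packed) {
      return packed.map((values) => {
        const obj = {};
        FIELDS.forEach((f, i) => { obj[f] = values[i]; });
        return obj;
      });
    }

    // packRows([{firstname: "John", lastname: "Bundy", state: "Az", country: "US"}])
    //   -> [["John", "Bundy", "Az", "US"]]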

jfriend00
  • Is this stuff not zipped though? Like, zipping will remove all the size associated with repeated keys – Rambatino Jan 16 '17 at 22:51
  • @Rambatino - socket.io out of the box does not do compression (it's just a layer over webSockets). There are npm modules that can add compression; I do not have any experience with them. It's hard to beat just not sending the redundant data in the first place, but there are compression options out there if that's your preference. The real place compression belongs is at the webSocket layer, and there are proposals for it, but I haven't seen anything standardized. See [How does WebSocket compress messages](http://stackoverflow.com/questions/19298651/how-does-websocket-compress-messages). – jfriend00 Jan 17 '17 at 00:38
  • Thanks mate! I guess it's ultimately a trade-off between the time to compress and decompress versus the time it would take without zipping, which for small data (which sockets should handle) means it's unlikely to be _worth_ zipping – Rambatino Jan 17 '17 at 13:34