
I am writing a system with multiple user types granted different levels of access to create, remove, and update resources in the database. To my thinking, these resources should therefore be displayed in real time so that deleted resources are not shown and created ones are. The more timely the data, the less problematic the user experience seems to become.

Using socket.io, I am currently circumventing this problem by subscribing a user to the data sources their privileges require, using namespaces to differentiate between the abilities of user types. I simply set up a JavaScript interval for each kind of data subscription, which hammers the database every second for a fresh set of data with no regard for whether the data has actually changed, and I then send this back to the user application.
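In simplified form, the current approach looks roughly like this (a sketch only; `db.getOrders`, the namespace name, and the way the organisation id is passed are placeholders for my actual queries and user types):

```javascript
// Simplified sketch of the current approach: one namespace per user type,
// and an interval per subscription that re-queries the DB every second and
// pushes the full result set whether or not anything has changed.
const http = require("http");
const { Server } = require("socket.io");

const httpServer = http.createServer();
const io = new Server(httpServer);

const adminNs = io.of("/admin"); // placeholder namespace for one user type

adminNs.on("connection", (socket) => {
  const orgId = socket.handshake.query.orgId; // however the org is identified
  const timer = setInterval(async () => {
    const orders = await db.getOrders(orgId); // db.getOrders is a placeholder query
    socket.emit("orders", orders);            // full data set, every second
  }, 1000);

  socket.on("disconnect", () => clearInterval(timer));
});

httpServer.listen(3000);
```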

This is fine for testing, but I have been struggling to find a solution that prevents this waste of bandwidth and reduces the database load. One idea I have had is to populate a global object with all the organisation ids as keys, where each value is an object keyed by the name of each data subscription, whose values are the up-to-date data sets.

In this way, I could add each new socket to a room when it joins and refactor the database queries to determine the changes to each subscription set, which I could then relay to the relevant organisation rooms as a series of socket.io broadcasts.
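Continuing from the setup above, what I have in mind is roughly this sketch (`fetchSubscription` and `diffDataSets` are hypothetical helpers standing in for my refactored queries):

```javascript
// Rough sketch of the proposed approach: keep the latest copy of each
// subscription in memory per organisation, diff fresh query results against
// it, and broadcast only the changes to that organisation's room.
const latestByOrg = {}; // { [orgId]: { [subscriptionName]: dataSet } }

io.on("connection", (socket) => {
  const orgId = socket.handshake.query.orgId; // however the org is identified
  socket.join(`org:${orgId}`);                // one room per organisation
});

async function refreshSubscription(orgId, name) {
  const fresh = await fetchSubscription(orgId, name);        // hypothetical DB query
  const orgCache = latestByOrg[orgId] || (latestByOrg[orgId] = {});
  const changes = diffDataSets(orgCache[name] || [], fresh); // hypothetical diff helper

  if (changes.length > 0) {
    orgCache[name] = fresh;
    io.to(`org:${orgId}`).emit(`${name}:changes`, changes);  // only the deltas
  }
}
```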

My main reservations with this implementation are the possibility of high server-side memory requirements reducing scalability, and the possibility of client data sets becoming out of sync if they miss any of these broadcasts. Is this solution acceptable for a production server, or is there a simpler/better one which I am not seeing for keeping large data sets real time from the client's perspective?

thanks

nrmad
  • Do you want to build a system similar to updating a scorecard in a soccer match? I'm asking because that also has large datasets with scores updating in real time. – Abhishek Pankar Jan 14 '21 at 12:27
  • I am unfamiliar with the size and nature of a 'scorecard', but if it is a large array of data then yes. – nrmad Jan 14 '21 at 12:29

1 Answer


Firstly, it all depends on what you are building. You can use Redis as cache storage.

You can reduce the database calls by:

  1. Writing to the DB first, then updating the cache and reading from it (see the sketch after this list)
  2. Writing to the cache and performing a bulk write to the DB once you've reached a certain point
  3. Using both
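For example, option 1 (write-through) might look roughly like this. This is only a sketch, assuming the node-redis v4 client; `saveResourceToDb` and `loadResourceFromDb` are placeholders for your own DB layer, and the key names are just examples:

```javascript
// Write-through sketch: write to the DB first, refresh the cache, and serve
// reads from the cache, falling back to the DB on a miss.
const { createClient } = require("redis");
const redis = createClient();
redis.connect(); // node-redis v4 requires an explicit connect

async function updateResource(orgId, resource) {
  await saveResourceToDb(resource); // 1. write to the DB first
  await redis.set(
    `org:${orgId}:resource:${resource.id}`, // 2. then update the cache
    JSON.stringify(resource)
  );
}

async function readResource(orgId, resourceId) {
  const key = `org:${orgId}:resource:${resourceId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);               // 3. read from the cache
  const fromDb = await loadResourceFromDb(resourceId); // miss: fall back to the DB
  await redis.set(key, JSON.stringify(fromDb));
  return fromDb;
}
```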

For real-time data, you can use Redis pub/sub with socket.io. You can also use Apache Kafka: it is a scalable, fault-tolerant, publish-subscribe messaging system that enables you to build distributed applications and powers web-scale Internet companies such as LinkedIn, Twitter, Airbnb, and many others.
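A minimal sketch of wiring Redis pub/sub to socket.io rooms could look like this (again assuming the node-redis v4 client; the channel and event names are only examples):

```javascript
// Sketch: every process that writes to the DB publishes the change to Redis,
// and every socket.io node subscribes and forwards it to the right room.
const { createClient } = require("redis");

async function wireUpPubSub(io) {
  const publisher = createClient();
  const subscriber = publisher.duplicate(); // pub/sub needs a dedicated connection
  await Promise.all([publisher.connect(), subscriber.connect()]);

  // Forward every published change to the matching organisation room
  await subscriber.subscribe("resource-updates", (message) => {
    const { orgId, changes } = JSON.parse(message);
    io.to(`org:${orgId}`).emit("resource-updates", changes);
  });

  // Call this from wherever a write to the DB happens
  return async function publishChange(orgId, changes) {
    await publisher.publish("resource-updates", JSON.stringify({ orgId, changes }));
  };
}
```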

You can reduce the database load (CPU usage) of bulk operations by using shards. Just make sure not to put the shards on the same machine.

> the possibility of client data sets becoming out of sync if they miss any of these broadcasts

I don't think there is much chance of missing a broadcast, because that would mean the client had disconnected for some reason, and a socket only disconnects if the client has left the page or there is an issue in the client-side code. When the client connects again, a database call is made to get the data. So, I don't think it'll be a problem.
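For example, the client could simply re-fetch a full snapshot whenever the socket (re)connects, so a missed broadcast only matters until the next connection (a sketch; `fetchFullSnapshot`, `renderResources`, and `applyChanges` are placeholders):

```javascript
// Client-side sketch: pull a fresh snapshot on every (re)connect, then apply
// incremental changes as they are broadcast.
const socket = io("/admin"); // placeholder namespace

socket.on("connect", async () => {
  const snapshot = await fetchFullSnapshot(); // placeholder call for the full data set
  renderResources(snapshot);                  // placeholder render function
});

socket.on("resource-updates", (changes) => {
  applyChanges(changes);                      // placeholder incremental update
});
```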

Check out: Scalable architecture for socket.io

Abhishek Pankar