
Hi all! After years of using Stack Overflow, I am posting my first question. Here it is!

I am creating an application that performs social media analytics for several customers. Everything was easy when my NodeJS app was only dealing with a single customer, because I could use Socket.io to connect to the Twitter and Instagram real-time APIs, do some calculations, and return data to the browser (also using Socket.io).

Now I need to scale this solution, and I was wondering if it's feasible to refactor the app so that, on startup, it loads the configuration for all the customers using it (social media account details, basically) and uses threads (or the cluster module) to have a dedicated process for each customer.

Each client would open a socket to the server with an identifier, and they would get their data from that specific thread/cluster. Also, some processes can run for days (e.g. analysing a marketing campaign's impact on Twitter for a week), so I definitely can't block another customer's processes. And I could be opening real-time streams from Twitter using different API keys, searching for different things.
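To make that concrete, here is a rough sketch of what I had in mind (customerConfigs and startAnalyticsFor are just placeholders for my real config loading and per-customer work):

var cluster = require('cluster');

// placeholder: in the real app this would come from a database or config file
var customerConfigs = [
  { clientId: 'customerA', twitterKeys: { /* ... */ } },
  { clientId: 'customerB', twitterKeys: { /* ... */ } }
];

function startAnalyticsFor(config) {
  // placeholder for the long-running per-customer work:
  // open that customer's real-time streams, crunch numbers,
  // push results to their dashboard over Socket.io, etc.
  console.log('worker ' + process.pid + ' handling ' + config.clientId);
}

if (cluster.isMaster) {
  // fork one dedicated worker process per customer
  customerConfigs.forEach(function (config) {
    var worker = cluster.fork();
    // hand the customer's configuration to its worker
    worker.send({ type: 'init', config: config });
  });
} else {
  process.on('message', function (msg) {
    if (msg.type === 'init') {
      startAnalyticsFor(msg.config);
    }
  });
}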

I bet it must be quite complex to achieve this, especially as the number of customers grows. If there were no real-time data, I could just have a REST API and a single process serving all requests. But the main feature I would like to have is true real-time data, to display it in interactive dashboards.

I've seen this: node-cluster-socket.io. It seems interesting, but since I am quite new to NodeJS, maybe that approach is not quite what I need.

Any advice, suggestions, or solutions for implementing something similar using NodeJS?


1 Answer


A single Node process should be able to handle your use case fine.

Node can easily handle many (hundreds or thousands of) open connections.

Creating a proof of concept is actually not very complicated at all:

Node could store an object of all open connections globally.

Each client that connects could have its socket stored (getCredentialsFor below stands in for however you look up that customer's API credentials on the server):

var allConnections = {};

io.on('connection', function (socket) {
  socket.on('setup', function (data) {
    // look up this client's credentials server-side from the id they sent
    // (getCredentialsFor is a placeholder for your own lookup)
    var clientCredentials = getCredentialsFor(data.clientId);

    allConnections[data.clientId] = {
      ws: socket,
      // open this client's real-time stream and keep a reference to it
      twitterStream: getTwitterStream(clientCredentials)
    };
  });
});

Since all I/O in Node is asynchronous by default, creating these connections will not block.
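One thing the snippet above doesn't show, which you'd likely want: cleaning up when a client disconnects, so the map doesn't grow forever. A minimal sketch, assuming the same allConnections shape and that each stored stream exposes a stop() method (the Twit streams used below do):

io.on('connection', function (socket) {
  socket.on('disconnect', function () {
    // find the entry this socket belonged to and tear it down
    Object.keys(allConnections).forEach(function (clientId) {
      if (allConnections[clientId].ws === socket) {
        if (allConnections[clientId].twitterStream) {
          allConnections[clientId].twitterStream.stop();
        }
        delete allConnections[clientId];
      }
    });
  });
});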

You could store references to each client's open streams under that client's key. Suppose, for Twitter, you use the Twit library for streaming:

var Twit = require('twit');

function getTwitterStream(clientCredentials) {
  // build a Twit client from this customer's credentials
  var client = new Twit(clientCredentials);

  // open a streaming connection filtered on a search term
  var twitStream = client.stream('statuses/filter', { track: 'javascript' });

  // Twit emits a 'tweet' event for each matching status
  twitStream.on('tweet', function (tweet) {
    console.log(tweet.text);
  });

  twitStream.on('error', function (error) {
    // log rather than throw: an exception thrown from an event
    // handler would bring down the whole process
    console.error(error);
  });

  return twitStream;
}

Doing this, the Node process has two connections open for the given client: one for their websocket and one for their Twitter streaming API. A single Node process should be able to handle hundreds, thousands, or even tens of thousands of open connections (YMMV).
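The last hop, getting the tweets into the dashboard, is just an emit on the stored socket. A sketch of that, assuming getTwitterStream is extended to take the client's id (the 'tweet' event name on the Socket.io side is arbitrary):

function getTwitterStream(clientCredentials, clientId) {
  var client = new Twit(clientCredentials);
  var twitStream = client.stream('statuses/filter', { track: 'javascript' });

  twitStream.on('tweet', function (tweet) {
    var conn = allConnections[clientId];
    // forward each matching tweet to that client's browser, if still connected
    if (conn && conn.ws) {
      conn.ws.emit('tweet', tweet);
    }
  });

  return twitStream;
}

In the browser, the client would then just listen for it: socket.on('tweet', function (tweet) { /* update the dashboard */ });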


See https://bocoup.com/weblog/node-stress-test-analysis/ for a stress test of a node.js server with socket.io handling 50,000 simultaneous clients.

Wow, thanks! I will try this right away. It makes sense to have a collection of streams; I had a more complex idea of how things needed to be done. But as usual, keeping it simple is the best approach. – Luis Serrano Sep 23 '15 at 14:53