
I'm looking to get Socket.io to work multi-threaded with native load balancing ("cluster") in Node.js v.0.6.0 and later.

From what I understand, Socket.io uses Redis to store its internal data. Instead of spawning a new Redis instance for every worker, we want to force the workers to use the same Redis instance as the master, so that connection data is shared across all workers.

Something like this in the master:

RedisInstance = new io.RedisStore;

Then we must somehow pass RedisInstance to the workers and do the following:

io.set('store', RedisInstance);
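(For reference, Socket.IO's RedisStore can also be handed explicit Redis clients instead of spawning its own. A configuration sketch of what I mean, assuming the `redis` module is installed and a Redis server is running locally; the variable names are mine:)

```javascript
var sio = require('socket.io');
var redis = require('redis');
var io = sio.listen(8080);

// All three clients point at the same local Redis server, so any
// process configured this way shares connection data with the others.
io.set('store', new sio.RedisStore({
  redisPub: redis.createClient(),    // publishes events to the other processes
  redisSub: redis.createClient(),    // receives events from the other processes
  redisClient: redis.createClient()  // general key/value storage
}));
```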

Inspired by this implementation using the old, 3rd party cluster module, I have the following non-working implementation:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  var sio = require('socket.io')
    , RedisStore = sio.RedisStore
    , io = sio.listen(8080);

  // Somehow pass this information to the workers
  io.set('store', new RedisStore());

} else {
  // Do the work here
  io.sockets.on('connection', function (socket) {
    socket.on('chat', function (data) {
      socket.broadcast.emit('chat', data);
    })
  });
}

Thoughts? I might be going completely in the wrong direction; can anybody point me to some ideas?

David Chouinard

1 Answer


Actually, your code should look like this:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  var sio = require('socket.io')
    , RedisStore = sio.RedisStore
    , io = sio.listen(8080);

  // Each worker creates its own RedisStore, all backed by the same Redis server
  io.set('store', new RedisStore());

  // Do the work here
  io.sockets.on('connection', function (socket) {
    socket.on('chat', function (data) {
      socket.broadcast.emit('chat', data);
    })
  });
}

Another option is to have Socket.IO listen on multiple ports and put something like HAProxy in front to load-balance the connections. Either way, you know the most important thing: use RedisStore to scale beyond a single process!
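A rough sketch of that multiple-ports variant (the port numbers are made up; HAProxy would round-robin across them, and this assumes socket.io and a running Redis server):

```javascript
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    // Workers inherit the master's environment, so set the port before forking
    process.env.PORT = 8000 + i;
    cluster.fork();
  }
} else {
  var sio = require('socket.io')
    , RedisStore = sio.RedisStore
    , io = sio.listen(parseInt(process.env.PORT, 10));

  // Each worker has its own store, but Redis keeps them all in sync
  io.set('store', new RedisStore());

  io.sockets.on('connection', function (socket) {
    socket.on('chat', function (data) {
      socket.broadcast.emit('chat', data);
    });
  });
}
```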

Resources:

http://nodejs.org/docs/latest/api/cluster.html
How can I scale socket.io?
How to reuse redis connection in socket.io?
Node: Scale socket.io / nowjs - scale across different instances
http://delicious.com/alessioaw/socket.io

alessioalex
  • Won't that implementation create a new RedisStore for every worker? Also, what would be the advantage of using something like HAProxy? Seems like using functionality present natively in Node is better. – David Chouinard Dec 19 '11 at 15:43
  • 2
    Advantages of HAProxy: you would have processes on different ports, which you could monitor more carefully and restart when they die (using monit, upstart). Also yes you need to create a new RedisStore for every worker. Think that cluster is an implementation above child_process.fork(), so you basically copy the app N times (the processes share the same file descriptor as far as I know). – alessioalex Dec 19 '11 at 15:48
  • I just tried your code sample; it seems not to be working. When single-threaded, 1500 connections push the CPU up to 50% (every worker is doing a bit more work than in the sample). Using your sample, I can't even get to 500 connections before *every* process runs up to 80-90% CPU. So clearly something is wrong, since this makes the situation worse. – David Chouinard Dec 19 '11 at 16:08
  • 1
    There is a sample app with child_process.fork() here: https://github.com/dshaw/talks/tree/master/2011-10-jsclub/sample-app Can you also try that and tell me how it goes? Basically put the worker js code in app.js there and try to follow the same structure. – alessioalex Dec 19 '11 at 16:13
  • Ah, thanks for that link. Ideally, this should work over only one port (not several, as this repo does). Any suggestions? It seems we could possibly use socket-io-announce to share the Redis connection details across all workers. – David Chouinard Dec 19 '11 at 17:20
  • Or am I misunderstanding how this works? I'm not seeing code in the client that selects which port to connect to, but the server-side script starts listening on a bunch of ports. – David Chouinard Dec 19 '11 at 17:42
  • You don't share the Redis connection itself; you share Redis as a database across workers, and that's the important thing (same port or multiple ports). You cannot connect a worker without that complete code (new RedisStore etc.). – alessioalex Dec 19 '11 at 19:28
  • Ok, thanks, I get it. Still, why is the CPU usage *way* higher when using multi-threading with native load balancing? Also, I'm still not able to wrap my head around that child_process.fork() implementation from the link. How does the Socket.io client select which port to connect to? I'm not seeing code in the client that selects a port. – David Chouinard Dec 19 '11 at 20:15
  • I figured out eventually. I still have a few things unclear (might post another question), but I feel much more confident. Thanks a bunch for your time, Alessio! – David Chouinard Dec 20 '11 at 03:35