The code
I have a clustered Node application that listens to TCP traffic and parses binary data into JSON format.
But here's the catch: all incoming traffic comes across a single persistent connection.
As I understand it, cluster will balance load on a single port by distributing new sockets across workers, but there is no native way to distribute the load of a single socket across workers.
To do so, I've set up the cluster master to accept the incoming connection and segment the messages. It then explicitly passes messages to the cluster workers in round-robin fashion: when the stream responsible for segmenting messages emits a new message, it simply uses the cluster messaging API to send the message to the next Worker/parser in line:
// (cluster.isMaster === true)
var gateway = new Gateway(config.gateway.port, config.gateway.host);
var nextWorker = 1;
gateway.on('message', function roundRobin (msg) {
  var workers = cluster.workers;
  var numWorkers = Object.keys(workers).length;
  workers[nextWorker].send(msg);
  if (++nextWorker > numWorkers) {
    nextWorker = 1; // else, it's prefix incremented
  }
});
for (var w in cluster.workers) {
  cluster.workers[w].on('message', gateway.respond.bind(gateway));
}
The Workers parse a message, use it to make an HTTP request, and then use the cluster send API to respond back to the gateway (that's what the last block of code above listens for).
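For reference, a simplified sketch of the worker side (parse() and makeHttpRequest() are placeholders for my actual binary-to-JSON parsing and outbound HTTP request):

// (cluster.isWorker === true)
// Sketch only: parse() and makeHttpRequest() stand in for the real logic.
process.on('message', function (msg) {
  var parsed = parse(msg); // binary -> JSON
  makeHttpRequest(parsed, function (err, result) {
    // send the result back to the master, which hands it to the gateway
    process.send(result);
  });
});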
The problem
I am getting strange and unpredictable latency patterns when placing the system under load. All CPU/memory/network measurements are sane and don't indicate an infrastructure bottleneck.
The question
As you can see, work is distributed among workers equally, without regard to the actual throughput of a given worker. My hunch is that this is what is causing the latency spikes: somewhere, perhaps, an individual worker is getting backed up.
Is there any way to confirm this, either in principle or empirically? Perhaps it's just wishful thinking, but it seems that the approach should just average out and not need a worker-pull type algorithm. (That seems especially tricky, as I can't work out when it would be best to consider a worker free: after it's done parsing? after it's received an HTTP response? after it has sent its response to the gateway?)
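The most direct empirical check I can picture is counting, in the master, how many messages each worker has been sent but not yet answered. None of this is in my current code; it's just a sketch of how the dispatcher above might be instrumented (the pending counter and the logging interval are made up for illustration):

// Sketch only: per-worker backlog counter, not part of my current code.
var pending = {}; // worker id -> messages sent but not yet answered

gateway.on('message', function roundRobin (msg) {
  var numWorkers = Object.keys(cluster.workers).length;
  pending[nextWorker] = (pending[nextWorker] || 0) + 1;
  cluster.workers[nextWorker].send(msg);
  if (++nextWorker > numWorkers) {
    nextWorker = 1;
  }
});

for (var w in cluster.workers) {
  cluster.workers[w].on('message', (function (workerId) {
    return function (response) {
      pending[workerId]--; // worker answered, one fewer outstanding
      gateway.respond(response);
    };
  })(w));
}

// Log the backlog once a second; a worker whose count keeps climbing is backed up.
setInterval(function () {
  console.log('outstanding per worker:', pending);
}, 1000);

If one worker's count grows while the others hover near zero, that would seem to confirm the backed-up-worker theory.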
I just don't know enough about CPU scheduling to know whether I'm chasing a red herring or whether this is a poor algorithm that really is causing the trouble. (And if so, any ideas on how to improve it would be appreciated.)