I'm writing a socket.io based server in Node.js (6.9.0). I am using the builtin cluster
module to enable multiple processes. For now, there is only two process: a master and a worker. The master receives the connections and maintains an in-memory global data structure (which the worker can query via IPC). The worker process does the majority of work by handling each incoming connection.
I am finding a hanging condition that I cannot attribute to any internal failure when the server is stressed at 300 concurrent users. Under lower concurrency, I don't see the hanging condition.
I'm enabling all forms of debugging (using the debug
module: socket.io:socket
, socket.io:client
as well as my own custom calls to debug
).
The last activity I can see is in socket.io
, however, the messages indicate that sockets are closing ("reason client namespace disconnect") due to their own "end of test" cycle. It just seems like incoming connections are not be serviced.
I'm using Artillery.io as the test client.
In the server application, I have handlers for uncaught exceptions and try-catch
blocks around everything.
In a prior iteration, I also used cluster
, but reversed the responsibilities so that the master process handled the connections (with the worker handling global data). That didn't exhibit the same failure. Not sure if something is wrong with the connection distribution. For that, I have also dumped internalMessage
events to monitor the internal workings of cluster
.
I am not using any other module for connection distribution or sticky sessions. As there is only a single process handling connections (at this time), it doesn't seem relevant.