While performance testing my Node.js socket.io app, it seems unable to handle the desired number of concurrent websocket connections.
I am testing the application in a Docker environment with the following specs:
CPUs: 2, RAM: 4 GB
The application is stripped down to a bare minimum that only accepts websocket connections with socket.io + express.js.
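A minimal sketch of that setup (the exact code differs slightly; port 5000 and the "echo" event name come from the test config below, the rest is the usual express + socket.io v2 boilerplate):

const express = require('express');
const http = require('http');
const socketIo = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = socketIo(server);

io.on('connection', (socket) => {
  // Echo back whatever the client sends on the "echo" channel
  socket.on('echo', (data) => socket.emit('echo', data));
});

server.listen(5000, () => console.log('listening on :5000'));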
I perform the tests with artillery.io; the test scenario is:
config:
  target: "http://127.0.0.1:5000"
  phases:
    - duration: 100
      arrivalRate: 20
scenarios:
  - engine: "socketio"
    flow:
      - emit:
          channel: "echo"
          data: "hello"
      - think: 50
Report:
Summary report @ 16:54:31(+0200) 2018-07-30
  Scenarios launched:  2000
  Scenarios completed: 101
  Requests completed:  560
  RPS sent: 6.4
  Request latency:
    min: 0.1
    max: 3
    median: 0.2
    p95: 0.5
    p99: 1.4
  Scenario counts:
    0: 2000 (100%)
  Codes:
    0: 560
  Errors:
    Error: xhr poll error: 1070
    timeout: 829
So I get a lot of xhr poll errors. While monitoring the CPU and memory stats, the highest CPU value is only 43.25%, and memory only gets as high as 4%.
Even when I alter my test to an arrival rate of 20 over a timespan of 100 seconds, I still get XHR poll errors.
So are these test numbers beyond the capability of Node.js + socket.io with these specs, or is something else not working as expected? Perhaps the Docker environment or the Artillery software?
Any help or suggestions would be appreciated!
Side note: I have already looked into Node.js clustering for scaling, but I would like to get the most out of one process first.
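(For completeness, the clustering pattern I mean is roughly the standard cluster-module setup below; one worker per CPU is an assumption, and ./server is a hypothetical entry file for the app above. Note that socket.io with long-polling across workers would additionally need sticky sessions.)

const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU; each worker runs its own server process
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited`);
    cluster.fork(); // replace a crashed worker
  });
} else {
  require('./server'); // hypothetical entry file that starts the socket.io + express server
}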
Update 1
After some more testing with a websocket stress test script found here: https://gist.github.com/redism/11283852, it seems I hit some sort of limit when I use an arrival rate higher than 50 or want to establish more than roughly 1900 connections.
Up to 1900 connections almost every connection gets established, but beyond this number the XHR poll errors grow exponentially.
Still no high CPU or memory values for the Docker containers.
The XHR poll error in detail:
Error: xhr poll error
at XHR.Transport.onError (D:\xxx\xxx\api\node_modules\engine.io-client\lib\transport.js:64:13)
at Request.<anonymous> (D:\xxx\xxx\api\node_modules\engine.io-client\lib\transports\polling-xhr.js:128:10)
at Request.Emitter.emit (D:\xxx\xxx\api\node_modules\component-emitter\index.js:133:20)
at Request.onError (D:\xxx\xxx\api\node_modules\engine.io-client\lib\transports\polling-xhr.js:309:8)
at Timeout._onTimeout (D:\xxx\xxx\api\node_modules\engine.io-client\lib\transports\polling-xhr.js:256:18)
at ontimeout (timers.js:475:11)
at tryOnTimeout (timers.js:310:5)
at Timer.listOnTimeout (timers.js:270:5)
type: 'TransportError', description: 503
Update 2
Changing the transport to "websocket" in the Artillery test gives somewhat better performance.
Test case:
config:
  target: "http://127.0.0.1:5000"
  socketio:
    transports: ["websocket"]
  phases:
    - duration: 20
      arrivalRate: 200
scenarios:
  - engine: "socketio"
    flow:
      - emit:
          channel: "echo"
          data: "hello"
      - think: 50
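(For reference, forcing the websocket-only transport from a plain socket.io-client corresponds roughly to the snippet below; the URL is the test target, and the transports option is the standard socket.io v2 client option as I understand it.)

const io = require('socket.io-client');

// Disable long-polling; connect over websocket only
const socket = io('http://127.0.0.1:5000', { transports: ['websocket'] });

socket.on('connect', () => socket.emit('echo', 'hello'));
socket.on('echo', (data) => console.log('echo reply:', data));
socket.on('connect_error', (err) => console.error('connect error:', err.message));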
Results: the arrival rate is no longer the issue, but I hit some kind of limit at 2020 connections. After that it gives a "Websocket error".
So is this a limit on Windows 10, and can it be changed? Is this limit the reason why the tests with long polling perform so badly?