I have 4 EC2 instances running on AWS, with PM2 running in cluster mode on all of them. When I get 5K+ concurrent requests, the app's response time increases significantly.

Every request fetches a Redis key. Under this load a fetch takes up to 10 seconds, whereas without so many concurrent requests it takes only 50 ms. What could be the issue here?

Swapnil Saurav

1 Answer

We need to pinpoint the bottleneck. Let's do some diagnostics:

  1. Are the EC2 instances multicore, so that they can take advantage of PM2's clustering?

  2. When you execute `pm2 start app.js -i X`, are you sure X equals the number of vCPUs of the EC2 instance?

  3. When you execute `pm2 monit`, do you see all instances of the cluster sharing CPU and memory usage equally?

  4. When you run `htop`, what is your total CPU and memory usage %?

  5. When you execute `iftop`, what is your total RX and TX traffic compared to the maximum available on your machine?
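
Checks 1 and 2 can be scripted roughly like this. It is only a sketch: `checkClusterSize` is a hypothetical helper, not part of PM2, and it assumes that `os.cpus().length` reports the same logical CPU count PM2 would detect on the machine.

```javascript
const os = require('os');

// Hypothetical helper: compare the instance count passed to `pm2 start -i`
// with the number of logical CPUs that Node.js can see.
function checkClusterSize(requestedInstances, cpuCount = os.cpus().length) {
  if (requestedInstances === 0) {
    // `-i 0` asks PM2 to size the cluster from the detected CPU count.
    return { cpuCount, effective: cpuCount, verdict: 'matched' };
  }
  let verdict = 'matched';
  if (requestedInstances < cpuCount) verdict = 'under-provisioned';
  if (requestedInstances > cpuCount) verdict = 'over-provisioned';
  return { cpuCount, effective: requestedInstances, verdict };
}

// Example: the value 16 stands in for whatever you passed as X.
console.log(checkClusterSize(16));
```

If the verdict is anything other than `matched`, some cores are either idle or shared between several Node.js processes, which is worth ruling out before looking further.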

maninak
  • 1) Yes, it's multicore: c4.8xlarge with 16 vCPUs. 2) Yes. 3) Yes, all share equally. 4) CPU usage varies between 0 and 50%, around 24% on average. 5) RX peak: 185, TX peak: 181, total peak: 170. RX cum: 189, TX cum: 815, total cum: 1.5 GB – Swapnil Saurav Jun 06 '17 at 16:08
  • I just noticed that CPU usage spikes to 100% for some PM2 processes for a second and then drops back to 24%. – Swapnil Saurav Jun 06 '17 at 16:13
  • I am running the EC2 instances on CentOS. – Swapnil Saurav Jun 06 '17 at 16:16
  • This looks like an I/O bottleneck at first glance, though I'm not sure how else I would verify it. I assume all your requests ask for the same key, to eliminate variables. I would load-test again, once returning a much smaller value for the same key and once returning a much larger one. A difference of about 2 orders of magnitude should suffice to show whether it's an I/O bottleneck. – maninak Jun 06 '17 at 16:21
  • Suppose it's an I/O bottleneck; how can I fix that? – Swapnil Saurav Jun 06 '17 at 16:25
  • I have socket.io enabled, and when a message is emitted, all connected clients fetch data at that moment. That's when the system becomes slow. Otherwise everything works fine. – Swapnil Saurav Jun 06 '17 at 16:27
  • Ah, this should have been in the OP; it's important, and it means the problem is possibly not I/O after all. Besides, your network I/O throughput is already insanely big. Also, you mentioned your EC2 instance is a c4.8xlarge but said it had 16 vCPUs. I just checked and it has 32! Right off the bat, doing `pm2 start app.js -i 0` instead of 16 should help a bunch (it sizes the cluster to the number of available CPUs). I would also play around with 64 to see how this affects it. In the end it looks like you will have to see whether you can have the socket.io server notify the clients in batches or do some smart kind of multicast trick. – maninak Jun 06 '17 at 16:40
  • Sorry, it is 32; I wrote 16 by mistake. Notifying in batches sounds like a good suggestion. I will try to implement it. – Swapnil Saurav Jun 06 '17 at 16:44
  • BTW you didn't mention your RAM usage %. If you are getting RAM bound then [`uWebSockets`](https://github.com/uNetworking/uWebSockets) could increase your capacity 2x. In any case this [research on socket.io performance](https://drewww.github.io/socket.io-benchmarking/) could prove useful. – maninak Jun 06 '17 at 16:51
  • Hi, why does the response time of a Redis GET call increase from 50 ms to 10 seconds when there are concurrent requests? I am using Redis on AWS ElastiCache. – Swapnil Saurav Jun 06 '17 at 17:32
  • I think if I solve the Redis response issue, my application will become faster. – Swapnil Saurav Jun 06 '17 at 17:33
  • 50 ms for ElastiCache is way too much. I'm sorry, this is getting kind of out of my league. It started as a PM2 cluster question and has gone full-on DevOps. – maninak Jun 06 '17 at 18:11
  • I've also found this [related answer](https://stackoverflow.com/a/34806183/5015955) using Redis as a socket.io Adapter, which looks quite insightful and very close to the description of your architecture. – maninak Jun 06 '17 at 18:32
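
The "notify the clients in batches" idea from the comments could look roughly like the sketch below. Nothing here comes from the asker's code: `chunk` and `notifyInBatches` are hypothetical helpers, the batch size and delay are placeholder values to tune under load, and `notify` stands for whatever per-client call you use.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical helper: call `notify(client)` for every client, one batch at a
// time, pausing `delayMs` between batches so that the follow-up fetches do not
// all reach Redis at the same instant. Resolves to the number of batches sent.
async function notifyInBatches(clients, notify, options = {}) {
  const { batchSize = 500, delayMs = 100 } = options; // assumed defaults
  const batches = chunk(clients, batchSize);
  for (let i = 0; i < batches.length; i++) {
    batches[i].forEach(notify);
    if (i < batches.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return batches.length;
}
```

With socket.io, `notify` might be something like `(socket) => socket.emit('update')`, where `'update'` is a made-up event name and the list of connected sockets is collected in whatever way your socket.io version provides. The trade-off is that the last batch hears about the change later than the first one, by roughly the number of batches times `delayMs`.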