Disclaimer: Since I am unsure what the issue really is, I have posted this on Stack Overflow instead of Server Fault.
Description:
I own an application that is approaching launch, so I built a tool that runs regression tests and stress tests (load tests). When we test with 2-3 clients we see no impact, but as soon as we reach 8-10 clients we see a large increase in service delays and handle time from our API.
TPS = tests per second initiated by my tool (the number of clients/threads started every second).
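To make the TPS definition concrete, here is a simplified sketch of how such a load can be applied. This is an illustration only, not the actual tool code: `LoadSketch`, `RunAsync` and the `client` delegate are hypothetical names, and `client` stands in for whatever a single test client does (for example, one call to the API).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoadSketch
{
    // Starts `tps` clients every second for `seconds` seconds and returns the
    // average handle time in ms across all clients.
    // Throws InvalidOperationException if no client was started (tps * seconds == 0).
    public static async Task<double> RunAsync(int tps, int seconds, Func<Task> client)
    {
        var running = new List<Task<long>>();
        for (int s = 0; s < seconds; s++)
        {
            // Start this second's clients without waiting for earlier ones to finish.
            for (int i = 0; i < tps; i++)
                running.Add(MeasureAsync(client));

            // Wait one second before starting the next wave (not after the last one).
            if (s < seconds - 1)
                await Task.Delay(TimeSpan.FromSeconds(1));
        }

        long[] handleTimesMs = await Task.WhenAll(running);
        return handleTimesMs.Average();
    }

    // Measures the wall-clock time of a single client run in milliseconds.
    private static async Task<long> MeasureAsync(Func<Task> client)
    {
        var sw = Stopwatch.StartNew();
        await client();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

With this shape, "TPS @ 10" means ten new clients are started each second regardless of how many earlier clients are still waiting for a response, so slow responses cause clients to pile up.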
Here is the output from my stress test tool:
- TPS @ 5 - Avg. handle time: 1302 ms
- TPS @ 10 - Avg. handle time: 5641 ms
- TPS @ 30 - Avg. handle time: 13549 ms
- TPS @ 50 - Avg. handle time: 6136 ms
- TPS @ 100 - Avg. handle time: 24854 ms
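As a rough sanity check on these numbers, Little's law (average requests in flight = arrival rate × average time in the system) gives an estimate of how many requests are open at once. The sketch below only does that arithmetic; `InFlightEstimate` is a hypothetical helper name, and the estimate assumes the test ran long enough to reach a steady state, which may not be true for my runs.

```csharp
using System;

public static class InFlightEstimate
{
    // Little's law: requests in flight = tests started per second * average handle time in seconds.
    public static double Estimate(double testsPerSecond, double avgHandleTimeMs)
    {
        return testsPerSecond * (avgHandleTimeMs / 1000.0);
    }
}
```

Applied to the results above, `InFlightEstimate.Estimate(5, 1302)` is about 6.5 requests in flight, while `InFlightEstimate.Estimate(10, 5641)` is about 56 and `InFlightEstimate.Estimate(100, 24854)` is about 2485. Under that assumption, going from 5 to 10 TPS multiplies the number of open requests by roughly nine.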
Notes:
- A. I usually see no clear relationship between TPS and handle time, except that everything takes a long time at 10+ TPS. As you can see, 50 TPS is much faster than 30 TPS.
- B. It also looks like requests are being queued, and there are pauses before new requests complete; see this screenshot: https://gyazo.com/1431b5113ac216983a6ca6e1f1bd75ad
- C. The only improvement I saw came from removing the external API calls from the code and running the test again. However, when I test the external API in isolation (E2E), I see no delay (5000+ TPS at 120 ms avg.), yet through our service we can't handle more than 10 TPS without delay.
- D. No hardware graph (memory or CPU) ever goes above 5%.
Do you have any suggestions for where I should look next? I am happy to answer questions.
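Notes C and D together may suggest that the service spends its time waiting rather than computing. One way to narrow this down would be to time the external call from inside the service and log how many thread pool threads are busy at that moment. The following is a minimal sketch under that assumption; `ExternalCallTimer`, `TimeAsync` and `ThreadPoolSnapshot` are hypothetical names, and `externalCall` stands in for the real call to the external API.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ExternalCallTimer
{
    // Awaits the given operation and returns how long it took in milliseconds.
    public static async Task<long> TimeAsync(Func<Task> externalCall)
    {
        var sw = Stopwatch.StartNew();
        await externalCall();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Reports how many worker and I/O completion threads are currently in use,
    // computed as (maximum threads - available threads).
    public static string ThreadPoolSnapshot()
    {
        ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
        return $"busy workers: {maxWorkers - freeWorkers}, busy IO: {maxIo - freeIo}";
    }
}
```

Logging `await ExternalCallTimer.TimeAsync(...)` next to `ExternalCallTimer.ThreadPoolSnapshot()` for each request would show whether the external call itself is slow under load, or whether requests wait somewhere before or after that call.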
Technology:
- C# (.NET Core) IIS WebService
What I have done/tested (with no impact):
- End-to-end stress testing of the external applications we use; they easily handle 5000+ TPS with no real delay.
- Added load balancers (NGINX) on all layers (front-end, service, integration).
- Used MongoDB Optimizer to attempt to speed up data flow.
- Upgraded our DB from 16 GB to 64 GB and from 2 CPUs to 8 CPUs.
- Looked through the IIS setup for anything that looks odd.
- Deactivated the antivirus to see whether it was affecting external HTTPS calls.