
For a .NET Core 2.2 application deployed on a single AWS EC2 host, I am comparing IIS hosting with plain Kestrel hosting.

For the IIS configuration I followed the Microsoft documentation.

For Kestrel I simply used:

dotnet app.dll --server.urls http://*:5000

I am running a "stress" test with JMeter to compare throughput. The test simply calls the app's endpoint with 100 threads for 10 seconds (after a 5-second warm-up). Note that the endpoint fetches the same data from an MSSQL Server database on each call, with no caching.

As a result, Kestrel fails 75% of requests with socket closed/timeout errors.

QUESTION: What kind of configuration error can lead to such Kestrel behavior? I've tried putting a basic nginx reverse proxy in front of Kestrel, but I still get the same results.

Sergey Nikitin
  • What is Kestrel's MaxConcurrentConnections set to? Do you have a max connection limit in MSSQL? Can you test an endpoint without using the database? – Ori Marko May 15 '19 at 11:32
  • @user7294900, neither MaxConcurrentConnections nor the MSSQL number of concurrent connections is set. I've just run the test on another endpoint (it reads some info from cache) and the performance gap was even bigger. – Sergey Nikitin May 15 '19 at 12:23
  • That's weird indeed, because IIS is basically just a reverse proxy in front of Kestrel, but when you host on IIS, you should have a full kestrel running behind anyhow. See https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel?view=aspnetcore-2.2#when-to-use-kestrel-with-a-reverse-proxy – Daboul May 15 '19 at 12:52
  • Consider https://weblog.west-wind.com/posts/2017/mar/16/more-on-aspnet-core-running-under-iis and https://weblog.west-wind.com/posts/2019/Mar/16/ASPNET-Core-Hosting-on-IIS-with-ASPNET-Core-22 – Lex Li May 15 '19 at 13:40
  • Thanks for the links, I've tried nginx+Kestrel on Windows and performance was good. It seems I'm experiencing low Kestrel performance on Linux only (AWS ECS in particular). – Sergey Nikitin May 16 '19 at 11:50
  • @SergeyNikitin I also have a situation with worse ECS (Fargate) performance compared to IIS / Windows. Did you find any solutions to improve performance? – mbp May 20 '19 at 09:28
  • @mbp I'm still trying to determine the root issue, but it seems that it is some sort of bug in our application. We have some clues that it may be caused by middleware or background processes. I will share any updates. – Sergey Nikitin May 20 '19 at 10:08
  • @mbp I've continued troubleshooting the ECS performance issue. Since we moved from sync to async controllers, the performance of 1 container is close to that of an EC2 instance. Next, there is a scalability issue: 2 containers are still better, but 4 are much worse. The root cause seems to be that the load balancing algorithm conflicts with the CLR's thread injection algorithm. Setting up a "warmed up" thread pool with ThreadPool.SetMinThreads(100, 100) before BuildWebHost in Main() looks like a fast hack, but I'm still investigating. – Sergey Nikitin May 23 '19 at 08:24
  • Thanks for your follow-up @SergeyNikitin. So far I have found that .NET Core on ECS/Linux is much more performance sensitive than its EC2/Windows counterpart. Small delays in requests can quickly cause the server to become unresponsive, either by running out of threads or by using all CPU. Hoping 3.0 makes these things better. – mbp May 24 '19 at 09:08
  • I want to add that our performance issues were caused by using *Fargate* instead of ECS/EC2. On Fargate, you don't know which underlying instance type you are running on. Running our application on c5 instances in ECS/EC2 instead of Fargate made a world of difference. – mbp Jun 16 '19 at 20:55
  • @mbp We're also moving to ECS/EC2 (because we need shared persistent EFS storage). I'll share new performance test results as soon as we switch to EC2 completely. – Sergey Nikitin Jun 17 '19 at 07:01
  • @mbp btw, I've tried 3.0/3.1 with this project and it was actually 15-25% faster. Also, ECS/EC2 instances with more CPU allowed us to get more RPS. I've also tried various load testing tools since then, and the best option was wrk on a dedicated server inside the same VPC. – Sergey Nikitin Feb 20 '20 at 19:18

1 Answer


It turned out that the described behavior occurs when testing the performance of a synchronous endpoint.

Under its thread injection algorithm, the CLR initially has only minWorkerThreads/minIoThreads threads available to process requests. Since the "stress" test uses more concurrent threads than the pool has created at that moment, requests have to wait for new threads to be injected, which leads to almost linear growth of response time.
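The pool minimums can be inspected, and raised, from code. The following is a minimal console sketch of the ThreadPool.SetMinThreads(100, 100) workaround mentioned in the comments. The class and method names (ThreadPoolWarmup, EnsureMinThreads) are made up for illustration, and the value 100 is only an assumption that mirrors the 100 JMeter threads, not a recommended setting.

```csharp
using System;
using System.Threading;

public static class ThreadPoolWarmup
{
    // Hypothetical helper: raises the pool minimums to at least the requested
    // values (never lowers them) and returns true if the runtime accepted them.
    public static bool EnsureMinThreads(int minWorker, int minIo)
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        return ThreadPool.SetMinThreads(Math.Max(worker, minWorker),
                                        Math.Max(io, minIo));
    }

    public static void Main()
    {
        // The defaults depend on the number of processors on the host.
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"default minimums: worker={worker}, io={io}");

        // 100 is an assumption matching the load test, not a tuned value.
        bool accepted = EnsureMinThreads(100, 100);

        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine($"accepted={accepted}, minimums now: worker={worker}, io={io}");
    }
}
```

In a web application the call would go at the start of Main(), before the host is built. It only hides the cost of blocking calls and does not remove it.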

Switching to asynchronous endpoints eliminates the difference in performance.

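The effect can be reproduced outside of ASP.NET Core with a small console sketch. It is not the application code. Thread.Sleep and Task.Delay stand in for a blocking and a non-blocking database call, and the names (SyncVsAsyncDemo, HandleSync, HandleAsync) and the 200 ms delay are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SyncVsAsyncDemo
{
    const int Requests = 100; // mirrors the 100 JMeter threads
    const int IoMillis = 200; // assumed duration of one simulated database call

    // Synchronous handler: holds a pool thread for the whole wait.
    static void HandleSync() { Thread.Sleep(IoMillis); }

    // Asynchronous handler: returns the pool thread while waiting.
    static Task HandleAsync() { return Task.Delay(IoMillis); }

    // Starts all simulated requests at once and measures until the last one ends.
    public static async Task<long> RunAsync(Func<Task> startRequest)
    {
        var watch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, Requests).Select(_ => startRequest()));
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        long blocking = RunAsync(() => Task.Run(() => HandleSync()))
            .GetAwaiter().GetResult();
        long nonBlocking = RunAsync(() => Task.Run(() => HandleAsync()))
            .GetAwaiter().GetResult();
        Console.WriteLine($"blocking: {blocking} ms, async: {nonBlocking} ms");
    }
}
```

On a host with fewer cores than simulated requests, the blocking run should take noticeably longer, because most requests wait for the pool to inject threads. The exact numbers depend on the machine and the runtime version.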
Sergey Nikitin