I've encountered a strange problem with an application I've developed. The application is a Windows service hosting ASP.NET Core 2.0 on Kestrel. It receives requests through an IIS site acting as a reverse proxy.

The application also uses SignalR 2.2.2, integrated through Microsoft.AspNetCore.Owin. Everything worked well until I noticed that the application had stopped responding to requests.

Other applications on the same machine and using the same IIS server as proxy were working fine. Restarting the application pool serving the site solved the problem temporarily.

The problem resurfaced, and from the monitoring data the application seems to hang once there are 400 SignalR SSE connections open on the same machine. This seems plausible, as I've found that by default OWIN limits the number of concurrent requests to 100 × the number of CPUs. (Note that another site on the same machine serves 5000 requests per minute without breaking a sweat, but those are not long-lived requests like the SignalR ones.)

The problem is that I can't find the equivalent option when hosting OWIN inside ASP.NET Core. Does anyone know whether this could be the cause, and if so, what the correct setting is?

EDIT: I'm fairly certain that the issue is caused by the number of SignalR connections open concurrently, because the problem vanished when I disabled SignalR in the JavaScript client.

2nd EDIT: SignalR does not seem to be the culprit. Load testing the site with crank, both in test and in production, worked up to 5000 concurrent connections, which is the default IIS limit and is fine by me.

Beorn
  • SignalR 2 isn't supported on Asp.Net Core. See the new preview for Asp.Net Core 2.1. – Tratcher Apr 08 '18 at 17:32
  • That 100x CPUs limit does not apply to kestrel. – Tratcher Apr 08 '18 at 17:34
  • I know it's not supported and I'm eagerly waiting for ASP.NET Core 2.1 to be released. I also assumed that the limit was not applicable to Kestrel, but something is making all requests wait, and that seemed the most logical culprit as it happens only when the number of open SignalR connections reaches 400 on a 4-CPU server – Beorn Apr 08 '18 at 20:11
  • i hope you are not opening a db connection for each socket connection – Parv Sharma Apr 09 '18 at 16:00
  • No this application doesn't even have a database – Beorn Apr 10 '18 at 10:22

2 Answers

After some trial and error I was able to identify and correct the problem. It was no easy task, so I'm leaving this answer here in case someone else stumbles upon the same issue.

Disabling SignalR did not solve the problem; it only made it appear less often.

Thanks to the monitoring in place on the server and in IIS, I observed that the problem appeared when the number of connections to the site started growing rapidly. This system mostly makes requests to other services, so it has neither a database nor expensive computations.

Examining the code, I found three problems:

  • a new HttpClient was created for every request, which can exhaust the available sockets because they are not reused between requests blog blog2 blog3
  • by default there is a maximum number of concurrent connections from HttpClient to a single domain, and that default is 2 (!!!) blog4
  • the code waited synchronously on every web request to another system (this program was ported from an MVC 4 site that never displayed this problem). This worked fine in MVC, but ASP.NET Core is very sensitive to it: the thread pool starts with only as many threads as there are cores, so blocked threads are exhausted quickly and every other request has to wait. As a temporary stopgap the thread count can be raised with ThreadPool.SetMinThreads(Int32, Int32) (the minimum, not ThreadPool.SetMaxThreads, is what controls how many threads are available immediately), but the only real solution is to make all the calls async.
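The three fixes above could be sketched roughly as follows. This is a minimal illustration, not the actual code from the application: the class name DownstreamClient, the method names and the 30-second timeout are all made up for the example, and ServicePointManager.DefaultConnectionLimit is assumed to be the relevant knob because the app runs on the full .NET Framework (which SignalR 2 requires).

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper; names and values are illustrative only.
public static class DownstreamClient
{
    // One HttpClient for the whole process, so connections are pooled
    // and reused instead of opening new sockets for every request.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // assumed value
    };

    // Call once at startup, before the first outgoing request.
    // Raises the per-host connection limit from its default.
    public static void Configure(int connectionsPerHost)
    {
        ServicePointManager.DefaultConnectionLimit = connectionsPerHost;
    }

    // Fully asynchronous call: no .Result or .Wait(), so no thread-pool
    // thread is blocked while the remote service is responding.
    public static async Task<string> FetchAsync(Uri uri)
    {
        using (HttpResponseMessage response =
            await Client.GetAsync(uri).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }
}
```

The important part is that every caller up the chain (controller actions included) awaits FetchAsync rather than blocking on it; a single .Result in the middle brings the starvation back.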

Once all calls were made async the problem never returned. Basically, the problem was thread pool starvation, to which ASP.NET Core is far more sensitive than MVC. Here you can find a nice explanation and a detection method using PerfView.
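For anyone who needs breathing room before the async rewrite is done, the stopgap might look like the sketch below. It assumes the setting that matters is the thread pool minimum (ThreadPool.SetMinThreads), since that is what decides how many threads are available without waiting for the pool to grow. The class name, method name and the value 64 are invented for the example.

```csharp
using System;
using System.Threading;

// Hypothetical stopgap; it hides starvation, it does not cure it.
public static class ThreadPoolStopgap
{
    // Raises the minimum number of worker threads, never lowering it,
    // and leaves the I/O completion thread minimum untouched.
    // Returns false if the runtime rejected the new value.
    public static bool RaiseMinimumWorkers(int workerThreads)
    {
        ThreadPool.GetMinThreads(out int currentWorkers, out int currentIo);
        int target = Math.Max(currentWorkers, workerThreads);
        return ThreadPool.SetMinThreads(target, currentIo);
    }
}
```

Calling ThreadPoolStopgap.RaiseMinimumWorkers(64) once at startup would be enough; the right number depends on how many requests block at the same time, so treat it as something to tune and then remove.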

Beorn

This could be the issue, but it's unlikely. When hosting on .NET Core you're probably using Kestrel as the web server implementation. To change limits such as the number of concurrent connections, you can use the KestrelServerLimits class as described in this Microsoft article.

KestrelServerLimits should not be causing you any problems, since the default value for MaxConcurrentConnections is unlimited.
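If you do want to set the limits explicitly, a minimal sketch for ASP.NET Core 2.0 could look like this. The values are arbitrary, and the inline "ok" endpoint only stands in for your real Startup class; a Windows service host would also start the web host differently than the plain Run() shown here.

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

public static class Program
{
    public static void Main(string[] args)
    {
        WebHost.CreateDefaultBuilder(args)
            .UseKestrel(options =>
            {
                // null (the default) means unlimited; values are examples.
                options.Limits.MaxConcurrentConnections = 5000;
                // Separate cap for upgraded connections such as WebSockets.
                options.Limits.MaxConcurrentUpgradedConnections = 5000;
            })
            // Placeholder pipeline so the sketch is self-contained.
            .Configure(app => app.Run(context => context.Response.WriteAsync("ok")))
            .Build()
            .Run();
    }
}
```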

Leonardo Menezes