We have a long-running ASP.NET web app in Azure with no real endpoints exposed. It serves a single functional purpose, primarily reading and manipulating database data: effectively a batched, scheduled task triggered by a timer every 30 seconds. The app runs fine most of the time, but occasionally its CPU load jumps close to the maximum for the App Service Plan, instantaneously rather than gradually, and the app stops executing any further timer triggers. We cannot find anything in the executing code to account for this: there are no signs of deadlocks, and all code paths have try/catch, so there should be no unhandled exceptions. More often than not we also see errors getting a connection to a database, but it is not clear whether those are cause or symptom.

Note that this is the only resource within the App Service Plan. The Azure SQL database is in the same region. It is also used by other apps, but only very lightly, and those apps exhibit none of the issues seen by the problem app.

This feels infrastructure-related, but we have been unable to find anything that explains what is happening, so any suggestions for where we should be looking would be gratefully received. We have enabled basic Application Insights (not the SDK), but other than showing the CPU load spike just before the app stops responding, it has given us little of interest, given our limited knowledge of how best to use Insights.

ChrisB_WR
  • Regarding this problem, do you currently have a better solution or idea? This question is very interesting and I am happy to continue to follow the progress of this issue. – Jason Pan Jun 01 '20 at 08:07
  • @Jason, I don't have anything further on this issue right now - we have considered the possibility of handle exhaustion but as far as we can tell we are not reaching any limits – ChrisB_WR Jun 01 '20 at 08:37
  • I suggest you raise a support ticket in the Azure portal, because we can't get more information from our side. – Jason Pan Jun 01 '20 at 09:27
  • Have you used Azure's profiling tools or the CPU monitoring to dump the process when it's maxing its CPU? – MikeJ Jun 05 '20 at 01:41
  • @MikeJ many thanks for the suggestion - we have been looking at the Kudu monitoring - is that what you're referring to? – ChrisB_WR Jun 05 '20 at 10:41
  • @ChrisB_WR Yes, you should be able to profile from there. This might also be helpful depending on your setup: https://azure.github.io/AppService/2019/10/07/Mitigate-your-CPU-problems-before-they-even-happen.html This SO answer might also be helpful: https://stackoverflow.com/questions/49053245/high-cpu-usage-was-detected-for-the-kudu-app-for-azure-app-service – MikeJ Jun 05 '20 at 13:22

1 Answer

Based on your description, I can think of two ways to troubleshoot this. First, track the running status of your program from the code itself: write a log entry at the beginning and end of each batch run, recording the start time, the end time and the outcome. If possible, also record the request and response information. This gives you a complete record of when each run happened and how it went.
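The start/end logging described above could look roughly like the following. The app in question is ASP.NET, so this is only a language-neutral sketch, written in Python for brevity; `run_logged`, the `batch` logger name and the short run id are hypothetical names chosen for illustration, not anything from the original app.

```python
import logging
import time
import uuid

logger = logging.getLogger("batch")


def run_logged(task, log=logger):
    """Run one timer-triggered batch and log its start, end and duration."""
    run_id = uuid.uuid4().hex[:8]  # hypothetical id to correlate the log lines of one run
    started = time.monotonic()
    log.info("run %s started", run_id)
    try:
        result = task()
    except Exception:
        # Record the failure with its stack trace, then let the caller see it too.
        log.exception("run %s failed after %.3fs", run_id, time.monotonic() - started)
        raise
    log.info("run %s finished in %.3fs", run_id, time.monotonic() - started)
    return result
```

A run that logs "started" but never logs "finished" or "failed" is then a direct sign that the batch hung or the process was killed mid-run, and the timestamps show whether consecutive 30-second triggers overlapped.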

Second, write a log entry before the program starts each database operation, and record whether the database connection succeeded. Ideally, you would also record which business operation is running when the CPU load spikes, and track the specific conditions at that moment, so that you can analyse what is causing the database connection failures.
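As a rough illustration of logging around the connection attempt, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the real Azure SQL connection, and `open_connection_logged` is a hypothetical helper name; the same shape applies to whatever connection call the app actually makes.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("batch.db")


def open_connection_logged(connect, log=logger):
    """Call connect() and log the attempt, its outcome and how long it took."""
    started = time.monotonic()
    log.info("opening database connection")
    try:
        conn = connect()
    except Exception:
        # Log the failure and elapsed time, then re-raise for the caller to handle.
        log.exception("connection failed after %.3fs", time.monotonic() - started)
        raise
    log.info("connection opened in %.3fs", time.monotonic() - started)
    return conn


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # SQLite is only a stand-in here (an assumption for the sketch).
    conn = open_connection_logged(lambda: sqlite3.connect(":memory:"))
    conn.execute("SELECT 1")
    conn.close()
```

Logging the elapsed time matters as much as the success flag: connection attempts that get steadily slower before the CPU spike would point towards the errors being a cause, while failures that only appear after the spike would suggest they are a symptom.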

Because you cannot reproduce the problem, you can only guess at its cause. If these two steps still don't show where the problem is, then adjust your timer so that the program triggers once every 5 minutes instead of every 30 seconds.

Jason Pan
  • Many thanks for the suggestions - we have already tried logging the status to try and narrow down the failure but there appears to be no particular correlation of factors when the failures occur which is one reason it feels like an infrastructure thing more than code. – ChrisB_WR Jun 01 '20 at 08:13