
I have a .NET Core C# project that performs an HTTP POST. The project is deployed to Kubernetes, and I've noticed the warnings below in the logs:

warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:45 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:46 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:47 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:48 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:49 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:50 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:51 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:52 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:53 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:54 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:55 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:43:56 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:44:33 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:44:34 +00:00".
warn: Microsoft.AspNetCore.Server.Kestrel[22]
      Heartbeat took longer than "00:00:01" at "02/22/2020 15:44:35 +00:00".

After some initial research, it seems this warning is a common symptom of thread pool starvation. Accordingly, in November last year, I made the POST call asynchronous and also started logging the max and available worker threads for monitoring purposes:

// Snapshot of the thread pool limits and current availability (worker and IOCP threads).
ThreadPool.GetMaxThreads(out int workerThreads, out int completionPortThreads);
ThreadPool.GetAvailableThreads(out int workerThreadAvailable, out int completionPortThreadsAvailable);
_log.Info(new { message = $"Max threads = {workerThreads} and Available threads = {workerThreadAvailable}" });

Consistently over the past few months, the logging shows: Max threads = 32767 and Available threads = 32766. That seems fine; however, I'm still seeing the same Heartbeat warning, so I'm wondering whether this really is a thread pool starvation issue. Does anyone know what else might be going on and whether this warning is actually caused by something else? Any investigation/resolution tips would be much appreciated!
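
In case it helps with the investigation, the max/available counts above don't show how much work is actually queued, so one option I'm considering is logging a few extra counters as well. A rough sketch, assuming .NET Core 3.0 or later (where ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount are available) and reusing the same _log field as above:

// Rough sketch: extra counters that help distinguish thread pool starvation from CPU pressure.
// ThreadPool.ThreadCount / PendingWorkItemCount / CompletedWorkItemCount require .NET Core 3.0+.
int poolThreads = ThreadPool.ThreadCount;                 // threads the pool has actually created
long pendingItems = ThreadPool.PendingWorkItemCount;      // work items queued but not yet started
long completedItems = ThreadPool.CompletedWorkItemCount;  // total work items completed so far
int cpus = Environment.ProcessorCount;                    // logical processors visible to the pod

_log.Info(new { message = $"ThreadPool threads = {poolThreads}, pending = {pendingItems}, completed = {completedItems}, CPUs = {cpus}" });

A pending count that keeps growing while the warnings fire would point back at starvation; a flat, near-zero pending count would suggest looking at CPU instead.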

ENV
  • It means the CPU on your server is overworked. You need to scale either horizontally or vertically. – Andy Mar 15 '21 at 23:23
  • @Andy, what do you mean by scaling horizontally or vertically? Are there any specific code changes that could be made to do so, or would that be more config-related? – ENV Mar 15 '21 at 23:52
  • You need to increase the CPU capacity of your server (vertically) or increase the number of instances of your app (horizontally). You typically increase instances. I have 5 instances of an app running. Azure will balance all the requests across all 5, so each instance shares the responsibility of a single job. When I didn't have 5 instances, the CPU usage was at 80%-90%, and we'd frequently see heartbeat issues. We scaled to 5 instances, and now each instance runs at 10-20% CPU. It's more expensive, but it runs smoothly. So try running 2 instances and see what happens. – Andy Mar 16 '21 at 00:57
  • @ENV Does the above comment resolve your issue? If not, kindly post the answer that worked for you. TIA – Alex G Apr 04 '22 at 04:08
  • The above comment resolves my issue. – ENV Apr 04 '22 at 18:56
  • Got it, posting the answer as community wiki. – Alex G Apr 05 '22 at 07:21

1 Answer


This is a resource issue, as @Andy pointed out in the comments.

According to the OP, the solution is to either increase the CPU capacity of the server (scale vertically) or increase the number of instances of the app (scale horizontally).
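
If the platform's own metrics leave any doubt about whether CPU is the bottleneck, one way to check from inside the app is to sample the process's CPU time over a short interval and compare it with the wall-clock time available across all visible cores. A rough sketch (not from the OP's code; the method name and the sampling interval are illustrative):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Rough sketch: estimate how much of the available CPU the process used over an interval.
// Returns a fraction between 0 and roughly 1, relative to the cores visible to the process.
static async Task<double> SampleCpuUsageAsync(TimeSpan interval)
{
    var process = Process.GetCurrentProcess();
    TimeSpan cpuBefore = process.TotalProcessorTime;
    var wallClock = Stopwatch.StartNew();

    await Task.Delay(interval);

    process.Refresh();
    TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
    double cpuAvailableMs = wallClock.Elapsed.TotalMilliseconds * Environment.ProcessorCount;
    return cpuUsed.TotalMilliseconds / cpuAvailableMs;
}

A value that stays near 1.0 while the heartbeat warnings appear supports scaling up or out; a consistently low value suggests the delay is coming from something other than raw CPU capacity.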

Alex G
  • I'm seeing the same issue in a container running in Cloud Run. The CPU hasn't gone above 50% of what's available, and the memory hasn't gone above 40%, according to the logs. What are other items to look into? cc @Andy – Steve Apr 06 '22 at 20:28
  • @Steve Did you ever find out what else to look into? I'm in the same situation as you. – Matt May 16 '22 at 11:15
  • @Matt Nope. I've ignored it. Please post if you find something! – Steve May 16 '22 at 17:17
  • Same situation, CPU utilisation is always below 30%. – Lee Aug 10 '22 at 14:45