I have a .NET Framework 4 ASP.NET application (a combination of WebForms and MVC) which has been live for a number of years. Over the last few months, the application has occasionally stopped responding, with the application pool using 99% CPU and 3 GB(!) of RAM. This happens about once every 2 weeks, although today it has happened twice.

Usually we kill the process and restart the app pool, which works. We've now taken dumps using DebugDiag to try to get to the bottom of this, but I'm having trouble working out what the issue is. The main reason is that the garbage collector appears to have been either running or in an unusual state: '!dumpheap' gives me the output below.

The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, 
or we are at the initialization or shutdown of the gc heap. Commands related 
to displaying, finding or traversing objects as well as gc heap segments may 
not work properly. !dumpheap and !verifyheap may incorrectly complain of 
heap consistency errors.

Address       MT     Size
Object 01ef1000 has an invalid method table.

I've run ~*e !clrstack and got a lot of output, with nearly every thread showing 'System.Threading.Monitor.ReliableEnter'. Is this normal?

I'm just wondering if anyone can tell anything from the above, or has any tips on analysing this dump file? The dump is a full dump, so it's about 3 GB.

I've pasted my clrstack output below in case anyone fancies a look!

https://pastebin.com/dBE5VAYJ

Chris
  • Interesting! 3 close votes and not a single comment. – vendettamit Dec 21 '18 at 15:36
  • When it comes to debugging, language starts to become important. "with nearly every thread" - 94 threads, 32 ReliableEnter, that's not "nearly every", it's exactly 32 out of 94. It's not hard to find that out and it's not hard to write that down. But it tells the readers whether or not you care about the details. – Thomas Weller Dec 21 '18 at 17:08
  • Also: it uses 99% CPU? For how long? Using 99% CPU is good, isn't it? It means that CPU does something for the money you spent for it. Next: it uses 3 GB of RAM? Is that really RAM (working set) or is it virtual memory? Why do you put an exclamation mark after 3 GB? I have applications that use 18 GB and it's totally fine. If you think 3 GB is much, please tell us what you expect or what is "normal". – Thomas Weller Dec 21 '18 at 17:11
  • use [ETW to trace CPU usage](https://stackoverflow.com/a/42349119/1466046) and [other tools to trace memory usage](https://stackoverflow.com/q/12474321/1466046) – magicandre1981 Dec 23 '18 at 16:04

1 Answer

I would say that the process was in the middle of a garbage collection when you captured the dump. The object structures are inconsistent, so you will not be able to analyze the objects, and it will be hard to see what uses the memory.

Use ProcDump and configure it to take a dump once memory usage crosses a threshold, e.g. 2 GB if your application normally uses only 1 GB. That should be the -m command line switch, which takes a commit threshold in MB.
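As a rough sketch of such a ProcDump invocation: the small Python helper below only assembles the argument list, so the switches are easy to review before running anything. The helper name `procdump_args`, the process ID 1234 and the 2048 MB threshold are illustrative assumptions; `-ma` (full dump) and `-m` (commit threshold in MB) are the ProcDump switches I would expect to need here, but check them against `procdump -?` for your version.

```python
def procdump_args(pid, commit_threshold_mb):
    """Build a ProcDump command line (as an argument list) that writes a
    full dump once the process's memory commit exceeds the threshold.

    pid and commit_threshold_mb are placeholders chosen by the caller.
    """
    if commit_threshold_mb <= 0:
        raise ValueError("commit_threshold_mb must be positive")
    return [
        "procdump",
        "-accepteula",                    # accept the Sysinternals EULA non-interactively
        "-ma",                            # full dump, so the managed heap is included
        "-m", str(commit_threshold_mb),   # trigger: memory commit threshold in MB
        str(pid),                         # PID of the w3wp.exe worker to monitor
    ]


if __name__ == "__main__":
    # Hypothetical values: worker process 1234, trigger at 2 GB of commit.
    print(" ".join(procdump_args(1234, 2048)))
```

Passing the PID rather than the name w3wp.exe avoids ambiguity when several application pools run on the same server.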

Hopefully the long-running garbage collection will not have begun yet at that point, and you'll be able to look at the objects in memory with !dumpheap -stat, or import the dump into JetBrains dotMemory for analysis.

Another observation concerns your call stacks: GetBuildResultFromCacheInternal() appears 26 times. I think this indicates that the application is in the middle of being recycled.
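Counts like this are easier to state exactly if you let a script do the counting. The following is a minimal sketch, assuming the `~*e !clrstack` output was saved as text and that each thread's section starts with a line of the form `OS Thread Id: 0x... (n)`; the function name and the file name `clrstack.txt` are made up for illustration.

```python
import re

# Assumed format: each thread's section begins with "OS Thread Id: 0x<hex> (<n>)".
THREAD_HEADER = re.compile(r"^OS Thread Id:\s*0x[0-9a-fA-F]+", re.MULTILINE)


def count_threads_with_frame(clrstack_text, frame):
    """Return (threads_containing_frame, total_threads).

    A thread counts once, no matter how often the frame occurs in its stack.
    """
    # Everything before the first header is not a thread, so drop that chunk.
    chunks = THREAD_HEADER.split(clrstack_text)[1:]
    matching = sum(1 for chunk in chunks if frame in chunk)
    return matching, len(chunks)


if __name__ == "__main__":
    # "clrstack.txt" is a hypothetical file holding the saved debugger output.
    with open("clrstack.txt", encoding="utf-8", errors="replace") as f:
        text = f.read()
    for name in ("Monitor.ReliableEnter", "GetBuildResultFromCacheInternal"):
        hits, total = count_threads_with_frame(text, name)
        print(f"{name}: {hits} of {total} threads")
```

This counts threads rather than raw occurrences, so a frame that repeats within one stack is not counted twice.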

Thomas Weller