
I have a full minidump from an Azure App Service. It comes with the .dmp file, sos.dll and mscordacwks.dll.

I have WinDbg - x86 is the version that can open this dump file. I then use .load c:\path\to\sos.dll. This doesn't give an error, but no other output, either.

The next suggested command, !sos.threads, gives:

Failed to find runtime DLL (clr.dll), 0x80004005 Extension commands need clr.dll in order to have something to do.

I have tried .load directly on mscordacwks.dll, and renaming it to clr.dll. I've also copied that file into my symbols path and renamed it to mscordaccore_X86_X86_4.6.24628.01.dll, a name which came up at one point during my quest here.

I've also tried running the DebugDiag 2 analysis tool, but it says it can't load mscordacwks, despite it being in the same folder, also when it's in the symbol path, also when it's renamed to that specific version above which is listed here too.

I just want to know why my App Service gets stuck at 100% CPU after a random amount of time! What next steps can I try?

Kieren Johnstone
  • Did you try `!analyze -v` or perhaps `!runaway` to get the thread that's stuck *(assuming it's only one)* and `~s;kbnf` to get the native stack trace of that thread? That said, a dump is just a point in time so you might miss the real cause this way. Better would be to either use procmon *(easier to analyze)* or ETW *(harder to analyze but a shipload of information)* to get an overview of the process over a period of time. – Lieven Keersmaekers Dec 09 '16 at 09:18
  • @LievenKeersmaekers thanks, I will try those. It's an Azure App Service - don't think I can install procmon or set up ETW on the server. There are many threads (~40), some waiting for work to do (App Insights upload threads) and some waiting for access to a query cache (EF Core) – Kieren Johnstone Dec 09 '16 at 12:45
  • better use ETW: http://stackoverflow.com/a/39856838/1466046 instead of analyzing snapshots from dumps. the WPT can be xcopied to the other systems (only make sure the CPU architecture matches) – magicandre1981 Dec 09 '16 at 15:31

1 Answer


It seems you're not very familiar with WinDbg, so I'll be a bit more verbose than necessary.

WinDbg - x86 is the version that can open this dump file

Any version and bitness of WinDbg can open the dump; even the 32 bit WinDbg can open a 64 bit .dmp file. Being able to open it does not mean that you are using the correct version for what you want to achieve.

This doesn't give an error, but no other output, either.

That's ok. It means that the extension was loaded successfully. That's good to know, because it means you're using the correct bitness of WinDbg. If it's really the x86 WinDbg you use, this indicates you have a 32 bit SOS DLL.

If the bitness is incorrect, you get an error message, the same as if you tried to load a 32 bit DLL into a 64 bit process or vice versa (known as BadImageFormatException in .NET).
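If you want to check the bitness of a native DLL such as sos.dll before loading it, the PE header records it. The following is a minimal Python sketch, not part of the original advice: `pe_machine` is a hypothetical helper name, and it only recognizes the two machine types relevant here.

```python
import struct

# Machine values from the PE/COFF file header (only the two relevant here).
MACHINE_NAMES = {0x014C: "x86 (32 bit)", 0x8664: "x64 (64 bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # The 4-byte field at offset 0x3C (e_lfanew) points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    # The 2-byte Machine field directly follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")
```

For a real file you would pass its bytes, for example `pe_machine(open(r"c:\path\to\sos.dll", "rb").read())`. The result should match the bitness of the WinDbg you load the extension into.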

Extension commands need clr.dll in order to have something to do.

The SOS extension is for .NET, so SOS is looking for a .NET framework loaded into the process. This may be

  • clr.dll for .NET 4 / 4.5 and maybe higher
  • mscorwks.dll for .NET 2 / 3 / 3.5
  • coreclr.dll for Silverlight and .NET Core

From the message, we can derive that you have a SOS.dll for .NET 4, which is why it's looking for clr.dll instead of something else. Sounds reasonable for an Azure webservice, since Azure is newer than .NET 2.

To see whether .NET was actually loaded into the process, use the following commands:

lm m clr
lm m mscorwks
lm m coreclr

If any of these commands produces some output, you'll know which version was loaded. Note that .NET 4 and .NET 2 may occur in parallel (both versions used in the process).

I have tried .load directly on mscordacwks.dll, renaming it to clr.dll.

Here's a huge misunderstanding:

  1. .load loads something into the WinDbg process. Even if you manage to load it there, SOS will still search for it in the dump file instead.
  2. mscordacwks is not the .NET framework. Do not confuse it with mscorwks. The dac part stands for data access component. It's a DLL that lets a debugger access the .NET structures in memory, since .NET has its own memory management.

However, renaming it might be needed. That's a difficult story. It seems you already found Google results for it...

renaming it to mscordaccore_X86_X86_4.6.24628.01.dll

That goes in the right direction, but I don't think that's the correct name. Would you mind linking the original advice, so I can do some research before complaining about something my knowledge might be outdated on?

IMHO the name should be

mscordacwks_x86_x86_4.6.24628.01.dll

(if the version number is correct).
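To make the naming pattern explicit, here is a small Python sketch that builds such a file name. It is an illustration only: `dac_file_name` is a hypothetical helper, the pattern is taken from the name suggested above, and reading the two architecture parts as "debugger bitness" and "dump bitness" is my assumption.

```python
def dac_file_name(host_arch: str, target_arch: str, clr_version: str,
                  prefix: str = "mscordacwks") -> str:
    """Build a DAC file name following the pattern of the name suggested above.

    host_arch   -- bitness of the debugger (assumed meaning of the first part)
    target_arch -- bitness of the dumped process (assumed meaning of the second part)
    clr_version -- version of the runtime loaded in the dump, used verbatim
    """
    return f"{prefix}_{host_arch}_{target_arch}_{clr_version}.dll"


# The example from this answer: 32 bit debugger, 32 bit dump.
print(dac_file_name("x86", "x86", "4.6.24628.01"))
```

The version string has to be the one of the runtime in the dump, not the one installed on your analysis machine.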

As mentioned by @Lieven Keersmaekers in the comments already, having a correct symbol path pointing to Microsoft and then doing a

!analyze -v

should download the necessary mscordacwks files from Microsoft. That way it will automatically have the correct name and be located in the correct folder.

I've also tried running the DebugDiag 2 analysis tool

For DebugDiag to work correctly, it also needs mscordacwks. The easiest way would be to also use the Microsoft symbol server, so that it can download the file itself.

I just want to know why my App Service gets stuck at 100% CPU

Analyzing that from a single crash dump file is error-prone. A dump is a snapshot, so the process might just have been doing something "normal" at the moment you captured it.

If you have many crash dumps with the same call stack, that might indicate that this method is in an endless or at least long-running loop. To get many crash dumps automatically at high CPU, try ProcDump, see how to take a good crash dump for .NET
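Comparing call stacks across dumps can be done by hand, but a few lines of Python show the idea. This is a sketch under assumptions: the stacks are already extracted as lists of frame names (top frame first), and the frame names used in the example are made up.

```python
from collections import Counter


def most_common_top_frames(stacks, depth=3):
    """Count how often the same top `depth` frames occur across several dumps.

    stacks -- one list of frame names per dump, top of the stack first
    Returns (frames, count) pairs, most frequent first.
    """
    tops = Counter(tuple(stack[:depth]) for stack in stacks if stack)
    return tops.most_common()


# Hypothetical stacks of the busiest thread, taken from three dumps.
dumps = [
    ["Cache.Lookup", "Query.Compile", "Main"],
    ["Cache.Lookup", "Query.Compile", "Main"],
    ["Uploader.Wait", "Main"],
]
for frames, count in most_common_top_frames(dumps, depth=2):
    print(count, " <- ".join(frames))
```

A top-of-stack that shows up in most of the dumps is a candidate for the loop; one that appears only once was probably just normal work.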

What else might have gone wrong?

You said you were supplied with those files. From the naming of the files, I assume they were taken from the machine where the crash happened. That's basically a good idea. Be aware that there are many such files on a PC.

If you run my tool mscordacwks Collector, you'll see what I mean. That tool, by the way, will detect the versions and rename the files accordingly. Perhaps you can try it, if the machine is still available.

Thomas Weller
  • Thanks, just the kind of advice I need. I'll dig into and run through everything a little later. FWIW, the minidump comes with sos.dll and mscordacwks.dll when the dump zip is served from Kudu on the server. So it seems I'm *supposed* to use those versions, and yet following any of the small handful of tuts on debugging this kind of dump from Azure, it's errors all the way. Note that it's not a VM I can RDC too, but a remote dump I have. Anyway, will update later - thanks again – Kieren Johnstone Dec 09 '16 at 12:39
  • @KierenJohnstone: thanks for the information. If that ZIP file is generated automatically, let's assume that it was done the correct way. In such a case, I'd indeed say you're supposed to use it. Unfortunate, that they have not made it more convenient. I'm not so familiar with Azure, also Kudu means nothing to me yet. – Thomas Weller Dec 09 '16 at 13:07