46

My client has an ASP.NET application installed on two production servers (balanced with NLB, but that's irrelevant). Both servers crash every 3-4 hours with the following error logged in Event Viewer:

Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
Faulting module name: clr.dll, version: 4.0.30319.18034, time stamp: 0x50b5a783
Exception code: 0xc00000fd
Fault offset: 0x000000000001a840
Faulting process id: 0xd50
Faulting application start time: 0x01ce97fe076d27b4
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: e0c90a5f-0455-11e3-8f0e-005056891553

I have no idea how to debug or where to start. When the crash is about to happen the server processor usage jumps to 100% and stays there. The process at fault is w3wp.exe. I'm not even sure if my code is generating the error or not. It's IIS 7.5. Any pointers would be greatly appreciated.

Kirk Woll
cristi71000

4 Answers

83

It looks like you have a StackOverflowException (exception code 0xc00000fd is the Windows stack overflow status), which is typically caused by unbounded recursion (a function repeatedly calling itself, etc.). This can't be caught by a regular try/catch block. You can track the problem down using DebugDiag and WinDbg.

DebugDiag can be configured to generate a crash dump when the StackOverflowException occurs. Download at https://www.microsoft.com/en-us/download/details.aspx?id=58210.

  1. Open DebugDiag and click Add Rule.
  2. "Crash" should already be selected. Click Next.
  3. Choose "A specific IIS web application pool" and click Next.
  4. Select the application pool and click Next.
  5. You should be on the Advanced Configuration Window. Click Exceptions under Advanced Settings.
  6. Click Add Exception and choose Stack Overflow, with an Action Type of Full Userdump.
  7. Click OK, then save and close.

The next time a StackOverflowException occurs, you'll have a crash dump. Now you need to interpret the dump file.

Debugging Tools for Windows is part of the Windows SDK and can be downloaded at http://msdn.microsoft.com/en-US/windows/hardware/gg463009/.

  1. To use WinDbg, you'll need to get the symbols files. Download the symbol files and put them in a local folder.
  2. Open up WinDbg. On the File menu, click Symbol File Path.
  3. In the Symbol path box, the documentation says to type the following: SRV*your local folder for symbols*http://msdl.microsoft.com/download/symbols. However, I just entered the local folder for the symbols and it worked fine.
  4. Close and reopen WinDbg, then choose Open Crash Dump and locate the dump file that was created by DebugDiag.
  5. In the command line, type .loadby sos clr
  6. Now type !CLRStack

In the results, it should be clear what the problem is (you'll likely see a BUNCH of lines showing the function or functions that were being called repeatedly).
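If the `!CLRStack` output is too long to scan by eye, one option is to save it to a text file and count how often each call site appears. The following is only a rough Python sketch: it assumes each frame line has the `Child SP`, `IP`, `Call Site` layout shown in the comments below, and the sample frames (`MyApp.Projects.ListChildren` and so on) are made up for illustration, not taken from a real dump.

```python
import re
from collections import Counter

# A frame line is assumed to look like: <Child SP hex> <IP hex> <call site text>
FRAME = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+([0-9a-fA-F]{8,16})\s+(.+?)\s*$")


def count_call_sites(stack_text):
    """Count how many times each call site appears in saved !CLRStack output."""
    counts = Counter()
    for line in stack_text.splitlines():
        match = FRAME.match(line)
        if match:  # header lines and blank lines do not match and are skipped
            counts[match.group(3)] += 1
    return counts


# Hypothetical sample output; the method names are invented for this sketch.
SAMPLE = "\n".join(
    ["OS Thread Id: 0xdd0 (42)", "Child SP         IP               Call Site"]
    + ["0000000011b1e000 000007fe00000010 MyApp.Projects.ListChildren(MyApp.Project)"] * 3
    + ["0000000011b1f000 000007fe00000020 MyApp.Pages.Default.Page_Load(System.Object, System.EventArgs)"]
)

if __name__ == "__main__":
    # The most frequent call sites are the likely recursion candidates.
    for call_site, hits in count_call_sites(SAMPLE).most_common(5):
        print(hits, call_site)
```

A call site that appears hundreds or thousands of times is the natural place to start looking for the recursion.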

Dobin
MikeSmithDev
  • Related question: http://stackoverflow.com/questions/6019674/how-to-debug-crashed-dump-file – MikeSmithDev Aug 20 '13 at 13:08
  • I tried this, caught a dump, got some debugging symbols and now I have the following error when loading the dump: `*** ERROR: Symbol file could not be found. Defaulted to export symbols for ntdll.dll - This dump file has an exception of interest stored in it. The stored exception information can be accessed via .ecxr. ...*** ERROR: Symbol file could not be found. Defaulted to export symbols for clr.dll - *** WARNING: Unable to verify checksum for mscorlib.ni.dll *** ERROR: Module load completed but symbols could not be loaded for mscorlib.ni.dll` – cristi71000 Aug 21 '13 at 12:09
  • Oops, my bad. Now I loaded the dump with no errors, but when I run '!CLRStack' it says 'Failed to load data access DLL' – cristi71000 Aug 21 '13 at 12:25
  • Finally managed to get it to work. Unfortunately the call stack is unhelpful: `OS Thread Id: 0xdd0 (42) Child SP IP Call Site 0000000011b1ef48 000007fef7a5c91b [GCFrame: 0000000011b1ef48] 0000000011b1ef88 000007fef7a5c91b [ContextTransitionFrame: 0000000011b1ef88] 0000000011b1efc8 000007fef7a5c91b [GCFrame: 0000000011b1efc8] 0000000011b1f1b0 000007fef7a5c91b [ComMethodFrame: 0000000011b1f1b0] ` – cristi71000 Aug 21 '13 at 12:47
  • Hmmm... How big was your dump file? Mine was huge (100+ MB), and the commands `.loadby sos clr` and then `!CLRStack` took a while to load and output a lot of data. Scrolling through it showed me hundreds of calls to the same function, which showed me where to look to find the recursion. You may also be able to open the dump file in Visual Studio. – MikeSmithDev Aug 21 '13 at 13:05
  • The dump was 1.5GB :). I first analyzed it with the DebugDiag analyzer and then I used WinDbg. DebugDiag showed me a huge call stack, but it only showed memory addresses. Indeed, one of them was called hundreds of times. In any case, the entire stack trace is in clr.dll. I'm more convinced now that there is a clr.dll error which is triggered by some behavior in our code. Maybe a garbage collection fault or something along these lines. Especially since the crashes happen 3-4 hours apart, even at weird hours like 4:00am when nobody uses the app. – cristi71000 Aug 21 '13 at 13:36
  • This is a terrific writeup. Helped me get to the bottom of an annoying IIS SO. – Kirk Woll Jan 31 '14 at 00:45
  • @KirkWoll Great! It took days to figure all that out... though I guess it looks pretty easy boiled down to a few bullet points. I think out of all my answers on SO, this issue caused me the most grief, so it's nice to know it provides some value. – MikeSmithDev Jan 31 '14 at 02:09
  • A more recent DebugDiag is available at http://www.microsoft.com/en-us/download/confirmation.aspx?id=40336 (the old version linked here does not install) – JB. Feb 12 '14 at 09:14
  • @JB. Thanks for the notification. Hopefully the instructions on how to use it haven't changed in the new version. – MikeSmithDev Feb 12 '14 at 13:38
  • For me, trying to add the new Rule in DebugDiag resulted in a "Failed to start DbgSVC. GetLastError returns 0x00000422". Fix is here: http://stackoverflow.com/q/26127366/12484 – Jon Schneider Sep 30 '14 at 18:14
  • @JonSchneider Thanks for the follow-up details. – MikeSmithDev Sep 30 '14 at 18:37
  • @MikeSmithDev Thank YOU for the fantastic answer above! Saved me hours in putting out a fire on my company's production IIS server this morning! – Jon Schneider Sep 30 '14 at 19:43
  • This worked like a charm and saved me a ton of time. Thank you. Note: the links are outdated but a quick google will find them. – Todd Smith Jul 26 '16 at 20:33
  • This answer is awesome, and I think it will help me even though it's old. Right now w3wp.exe crashes at some point and I want to catch that dump, but I'm currently capturing everything. How do I filter out, for example, the Response.Redirect calls that cause thread aborts? I don't want dumps for those; I want a dump for when w3wp.exe actually crashes. – Chris Ward Jul 11 '18 at 17:11
  • @ChrisWard If I understand your problem, you can add a "false" parameter to Response.Redirect that will stop the thread abort. https://stackoverflow.com/questions/13727422/can-endresponse-increase-performance-of-asp-net-page/13727769#13727769 – MikeSmithDev Jul 11 '18 at 18:44
  • @MikeSmithDev Yes, I'm aware of that, but because that allows the code to continue running, it can cause other issues. I was able to find the actual hex error code being thrown during the w3wp.exe crash. It ended up not being our code at all, but something the customer had installed to monitor for crashes that was causing w3wp.exe to crash. How ironic is that? – Chris Ward Jul 23 '18 at 14:42
2

An addition to the answer above. I was developing an Explorer extension that threw an error at user login, so to the user it looked like a "flashing screen" (Explorer tried to start, crashed, restarted, and so on). I logged in under another user account and installed DebugDiag and WinDbg. I'm using Windows 8.1 with .NET 4.0 and all the latest updates as of today (Jan 13, 2014). I tried downloading a few symbols locally, but WinDbg could not load clr.pdb because of an incorrect signature.

I solved it by using the online symbol server: use "SRV*http://msdl.microsoft.com/download/symbols" as the symbol path.

0

Another cause might be an infinitely recursive function. When an infinite loop occurs, Windows tries to avoid a deadlock and disables the related application pool.

I ran into the same issue today. I have a recursive function that lists parent projects and their sub-projects. One project had been set as its own parent, so when the recursive function tried to list all parent/sub-projects, it looped infinitely.
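One way to guard against this kind of self-referencing data is to remember which projects have already been visited. The sketch below is in Python purely for illustration (the original code was presumably C#), and the `list_subprojects` function and the `children_of` mapping are hypothetical names, not the actual code from this answer.

```python
def list_subprojects(project_id, children_of, visited=None):
    """Return project_id followed by all of its descendants, depth-first.

    Projects that have already been listed are skipped, so a project that is
    its own parent (or any longer cycle) cannot cause endless recursion.
    """
    if visited is None:
        visited = set()
    if project_id in visited:
        return []  # already listed: this is where a cycle gets cut off
    visited.add(project_id)

    result = [project_id]
    for child in children_of.get(project_id, []):
        result.extend(list_subprojects(child, children_of, visited))
    return result


# Hypothetical data: project 1 is (wrongly) registered as its own parent.
children_of = {1: [1, 2], 2: [3], 3: []}

if __name__ == "__main__":
    print(list_subprojects(1, children_of))
```

The visited set only protects against cycles. A legitimately very deep hierarchy could still exhaust the stack, in which case an iterative traversal with an explicit stack would be the safer design.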

Dreamcatcher
0

I was able to check Event Viewer -> Windows Logs -> System and find

Application pool 'DankAppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.

Below that:

A process serving application pool 'DankAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '5704'. The data field contains the error number.

And:

The QueueMonitor service terminated unexpectedly. It has done this 32 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.

At least the QueueMonitor service is a place to start.

sirdank