Today we had a customer run into a problem with a Windows service that is part of one of our products after they downloaded and installed a newer version of the product.

They are running the service on a Windows Server 2003 R2 (Service Pack 2) machine, which has .NET 2.0 installed on it (and this is the most recent version of the .NET Framework on that server).

After they installed the product update and restarted the service, it crashed almost immediately with the following error information logged to the Windows Event Log:

Event Type: Error
Event Source:   .NET Runtime 2.0 Error Reporting
Event Category: None
Event ID:   5000
Date:       8/13/2012
Time:       11:46:23 AM
User:       N/A
Computer:   
Description:
EventType clr20r3, P1 our-service-name-redacted.exe, P2 2.6.31.0, P3 4fcd090b, P4 mscorlib, P5 2.0.0.0, P6 4889dc80, P7 e38, P8 1e8, P9 pszqoadhx1u5zahbhohghldgiy4qixhx, P10 NIL.

Now, a few other customers are running the same version of the Windows service without any issues (on different versions of Windows). I also tested the service on a virtual machine running Windows Server 2003 R2 (Service Pack 2) and did not encounter this problem, but it happens consistently for this single customer.

So, this isn't a "what's wrong with my code?" question: I'm more interested in two things about this error information that I find odd:

  • The faulting module (P4) is mscorlib
  • P9, which is normally supposed to name the exception that occurred as far as I understand things, contains what looks like garbage data (or some kind of obfuscated information perhaps?)

Is there a general explanation for this? I tried Googling but not much luck, since it's hard to search for "P9 garbage" and similar and get anything useful. In particular, I'm really curious what the "gibberish" value for P9 could indicate. For example, could this be a hint that they have a corrupted installation of .NET, or does this "gibberish" actually mean something?

Also, I am somewhat surprised the faulting module is mscorlib and not one of our own assemblies, which makes me wonder again if the customer's .NET installation is corrupted, or if a virus or other malware is lurking on their server.

So, as mentioned, are there any commonplace explanations for this rather odd error report and the P9 "gibberish," or any particular troubleshooting steps I should try beyond trying to get a crash dump and debugging in WinDbg?

Mike Spross
  • According to [a site I found](http://mrpfister.com/programming/demystifying-clr20r3-error-messages/) as the second link for "decode clr20r3", the exception information in P9 is hashed if it is too long to fit. – Damien_The_Unbeliever Aug 14 '12 at 06:42
  • And doing a search based on this specific value seems to indicate that it may be related to a [COM Exception](http://www.pcreview.co.uk/forums/system-runtime-interopservices-comexception-t2404096.html) – Damien_The_Unbeliever Aug 14 '12 at 06:44
  • Oh, and finally, all that the module information is telling you is *where* an exception message originated. Unless you're doing some quite unusual programming, I'd have thought that your code calls *into* **mscorlib** quite frequently - so why is it surprising that an exception originated within it? – Damien_The_Unbeliever Aug 14 '12 at 06:52
  • @Damien_The_Unbeliever: It's late and my Google fu apparently sucks as a result ;-). Thanks for the quick response. As for **mscorlib**: good point. I guess I was just expecting to see a different assembly name there, because I was assuming the exception was occurring in one of the service's assemblies and just not being caught. – Mike Spross Aug 14 '12 at 06:58
  • Following the advice in this answer to a related question (http://stackoverflow.com/a/4053325/17862), the exception is happening in a method called **InvokeMember** in **mscorlib**, and this service does make COM Interop calls, so the root cause being a **COMException** actually wouldn't surprise me. – Mike Spross Aug 14 '12 at 07:02
  • @Damien_The_Unbeliever: By the way, I would consider your comments a valid answer, and would upvote them in an answer at the very least. I'll let the question sit for a while to see if anyone else comes along; otherwise I will accept your remarks as an answer if you post them as such. For now though, I need to go to bed (3 AM here), so I'll check back on this question tomorrow. – Mike Spross Aug 14 '12 at 07:04

1 Answer

The exception information in P9 is hashed if it would otherwise be too long to fit within the field (I'm not sure what the maximum length of the fields is, but apparently they're limited).

However, unless you're unlucky, it's quite likely that the hash code will have been encountered by plenty of people in the past - so you can do searches based on the hash, and you'll likely find people who already know what type of exception it actually was. In this case, it appears to be a COM exception.

Finally, all that the module information tells you is where the exception originated. It is not at all uncommon for code that you call into to throw exceptions, and it would be an unusual .NET program that didn't have quite a few calls into mscorlib. It's especially not surprising when we (now) know that it's a COM exception.
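
As a rough aid for reading these reports, here is a minimal Python sketch that splits the Description line into named fields. The field labels (`faulting_module`, `exception_or_hash`, and so on) and the helper names `parse_clr20r3` and `method_token` are my own. Reading P7 as a MethodDef token (minus its 0x06000000 prefix) and P8 as an IL offset follows the commonly cited explanation of clr20r3 reports, not anything confirmed in this thread, so treat both as assumptions.

```python
# Labels are illustrative. The meanings follow the commonly cited reading of
# clr20r3 bucket parameters, which is an assumption, not an official mapping.
FIELD_NAMES = {
    "P1": "app_name",
    "P2": "app_version",
    "P3": "app_timestamp",
    "P4": "faulting_module",
    "P5": "module_version",
    "P6": "module_timestamp",
    "P7": "method_def",
    "P8": "il_offset",
    "P9": "exception_or_hash",
    "P10": "unused",
}

# The Description line from the event log entry in the question.
SAMPLE = (
    "EventType clr20r3, P1 our-service-name-redacted.exe, P2 2.6.31.0, "
    "P3 4fcd090b, P4 mscorlib, P5 2.0.0.0, P6 4889dc80, P7 e38, P8 1e8, "
    "P9 pszqoadhx1u5zahbhohghldgiy4qixhx, P10 NIL."
)


def parse_clr20r3(description):
    """Split an event description into a dict keyed by the labels above."""
    fields = {}
    # Drop the trailing period, then split "key value" pairs on ", ".
    for part in description.strip().rstrip(".").split(", "):
        key, _, value = part.partition(" ")
        fields[FIELD_NAMES.get(key, key)] = value
    return fields


def method_token(p7_hex):
    """Assumed: P7 is the row id of a MethodDef token (table prefix 0x06)."""
    return 0x06000000 | int(p7_hex, 16)


report = parse_clr20r3(SAMPLE)
print(report["faulting_module"], report["exception_or_hash"])
print(hex(method_token(report["method_def"])), int(report["il_offset"], 16))
```

If the assumption about P7 holds, the resulting token can be looked up in the faulting module's metadata to identify the method that threw, which appears to be how the method was tracked down in the comments on the question.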

Damien_The_Unbeliever
  • Interestingly, I could not reproduce this issue in a VM running the same version of Windows Server 2003 the customer in question was using, and the issue did not occur when they installed our product on a different server (running Windows 2008 R2). So, it appears to have been caused by a broken .NET installation, but it's not clear how that happened. The customer did say (in a follow-up conversation) they had had "trouble" installing/updating .NET on that machine in general, but they couldn't offer specific details of the issues. – Mike Spross Aug 17 '12 at 02:21
  • Stranger still was the fact that the previous version of our product that had been installed on the Windows Server 2003 machine worked correctly. It was an older version and many refactorings had occurred between that version and the version the customer was updating to, but the older version also made COM calls, since that is the primary purpose of the product. So, all in all, it was just strange. – Mike Spross Aug 17 '12 at 02:24