
I'm not sure how to go about debugging this. I have a C# program consisting entirely of managed code, running on .NET 4.5. After it has been running for a while, at some seemingly random time, I get the error "An unhandled exception of type 'System.AccessViolationException' occurred in mscorlib.dll". Since I'm running it from Visual Studio 2012, I click "Break" and am presented with the following call stack:

mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x47 bytes    
[Native to Managed Transition]  
ntdll.dll!_NtRequestWaitReplyPort@12()  + 0xc bytes 
kernel32.dll!_ConsoleClientCallServer@16()  + 0x4f bytes    
kernel32.dll!_GetConsoleLangId@4()  + 0x2b bytes    
kernel32.dll!_SetTEBLangID@0()  + 0xf bytes 
KernelBase.dll!_GetModuleHandleForUnicodeString@4()  + 0x22 bytes   
mdnsNSP.dll!7177aa48()  
[Frames below may be incorrect and/or missing, no symbols loaded for mdnsNSP.dll]   
mdnsNSP.dll!71775b06()  
mdnsNSP.dll!71774ded()  
mdnsNSP.dll!71774e8c()  
bcryptprimitives.dll!746d1159()     
bcryptprimitives.dll!746d1137()     
ntdll.dll!_LdrpCallInitRoutine@16()  + 0x14 bytes   
ntdll.dll!_NtTestAlert@0()  + 0xc bytes 
ntdll.dll!_NtContinue@8()  + 0xc bytes  
ntdll.dll!_LdrInitializeThunk@8()  + 0x1a bytes 

An interesting thing I notice is that nothing in the call stack is my code.

What strategy would you advise I use to find the root of the problem? Or have you seen a similar problem and have any tips?

Since the exception doesn't seem to involve my code, I don't know what information would be helpful to include, so please ask if there is anything else I should add.

Since the error may be IO related (PerformIOCompletionCallback is at the top of the stack), here is a list of the typical IO tasks this application performs:

  • TcpListener.AcceptTcpClientAsync
  • NetworkStream.Write/BeginRead/EndRead
  • SqlCommand.BeginExecuteReader/EndExecuteReader
  • StreamWriter.WriteLine

Other notes:

  • It seems to be roughly repeatable: I get the same error in the same place (PerformIOCompletionCallback), but how long I have to wait for it varies (on the order of minutes).
  • I don't think I can manufacture a small program that reliably highlights the problem. My program handles many thousands of similar IO operations before it hits this error.

Edit:

Based on the suggestion by @Kevin that Mdnsnsp.dll is from Bonjour, I uninstalled Bonjour and tried again. The exception persists, but the call stack is much cleaner:

mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x47 bytes    
[Native to Managed Transition]  
kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes    
ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes   
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes    

I'm assuming the Bonjour installer added a benign hook DLL for network traffic, since uninstalling it did not fix the problem.

Edit:

I have temporarily rewritten all my unsafe functions using slower "safe" equivalents to eliminate them as suspects. None of the assemblies in the application are now compiled with the /unsafe switch, yet the problem persists. To reiterate: I now have no unsafe code, no native code and no P/Invoke calls (in user code) in this application, but I am still getting the AccessViolationException described above.
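For illustration, here is a minimal sketch of what replacing pointer code with "safe" equivalents can look like. The question does not show the original unsafe code, so the `SafeBufferHelpers` class and its methods are hypothetical; they rely on the real `BitConverter.ToInt32` and `Buffer.BlockCopy` APIs, both of which validate their ranges before touching memory.

```csharp
using System;

// Hypothetical helpers: the question does not show the original unsafe code.
// A typical unsafe routine might have looked like this (needs /unsafe):
//   static unsafe int ReadInt32(byte[] buf, int offset)
//   { fixed (byte* p = &buf[offset]) { return *(int*)p; } }
// The safe versions below are bounds-checked, so a bad offset throws an
// ArgumentException instead of silently reading or writing stray memory.
public static class SafeBufferHelpers
{
    // Reads a 32-bit integer in the machine's native byte order
    // (little-endian on x86), matching what the pointer version returned.
    public static int ReadInt32(byte[] buffer, int offset)
    {
        return BitConverter.ToInt32(buffer, offset);
    }

    // Copies bytes between arrays without pointers; Buffer.BlockCopy
    // validates both ranges before copying anything.
    public static void Copy(byte[] src, int srcOffset, byte[] dst, int dstOffset, int count)
    {
        Buffer.BlockCopy(src, srcOffset, dst, dstOffset, count);
    }
}
```

The trade-off is the one the question mentions: each call pays for a bounds check, but an out-of-range offset becomes an ordinary managed exception at the faulty call site rather than possible heap corruption that surfaces later somewhere else.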

Mike
  • Do you happen to have the /3GB switch enabled in the boot.ini? I remember getting strange errors like this on a system I worked upon. I came to the conclusion the process was running out of resources handling I/O because of the reduced memory available to it. – Paul Ruane Jan 17 '13 at 19:11
  • Are you using any `unsafe` code or P/Invoke? | Such an error would fit with the GC moving some buffer the native code still needs. – CodesInChaos Jan 17 '13 at 19:15
  • The exception might be thrown by unmanaged code that your managed code is calling. [Enable unmanaged code debugging](http://msdn.microsoft.com/en-us/library/vstudio/tdw0c6sf%28v=vs.100%29.aspx) in your solution and then try debugging your program. – keyboardP Jan 17 '13 at 19:16
  • May or may not be a factor - The Mdnsnsp.dll application is also known as Bonjour Services. This application was designed by Apple to allow devices to recognize and communicate with each other without the need for setup or an IP address. It is installed with iTunes and other Apple products. It is also used to locate media files and stream them over your network. – Kevin Jan 17 '13 at 19:19
  • @CodesInChaos I'm not using any P/Invoke at all in my code. I AM using _some_ `unsafe` code, which I've checked and tested very thoroughly. With a bit of work I could rewrite it as safe for the temporary purpose of eliminating it as a suspect. Would you advise this as the first thing to do? – Mike Jan 17 '13 at 19:27
  • @keyboardP The given call stack shows all the native and unmanaged call frames at the point of exception. If my code was anywhere up the stack I believe it would show. – Mike Jan 17 '13 at 19:31
  • @PaulRuane I'm using Windows 7, there is no boot.ini. Also the exception is thrown when the process is using about 100-200MB of memory. – Mike Jan 17 '13 at 19:38
  • Mike, memory usage is not the issue with the /3GB switch; rather, user memory then includes address space above 2GB, which may cause bad native code to go rogue if pointers are handled as signed integers. The same issue could happen on an x64 OS running your process as x86 (which gets the full 4GB of address space). I'd try @Kevin's suggestion of running without iTunes... – Alexei Levenkov Jan 17 '13 at 19:52
  • @Kevin I've updated the question based on your comment – Mike Jan 17 '13 at 20:30
  • @AlexeiLevenkov I've checked, the 3GB switch is not enabled, and I am running the application on an x86 machine. – Mike Jan 17 '13 at 20:41
  • Looks like it could be memory corruption (as sure as you are that it isn't :). Install Debugging Tools for Windows and run gflags.exe. Go to the Image File tab, enter the executable's name (just the name and extension) in the field and hit Tab. Check "Page heap". This will make it very likely that any bad pointers will hit inaccessible memory. It's worth a try. Also, make sure it isn't using the hosting process in the Visual Studio project settings. – doug65536 Jan 18 '13 at 07:21
  • Don't forget to turn it off later. Page heap has a significant performance impact. – doug65536 Jan 18 '13 at 07:24
  • @doug65536 I'm not familiar with that tool. If I try to launch the program from Global Flags it says "Unable to create process. Launch command line". Will the flags apply even if I run it from Visual Studio? Is there a way to check which flags are enabled for a running process, to confirm that I succeeded in enabling them? – Mike Jan 18 '13 at 16:34
  • @Mike don't put it in global flags, you will enable it for all processes. Go back to global flags, turn off page heap and apply it. Then go to the *Image File* tab, put in your executable name *and hit Tab*. Then check pageheap. It uses debugging facilities tightly integrated into the system (Microsoft's own people use gflags). – doug65536 Jan 18 '13 at 19:22
  • And to answer the other question, when pageheap is enabled properly, it will output a debug string saying pageheap is enabled (and the process ID I think) when you launch the executable you are working on. – doug65536 Jan 18 '13 at 19:24
  • If you're wondering what's supposed to happen with pageheap - it will make it very likely for stray pointers to make it break into the debugger right where the problem is. It will also offset memory allocations so any overruns past the end of the block will immediately break into the debugger right at the offending code. Or, if you're not so lucky, it will just run slower and not find the bug. It is effective for finding bad pointers and buffer overflows though. – doug65536 Jan 18 '13 at 19:51
  • @doug65536 In the "Image File" tab I enabled page heap using the exe name (with ".exe") for the image, and also tried with the full path and command line args (just in case). I ran the program from VS, and it crashed in the same place in the same way. I did not observe the diagnostic output, but I'll check for it next time (it can take a few hours to crash, so it's a slow process). – Mike Jan 18 '13 at 20:15
  • @Mike Does this apply? http://blogs.msdn.com/b/tom/archive/2012/05/07/windows-azure-worker-role-crashing.aspx Are you using IntelliTrace? – doug65536 Jan 18 '13 at 21:50
  • I agree with @CodesInChaos. Now that you have removed all `unsafe` code, can you post the P/Invokes you are using? It's most likely that you are not locking down a buffer. – Richard Schneider Jan 19 '13 at 07:08
  • This discussion is growing too long and hard to follow. However, it contains good information which should be integrated into the question or an answer. Please do that and, if needed, continue the discussion in chat! – markus Jan 19 '13 at 14:23
  • @doug65536 No I am not using IntelliTrace. – Mike Jan 21 '13 at 16:24
  • @RichardSchneider As I stated previously in my response to CodesInChaos and in the question itself, there are no P/Invoke calls in my code. – Mike Jan 21 '13 at 16:29
  • @markus-tharkun How do I continue the discussion in the chat? There is sometimes a link near the comment box but there isn't at the moment. – Mike Jan 21 '13 at 16:31
  • You can just go to http://chat.stackoverflow.com/ and create a room for whoever wants to discuss this further. – markus Jan 21 '13 at 16:32
  • For anyone who's interested, I've created [a chat](http://chat.stackoverflow.com/rooms/info/23102/continued-discussion-for-access-violation-in-code-that-is-not-mine) to continue the discussion – Mike Jan 21 '13 at 16:42
  • @Mike It happens only when the debugger is attached -> sounds like malware to me... have you checked your machine is clean, no rootkits or similar? – Lorenzo Dematté Jan 31 '13 at 14:08
  • Did anything ever come of this? I've been dealing with what seems like a pretty similar problem on and off for a few months now. – rationull Mar 08 '13 at 01:24
  • @rationull Sorry, I never found an answer. Let me know if you figure something out! – Mike Mar 08 '13 at 01:31
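CodesInChaos's and Richard Schneider's comments refer to pinning ("locking down") a managed buffer whose address is handed to native code. The hedged sketch below, built around a hypothetical `PinningDemo` class, shows that mechanism with the real `GCHandle.Alloc(obj, GCHandleType.Pinned)`, `AddrOfPinnedObject` and `Marshal.ReadByte` APIs. It illustrates the hypothesis only; it is not a fix, since the question states there is no P/Invoke in user code.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustration only: shows the pinning mechanism the comments describe.
public static class PinningDemo
{
    // Pins a managed buffer so the GC cannot move it while unmanaged code
    // holds its address, and always releases the pin afterwards.
    public static byte ReadFirstByteThroughPointer(byte[] buffer)
    {
        if (buffer == null || buffer.Length == 0)
            throw new ArgumentException("buffer must contain at least one byte", "buffer");

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Stand-in for a native call that receives the raw address.
            // Without the pin, a compacting GC could relocate the array and
            // leave 'address' pointing at unrelated memory.
            return Marshal.ReadByte(address);
        }
        finally
        {
            handle.Free(); // unpin; the address must not be used after this
        }
    }
}
```

The framework does the equivalent internally for the buffers it passes to overlapped IO, which is why a pinning mistake would normally have to come from code that passes its own pointers.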

3 Answers


Since it's PerformIOCompletionCallback that is causing the error, I would look at your asynchronous IO calls:

  • TcpListener.AcceptTcpClientAsync
  • NetworkStream.Write/BeginRead/EndRead
  • SqlCommand.BeginExecuteReader/EndExecuteReader

The error looks to be happening because the handle that was registered is no longer valid. Since it is happening in managed code, the cause is going to be a managed object and NOT a third-party native DLL.

shimpossible
  • What do you mean by a "handle"? If you are referring to a Windows object handle, then isn't that something the .net framework should be checking for me? I don't have any "3rd party native DLL", so I never suspected that to be the problem. – Mike Feb 04 '13 at 15:45
  • Examples of what shimp means: any open port read/write stream that was closed or disposed of and then used can cause this, as can performing too many operations on the port at once. Also, keep in mind that using an async callback will delay any errors until EndXxxx is called... – andrew Feb 04 '13 at 17:35
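A minimal sketch of the pattern andrew describes, assuming all reads on a stream go through one hypothetical `ReadLoop` class: only one `BeginRead` is outstanding at a time, `EndRead` is always called in the callback (that is where asynchronous failures surface), and no read is started after the stream has been closed. It works with any `Stream`, such as the `NetworkStream` in the question; whether it prevents the crash in the question is untested. It is written without C# 6 features so it compiles in Visual Studio 2012.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical sketch: assumes every close goes through ReadLoop.Close().
public sealed class ReadLoop
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[4096];
    private readonly object _gate = new object();
    private bool _closed;

    public ReadLoop(Stream stream) { _stream = stream; Done = new ManualResetEventSlim(false); }

    public long TotalBytes { get; private set; }
    public Exception Error { get; private set; }
    public ManualResetEventSlim Done { get; private set; }

    public void Start() { BeginNextRead(); }

    // Closing under the same lock that guards BeginRead means a read can never
    // be started on a stream that has already been disposed.
    public void Close()
    {
        lock (_gate)
        {
            if (_closed) return;
            _closed = true;
            _stream.Dispose();
        }
        Done.Set();
    }

    private void BeginNextRead()
    {
        lock (_gate)
        {
            if (_closed) return;
            try
            {
                _stream.BeginRead(_buffer, 0, _buffer.Length, OnRead, null);
                return;
            }
            catch (IOException ex) { Error = ex; }
        }
        Close(); // only reached when BeginRead itself failed
    }

    private void OnRead(IAsyncResult ar)
    {
        int n;
        try
        {
            // Failures of the asynchronous read are reported here, at EndRead.
            n = _stream.EndRead(ar);
        }
        catch (IOException ex) { Fail(ex); return; }
        catch (ObjectDisposedException ex) { Fail(ex); return; }

        if (n == 0) { Close(); return; } // end of stream / remote side closed
        TotalBytes += n;
        BeginNextRead();
    }

    private void Fail(Exception ex)
    {
        Error = ex;
        Close();
    }
}
```

Usage: `var loop = new ReadLoop(client.GetStream()); loop.Start();` and later `loop.Close()` from whichever thread decides the connection is finished. Routing the close through the same lock is the design choice that matters; disposing the stream directly from elsewhere would bypass the guard.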

I don't know if it can help you, but it looks like we faced a similar problem several years ago. As I remember, our investigation pointed to a DLL belonging to another program: we found that memory access violations can be caused by antivirus software (NOD32 in our case), firewalls or network sniffers/traffic controllers.

Try checking the application log (Control Panel -> System and Security -> Administrative Tools -> Event Viewer) for errors caused by the applications above. If the problem is with another program, try disabling or uninstalling it and check whether the crash still appears in your program.
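Hooks installed by antivirus, firewall or traffic-monitoring software often show up as extra DLLs inside your own process, as mdnsNSP.dll did in the question's first call stack. A hedged sketch for listing such candidates, using the real `Process.Modules` and `FileVersionInfo.CompanyName` APIs; the `ModuleAudit` class name and the "company is not Microsoft" filter are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical diagnostic: list modules loaded into the current process whose
// version resource does not name Microsoft as the company.
public static class ModuleAudit
{
    public static List<string> NonMicrosoftModules()
    {
        var result = new List<string>();
        using (Process self = Process.GetCurrentProcess())
        {
            foreach (ProcessModule module in self.Modules)
            {
                string company;
                try
                {
                    company = module.FileVersionInfo.CompanyName ?? "";
                }
                catch (Exception)
                {
                    company = ""; // version info unreadable; report it anyway
                }

                if (company.IndexOf("Microsoft", StringComparison.OrdinalIgnoreCase) < 0)
                {
                    string label = company.Length == 0 ? "no company info" : company;
                    result.Add(module.FileName + " (" + label + ")");
                }
            }
        }
        return result;
    }
}
```

Logging this list at startup, and again shortly before the crash usually occurs, gives a short set of third-party DLLs to disable or uninstall one at a time.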

UPD: Have you tried to reproduce this issue in a clean test environment?

Mikhail Churbanov
  • I don't have time to set up a clean test environment, but in my experience it only seems to occur when the debugger is attached. – Mike Jan 28 '13 at 16:23

You could use DebugDiag to see what is causing the AccessViolationException on your machine. Configure a crash rule for your process and examine the dump and log files; I hope that will give you more information on the subject. Also, make sure your machine has the latest Windows updates, because I had a similar issue that was solved by a security update for that CLR version.

Ian