8

Our application (written in C++, VS 2010 project) has been running fine on all operating systems prior to Windows 8 (and still does). On Windows 8, however, an access violation occurs when the application exits normally:

mfc100.dll!_DllMain@12()    <<< Crash here
mfc100.dll!__CRT_INIT@12()  
mfc100.dll!__DllMainCRTStartup@12() 
ntdll.dll!_LdrxCallInitRoutine@16() 
ntdll.dll!LdrpCallInitRoutine() 
ntdll.dll!LdrShutdownProcess()  
ntdll.dll!RtlExitUserProcess()  
kernel32.dll!_ExitProcessImplementation@4() 
mscoreei.dll!RuntimeDesc::ShutdownAllActiveRuntimes(unsigned int,class RuntimeDesc *,enum RuntimeDesc::ShutdownCompatMode)  
mscoreei.dll!_CorExitProcess@4()    
mscoree.dll!_ShellShim_CorExitProcess@4()   
msvcr100d.dll!__crtCorExitProcess(int status) line 693  C
msvcr100d.dll!__crtExitProcess(int status) line 699 C
msvcr100d.dll!doexit(int code, int quick, int retcaller) line 621   C
msvcr100d.dll!exit(int code) line 393  C
my.exe!__tmainCRTStartup() line 568    C
my.exe!WinMainCRTStartup() line 371    C
kernel32.dll!@BaseThreadInitThunk@12()  
ntdll.dll!__RtlUserThreadStart()    
ntdll.dll!__RtlUserThreadStart@8()  

In an MSDN forum topic it has been suggested to run GC.Collect() before exit, but adding such a call shortly before exit made no difference.

I am a bit at a loss about how I should debug the problem. As far as I understand, CorExitProcess takes care of cleaning up the managed resources of the application. So could this be a fault in a managed component?
Or is it more likely that some function pointer in _DllMain has been overwritten/corrupted? If so, how would I set a data breakpoint at the address in question? There is a post explaining how to debug a similar issue, but its author had the issue in his own DLL, so he could actually peek at the exact source of the problem, which I can't.

Any suggestions?

Edit: Additional information, windbg !analyze -v:

FAULTING_IP: 
mfc100+258e6c
64298e6c 8b4654          mov     eax,dword ptr [esi+54h]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 64298e6c (mfc100+0x00258e6c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 53f21f0c
Attempt to read from address 53f21f0c

CONTEXT:  00000000 -- (.cxr 0x0;r)
eax=53f21eb8 ebx=00000000 ecx=64187d2d edx=7fcde000 esi=53f21eb8 edi=00000001
eip=64298e6c esp=00c3f1b8 ebp=00c3f2ec iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210206
mfc100+0x258e6c:
64298e6c 8b4654          mov     eax,dword ptr [esi+54h] ds:0023:53f21f0c=????????

FAULTING_THREAD:  00000520

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS

PROCESS_NAME:  ww.exe

ADDITIONAL_DEBUG_TEXT:  
You can run '.symfix; .reload' to try to fix the symbol path and load symbols.

MODULE_NAME: mfc100

FAULTING_MODULE: 77bc0000 ntdll

DEBUG_FLR_IMAGE_TIMESTAMP:  4d5f29b8

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  53f21f0c

READ_ADDRESS:  53f21f0c 

FOLLOWUP_IP: 
mfc100+258e6c
64298e6c 8b4654          mov     eax,dword ptr [esi+54h]

APP:  ww.exe

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x520 (0)
Current frame: 
ChildEBP RetAddr  Caller, Callee

PRIMARY_PROBLEM_CLASS:  WRONG_SYMBOLS

BUGCHECK_STR:  APPLICATION_FAULT_WRONG_SYMBOLS

LAST_CONTROL_TRANSFER:  from 6429da08 to 64298e6c

STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
00c3f2ec 6429da08 64040000 00000000 00000001 mfc100+0x258e6c
00c3f330 6429dac7 64040000 00c3f35c 77be077a mfc100+0x25da08
00c3f33c 77be077a 64040000 00000000 00000001 mfc100+0x25dac7
00c3f35c 77be07f0 6429daa9 64040000 00000000 ntdll!RtlAddMandatoryAce+0x14e
00c3f3a4 77bfa529 6429daa9 64040000 00000000 ntdll!RtlAddMandatoryAce+0x1c4
00c3f49c 77bfa40e 00000000 00000000 6f2d4890 ntdll!RtlExitUserProcess+0x1e7
00c3f4b0 76ff4231 00000000 77e8f3b0 ffffffff ntdll!RtlExitUserProcess+0xcc
00c3f4c4 6f8b3712 00000000 bd3cbe8b 01f1c054 KERNEL32!ExitProcess+0x15
00c3f74c 6f8c19a2 00000001 00c3f76c 6f1686ad mscoreei!GetFileVersion+0x1835
00c3f758 6f1686ad 00000000 77bdab85 6f8a0000 mscoreei!CorExitProcess+0x27
00c3f76c 70737954 00000000 00c3f784 7073798d mscoree!CorExitProcess+0x94
00c3f778 7073798d 00000000 00c3f7c8 70737ab0 MSVCR100!_query_new_mode+0x159
00c3f784 70737ab0 00000000 a2b843a9 00375f5c MSVCR100!_query_new_mode+0x192
00c3f7c8 70737b1d 00000000 00000000 00000000 MSVCR100!_query_new_mode+0x2b5
00c3f7dc 003274ab 00000000 d1ef1931 00000000 MSVCR100!exit+0x11
00c3f864 76ff173e 7fcdf000 00c3f8b4 77c16911 ww!_enc$textbss$begin+0x64ab
00c3f870 77c16911 7fcdf000 a613e810 00000000 KERNEL32!BaseThreadInitThunk+0x12
00c3f8b4 77c168bd ffffffff 77c8560a 00000000 ntdll!LdrInitializeThunk+0x1f0
00c3f8c4 00000000 003275da 7fcdf000 00000000 ntdll!LdrInitializeThunk+0x19c


STACK_COMMAND:  .cxr 0x0 ; kb

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  mfc100+258e6c

FOLLOWUP_NAME:  MachineOwner

IMAGE_NAME:  mfc100.dll

BUCKET_ID:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS_c0000005_mfc100.dll!Unknown

ANALYSIS_SOURCE:  UM

FAILURE_ID_HASH_STRING:  um:wrong_symbols_c0000005_mfc100.dll!unknown

FAILURE_ID_HASH:  {9e516b68-081f-78d6-cf23-b42f2b3cb573}

Followup: MachineOwner
---------

Screenshot of where the crash occurs: Source code

floele
  • 1
    I've spent hours with this type of problem too - for me it was due to a managed COM component not releasing event interfaces in the unmanaged client. Is COM involved in your situation? – Roger Rowland Apr 10 '14 at 07:32
  • Yes, COM is involved. What do you mean by "event interfaces"? – floele Apr 10 '14 at 07:38
  • I mean that the managed COM component raises events in the unmanaged COM client, so it has an [RCW](http://msdn.microsoft.com/en-us/library/8bwh56xe(v=vs.110).aspx) that doesn't release the native interface pointers until it gets finalized. As that happens after the native runtime has gone, you get an access violation. – Roger Rowland Apr 10 '14 at 07:41
  • The situation seems to be reversed, though. In our app it's not that .NET calls COM, but COM is used to call .NET methods. – floele Apr 10 '14 at 07:55
  • 3
    One thing that looks a little suspect is that you're using the debug runtime library but the release version of the MFC DLL. Generally it's bad to mix and match debug/release. – Retired Ninja Apr 10 '14 at 08:05
  • 2
    Very true. MFC cleanup occurs in DllMain(), always a good spot to get prior heap corruption to bomb your program. You have the source, set a breakpoint on DllMain in vc/atlmfc/src/mfc/dllmodul.cpp. But having more than one version of the CRT in your program is definitely the first thing to fix. – Hans Passant Apr 10 '14 at 08:44
  • @floele - Could you explain what is that mix of debug/runtime libraries (that Retired Ninja spotted) doing "on all operating systems"? – SChepurin Apr 10 '14 at 08:46
  • @SChepurin This crash stack is from a debugging session with VS2013 on Win8 (debug build). How can I prevent the release version of MFC from sneaking in? – floele Apr 10 '14 at 08:55
  • @floele - Have you ever tried analyzing a crash dump using WinDbg (an almost forgotten practice, I guess)? See http://stackoverflow.com/questions/1649117/analysing-crash-dump-in-windbg – SChepurin Apr 10 '14 at 09:03
  • @SChepurin What additional information should I expect from this when I do? Does this give me anything the debugger cannot? – floele Apr 10 '14 at 09:09
  • 1
    @floele - Good answer why - "Any time you need to debug a truly ugly problem windbg has better technology to do it with than Visual Studio. Windbg has a more powerful scripting language and allows you to write DLLs to automate difficult problems. It will install gflags.exe, which gives you better control over the heap for debugging memory overwrites." (http://stackoverflow.com/questions/105130/why-use-windbg-vs-the-visual-studio-vs-debugger) – SChepurin Apr 10 '14 at 09:19
  • @SChepurin I added Windbg output now...doesn't help me a lot though. – floele Apr 10 '14 at 10:48
  • @floele - Isn't that what Hans Passant said about "...having more than one version of the CRT in your program is definitely the first thing to fix" - "APPLICATION_FAULT_WRONG_SYMBOLS"... "WRONG_SYMBOLS_c0000005_mfc100.dll!DllMain" – SChepurin Apr 10 '14 at 10:53
  • @SChepurin: Any idea how I can prevent msvcr100d.dll from being injected? Even if I build the app in Release mode, there is still msvcr100d.dll in the call stack. – floele Apr 10 '14 at 11:39
  • @floele - Maybe Dependency Walker (http://www.dependencywalker.com/) can help you see when and why it is loaded (even in Release mode). To me it looks like a possibly misconfigured system is causing the issue. – SChepurin Apr 10 '14 at 11:58
  • @SChepurin I'll try. However, the issue occurs on *every* Win8 system, so the crash issue is not an issue specific to my system or even this debug DLL (because customer PCs won't have it anyway). – floele Apr 10 '14 at 12:26
  • @floele - I mean, it seems to me that the application was (possibly) *built* on a misconfigured system with some wrong dependencies. This can lead to undefined behavior. – SChepurin Apr 10 '14 at 12:32
  • @SChepurin Not sure what the problem was. Maybe it didn't build properly the first time, but now I have it running without debug DLLs mixed in. The problem still exists though; I updated the WinDbg info. – floele Apr 11 '14 at 07:47
  • @floele - There is a small "how to" on WinDbg analysis (http://blogs.msdn.com/b/anandbms/archive/2005/04/20/410225.aspx). So, now you'd have to try to find the function where the crash occurs, using "kb" command. – SChepurin Apr 11 '14 at 08:02
  • @HansPassant Thanks for the hint regarding the source, I didn't expect to have it. Turns out I do, but I can't actually step into it. It seems to be somewhat complicated to get it working, so I [posted a new question for this part of the problem](http://stackoverflow.com/questions/23008222/debugging-mfc-mfc100-dll-cannot-find-or-open-pdb). – floele Apr 11 '14 at 09:15
  • @SChepurin I have now added a screenshot showing the source code location where the crash happens. I'm still trying to figure out what corrupts the AFX_MODULE_STATE, and when. – floele Apr 11 '14 at 13:25

2 Answers

6

As discussed in comments, our similar problem was where we had a native C++ application that communicated with a managed C# application running as a COM server. To allow the managed component to communicate events to the C++ app, an event sink was exposed as a simple ATL COM interface from the native side, which on the .NET side was automatically encapsulated in a Runtime Callable Wrapper.

The access violation on application close (which wasn't always visible except in the event logs) was due to the fact that the RCW didn't call Release() on our ATL COM interfaces until it was garbage collected. As this happened when the .NET runtime closed, which was after the native runtime had shut down, it tried to call back into dead code.

The solution for us was to expose a "shutdown" method on the .NET side that disposed of all the communicating objects, then called:

GC.Collect();
GC.WaitForPendingFinalizers();
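
The effect of such a shutdown call can be modeled from the native side in plain C++. This is only a minimal sketch of the lifetime ordering, not real COM or CLR code: `EventSink`, `ManagedSideModel`, and `RunOrderlyShutdown` are hypothetical names, and the "managed side" is a stand-in that simply holds references the way an RCW would until it is told to let go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a native COM event sink; only the reference count is modeled.
class EventSink {
public:
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);  // releasing a dead object is the bug being avoided
        return --refs_;
    }
    unsigned long RefCount() const { return refs_; }

private:
    unsigned long refs_ = 1;  // the native owner's own reference
};

// Stand-in for the managed component: it keeps sink references alive
// (as an RCW would) until an explicit shutdown releases them.
class ManagedSideModel {
public:
    void Advise(EventSink& sink) {
        sink.AddRef();
        held_.push_back(&sink);
    }

    // Hypothetical explicit shutdown: release every held sink now,
    // while the native code that implements Release() is still loaded.
    void Shutdown() {
        for (EventSink* sink : held_) {
            sink->Release();
        }
        held_.clear();
    }

    std::size_t HeldCount() const { return held_.size(); }

private:
    std::vector<EventSink*> held_;
};

// Returns the sink's reference count after an orderly shutdown.
unsigned long RunOrderlyShutdown() {
    EventSink sink;
    ManagedSideModel managed;
    managed.Advise(sink);  // count: 2 (native owner + managed side)
    managed.Shutdown();    // count: 1, released before native teardown
    return sink.RefCount();
}
```

The point of the model is the ordering: the references are dropped at a moment the native side chooses, rather than whenever finalization happens to run.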

OK, I understand that this might not exactly mirror your problem, but the route to finding out what was causing it was to use the Managed Debugging Assistants, particularly reportAvOnComRelease.

We activated the MDA by registry keys and ran the native app via a debugger to see the additional output that identified the COM interfaces that were being held too long. Probably as a first step, it would be wise to activate all of the MDA options to glean as much info as possible from the crash.

Roger Rowland
  • I activated all MDAs in Debugging -> Exceptions in VS, but I didn't get any additional warning unfortunately. Does it make a difference doing this by registry? – floele Apr 10 '14 at 08:19
  • @floele It shouldn't matter how you activate it, but we had to do it via registry because we were debugging a native project in VS2012, so the .NET debugger didn't kick in. We also used DebugDiag, if that helps. – Roger Rowland Apr 10 '14 at 08:21
  • What did you do with DebugDiag? Did you just analyze the process dump? – floele Apr 10 '14 at 08:58
  • @floele We used the log that it captures rather than the dump - you can see the MDA debug output there. – Roger Rowland Apr 10 '14 at 08:59
4

I tried debugging this using data breakpoints, but that didn't help a lot. I could see that at some point the data being accessed was overwritten, but that didn't happen in a call stack containing any of my own code.

So I resorted to a simpler method and started removing parts of the program until the error disappeared. In a large application it may be hard to remove some parts without breaking others, but I was able to narrow down the source of the issue.

Basically, the problem stopped occurring after removing a certain call to FreeLibrary. After further investigation it turned out that this call happens during DllMain, which is not allowed:

The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code.

In another SO question, one user apparently noticed a change since Windows 8 in this regard, which would explain why the error only happens on this version of Windows.

We'll now change our application so that FreeLibrary is called at a different point in time.
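
One way such a change could look is sketched below. It is not our actual code: `g_helperDll`, `ShutdownPlugin`, `DetachKind`, and `ClassifyDetach` are hypothetical names, and the sketch assumes the host can call an exported function before it exits. It relies on the documented behavior that, for DLL_PROCESS_DETACH, the reserved argument of DllMain is NULL when the DLL is unloaded through FreeLibrary and non-NULL when the process is terminating.

```cpp
#include <cassert>

// Numeric value of DLL_PROCESS_DETACH in <windows.h>, repeated here so the
// classification logic can be compiled and checked on any platform.
constexpr unsigned long kDllProcessDetach = 0;

enum class DetachKind { NotDetach, DynamicUnload, ProcessTermination };

// For DLL_PROCESS_DETACH, reserved is null on a FreeLibrary-driven unload
// and non-null when the whole process is terminating.
constexpr DetachKind ClassifyDetach(unsigned long reason, const void* reserved) {
    return reason != kDllProcessDetach ? DetachKind::NotDetach
         : reserved == nullptr         ? DetachKind::DynamicUnload
                                       : DetachKind::ProcessTermination;
}

#ifdef _WIN32
#include <windows.h>

static_assert(kDllProcessDetach == DLL_PROCESS_DETACH,
              "constant must match <windows.h>");

static HMODULE g_helperDll = nullptr;  // hypothetical dependency, loaded elsewhere

// Hypothetical export: the host calls this before it exits, outside DllMain,
// so FreeLibrary runs while the loader is in a normal state.
extern "C" __declspec(dllexport) void ShutdownPlugin() {
    if (g_helperDll != nullptr) {
        FreeLibrary(g_helperDll);
        g_helperDll = nullptr;
    }
}

BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID reserved) {
    switch (ClassifyDetach(reason, reserved)) {
    case DetachKind::ProcessTermination:
        // Other DLLs may already have run their termination code:
        // leave g_helperDll alone and let the OS reclaim it.
        break;
    case DetachKind::DynamicUnload:
        // Still no FreeLibrary here; ShutdownPlugin() should have run already.
        assert(g_helperDll == nullptr);
        break;
    case DetachKind::NotDetach:
        break;
    }
    return TRUE;
}
#endif
```

The host would then call `ShutdownPlugin()` as one of the last steps of its normal exit path, for example before returning from its main function, so that DllMain itself never touches the loader.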

floele