
Currently testing a C# (.NET 4.5) WPF application built on top of a C++ library (managed, I believe; I didn't write it). For various (practical) reasons, it's running on a server (with VS2012 installed, yes, yuck).

The program hooks up to a camera (via the library) and displays the image frames that it receives.

What's weird is that I'm getting buffer overruns (buffer overflows I could understand), and during garbage collection!

A buffer overrun has occurred in App.exe which has corrupted the program's internal state.

Various other potentially useful tidbits of information:

  • Upping the 'throughput' makes it happen sooner (seconds instead of minutes)
  • Running in VS (debug or release) stops it happening at all (or at least delays it longer than I'm prepared to wait)
  • There's no `unsafe` code in my C#, and the only 'esoteric' thing I'm doing is converting a bitmap (from the library) into a BitmapSource (like this).
  • The libraries are compiled for x86, and so is the exe.

Call stack, same every time:

vcr110_clr0400.dll!__crt_debugger_hook ()   Unknown
clr.dll!___raise_securityfailure () Unknown
clr.dll!___report_gsfailure ()  Unknown
clr.dll!CrawlFrame::SetCurGSCookie(unsigned long *) Unknown
clr.dll!StackFrameIterator::Init(class Thread *,class Frame *,struct _REGDISPLAY *,unsigned int)    Unknown
clr.dll!Thread::StackWalkFramesEx(struct _REGDISPLAY *,enum StackWalkAction (*)(class CrawlFrame *,void *),void *,unsigned int,class Frame *)   Unknown
clr.dll!Thread::StackWalkFrames(enum StackWalkAction (*)(class CrawlFrame *,void *),void *,unsigned int,class Frame *)  Unknown
clr.dll!CNameSpace::GcScanRoots(void (*)(class Object * *,struct ScanContext *,unsigned long),int,int,struct ScanContext *,class GCHeap *)  Unknown
clr.dll!WKS::gc_heap::mark_phase(int,int)   Unknown
clr.dll!WKS::gc_heap::gc1(void) Unknown
clr.dll!WKS::gc_heap::garbage_collect(int)  Unknown
clr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int,enum WKS::gc_reason) Unknown
clr.dll!WKS::GCHeap::GarbageCollectTry(int,int,int) Unknown
clr.dll!WKS::GCHeap::GarbageCollect(int,int,int)    Unknown
clr.dll!GCInterface::Collect(int,int)   Unknown
mscorlib.ni.dll!6dcd33e5()  Unknown
[Frames below may be incorrect and/or missing, no symbols loaded for mscorlib.ni.dll]   
mscorlib.ni.dll!6dcd33e5()  Unknown
064afa73()  Unknown
clr.dll!MethodTable::FastBox(void * *)  Unknown
clr.dll!MethodTable::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!WKS::GCHeap::FinalizerThreadWorker(void *)  Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!WKS::GCHeap::FinalizerThreadStart(void *)   Unknown
clr.dll!Thread::intermediateThreadProc(void *)  Unknown
kernel32.dll!@BaseThreadInitThunk@12 () Unknown
ntdll.dll!___RtlUserThreadStart@8 ()    Unknown
ntdll.dll!__RtlUserThreadStart@8 () Unknown
Benjol
  • I bet you've passed an IntPtr to an unmanaged method which is hanging on to it, and the GC is running and moving or freeing the memory to which the IntPtr refers. If so, you probably need to pin the object in question. – Matthew Watson May 08 '13 at 12:48
  • I have posted a tentative answer, but I have a couple of questions. 1) Do you have access to the C++ library sources? The bug is likely there. 2) Does the app run OK on pre-Windows 7 systems (Server 2008, Vista, etc.)? – Lorenzo Dematté May 08 '13 at 12:53
  • @dema80, thanks, yes I saw. **1)** not at the moment, but I should soon (I'm keeping the supplier posted, but the main developer is on holiday...) **2)** It's currently running on 2008 R2, but I can't run it on anything older because I'm relying quite heavily on 4.5 (liberal use of `async`). – Benjol May 08 '13 at 13:01
  • I see. I asked because the heap in Windows 7/2008 R2 is different from previous versions; see here http://blogs.technet.com/b/askperf/archive/2009/10/02/windows-7-windows-server-2008-r2-fault-tolerant-heap-and-memory-management.aspx You could try disabling it and see what happens, but this will only confirm that it is a heap corruption issue. – Lorenzo Dematté May 08 '13 at 13:08
  • @Benjol as soon as you have the source, I would look for the function(s) where the frame is grabbed and copied to memory, and check that the memory is properly pinned before a pointer to it is passed to the unmanaged function, which will probably be what fills the managed memory with the frame(s). – Lorenzo Dematté May 08 '13 at 13:12

2 Answers


Unlike the v2 CLR, the v4 CLR was built with the Microsoft secure CRT extensions enabled. These include checks, enabled by the /GS compiler option, that verify at function exit that the "stack canary" (the GS cookie) hasn't been overwritten.

Under the previous version, your program would most likely have ended with a Fatal Execution Engine Error, triggered by the access violation raised when the function tried to return through a corrupted return address. The v4 CLR now catches the problem sooner, and more reliably: a corrupted return address could by accident point to valid code, and what happens next in that case is usually truly undiagnosable. And exploitable.

But the root cause is the same: the GC heap getting corrupted.

Hans Passant
  • Yeah, I'd got as far as the cookie being the symptom; sorry I didn't make that clear in my question. It's the next bit that is harder, given that the real culprit is obviously nowhere in the call stack! – Benjol May 08 '13 at 13:14
  • Well, obviously. Such is the scourge of heap corruption: it is *never* caused by the code that crashes, and you can *never* get help for it from a Q+A site. You'll need to find it by taking a good hard look at unsafe code and unit-testing the bejeezus out of it. Surely you already know this? – Hans Passant May 08 '13 at 13:28
  • @HansPassant, if I did, I wouldn't have asked. As you can see from the language tag on this question, most of my paddling has been in the shallow end :) – Benjol May 08 '13 at 13:37
  • Well, nvm, I thought I'd seen you answer questions about [interop before](http://stackoverflow.com/a/10896540/17034). This is how that kind of code blows up. – Hans Passant May 08 '13 at 13:48
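One way to act on the "unit-test the bejeezus out of it" advice, sketched here as an assumption rather than anything from the thread, is a stress loop that calls the suspect interop path over and over and forces a full, blocking collection (plus a finalizer drain) after every call. If native code is writing through an unpinned pointer, frequent collections give the GC more chances to move the target, so the crash tends to show up sooner and closer to the offending call. `GcStress` and `suspectCall` are hypothetical names; `suspectCall` stands for whatever wraps the library's frame-grabbing call.

```csharp
using System;

public static class GcStress
{
    // Calls 'suspectCall' repeatedly, forcing a full blocking GC after each call.
    // A gen-2 collection may compact the heap, which moves unpinned objects and
    // makes a stale native pointer into the managed heap more likely to corrupt it.
    public static int Run(Action suspectCall, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            suspectCall();
            GC.Collect();                  // full, blocking collection of all generations
            GC.WaitForPendingFinalizers(); // finalizers may release native resources too
            GC.Collect();                  // reclaim objects freed by those finalizers
        }
        return iterations;
    }
}
```

This is a debugging aid only; forcing collections this often is far too slow for production code.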

Looks like memory corruption to me. The library is likely using unsafe code, unmanaged memory, or pinned memory... or maybe it is not pinning the right bits of memory, or is unpinning them too early?
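If the native side holds on to its frame pointer between calls (an assumption; the library's API isn't known here), one way to rule out the "unpinned too early" case is to give it a buffer that never lives on the GC heap at all. The sketch below allocates the frame buffer with `Marshal.AllocHGlobal`, so the GC can never move it, and copies each frame out with `Marshal.Copy` into a fresh managed array. `UnmanagedFrameBuffer` and `Snapshot` are hypothetical names, and the native call that would receive `Pointer` is not shown.

```csharp
using System;
using System.Runtime.InteropServices;

// Frame buffer allocated outside the GC heap: its address stays valid until
// Dispose, no matter how often the GC runs or compacts.
public sealed class UnmanagedFrameBuffer : IDisposable
{
    private IntPtr _ptr;
    public int Length { get; private set; }

    public UnmanagedFrameBuffer(int length)
    {
        Length = length;
        _ptr = Marshal.AllocHGlobal(length); // native heap, never relocated by the GC
    }

    // The address that would be handed to the (hypothetical) native registration call.
    public IntPtr Pointer
    {
        get
        {
            if (_ptr == IntPtr.Zero) throw new ObjectDisposedException("UnmanagedFrameBuffer");
            return _ptr;
        }
    }

    // Copies the current frame into a new managed array for WPF to consume.
    public byte[] Snapshot()
    {
        byte[] copy = new byte[Length];
        Marshal.Copy(Pointer, copy, 0, Length);
        return copy;
    }

    // Only call this after the native side has been told to stop writing.
    public void Dispose()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            _ptr = IntPtr.Zero;
        }
    }
}
```

The class deliberately has no finalizer: freeing the buffer from the finalizer thread while the native code might still be writing into it would reintroduce exactly the kind of corruption being hunted.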

As for:

Running in VS (debug or release) stops it happening at all (or at least delays it longer than I'm prepared to wait)

This is because processes created by a debugger use a different heap (even if you are running in release mode). This alternate heap is a known source of heisenbugs when dealing with random memory corruption. (I have not found many sources on this point, however; I thought it was on Raymond Chen's blog somewhere, but I only found this.)

EDIT: reference found! From MSDN:

Processes that the debugger creates (also known as spawned processes) behave slightly differently than processes that the debugger does not create.
Instead of using the standard heap API, processes that the debugger creates use a special debug heap. You can force a spawned process to use the standard heap instead of the debug heap by using the _NO_DEBUG_HEAP environment variable or the -hd command-line option.

My best guess, then, is one of two things. Either the C++ library corrupts some memory; the GC comes along, finds the heap corrupted, and crashes. OR: the C++ library forgets to pin the memory it is using as a buffer for images. The GC comes along and moves that memory. The C++ library doesn't know, writes through a now-invalid pointer, and corrupts the heap. The GC comes along again, starts working on the now-corrupted memory, and crashes.
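For the second scenario, the usual fix is to keep the managed buffer pinned for as long as native code may write into it. A minimal sketch using `GCHandle.Alloc(..., GCHandleType.Pinned)` is below; `PinnedCall`, `FillPinned`, and the `fillFrame` delegate are hypothetical stand-ins for the library's frame-grabbing call, and the sketch assumes that call does not keep the pointer after it returns. (If the library is C++/CLI, the equivalent there is `pin_ptr`. A plain P/Invoke that takes a blittable `byte[]` is already pinned by the marshaler, but only for the duration of that call.)

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedCall
{
    // Pins 'buffer' only for the duration of one native call.
    // 'fillFrame' stands in for something like a hypothetical
    //   [DllImport("camera.dll")] static extern int GrabFrame(IntPtr dest, int size);
    // and must NOT retain the pointer after it returns.
    public static int FillPinned(byte[] buffer, Func<IntPtr, int, int> fillFrame)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr p = handle.AddrOfPinnedObject(); // address of buffer[0], stable while pinned
            return fillFrame(p, buffer.Length);
        }
        finally
        {
            handle.Free(); // from here on the GC may move 'buffer' again
        }
    }
}
```

If the native code does keep the pointer after returning, the pin has to outlive the call, or the buffer should be allocated outside the GC heap instead.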

Lorenzo Dematté