
I have a small single-threaded C++ application, compiled and linked using Visual Studio 2005, that uses boost (crc, program_options, and tokenizer), a smattering of STL, and assorted other system headers.

(Its primary purpose is to read in a .csv and generate a custom binary .dat and a paired .h declaring structures that "explain" the format of the .dat.)

The tool is crashing (access violation on NULL) when run outside the debugger, only in release. E.g. pressing F5 does not cause the tool to crash, Ctrl-F5 does. When I re-attach the debugger, I get this stack:

ntdll.dll!_RtlAllocateHeap@12()  + 0x26916 bytes    
csv2bin.exe!malloc(unsigned int size=0x00000014)  Line 163 + 0x63 bytes C
csv2bin.exe!operator new(unsigned int size=0x00000014)  Line 59 + 0x8 bytes C++
>csv2bin.exe!Record::addField(const char * string=0x0034aac8)  Line 62 + 0x7 bytes  C++
csv2bin.exe!main(int argc=0x00000007, char * * argv=0x00343998)  Line 253   C++
csv2bin.exe!__tmainCRTStartup()  Line 327 + 0x12 bytes  C

The line it's crashing on is a somewhat innocuous-looking allocation:

pField = new NumberField(this, static_cast<NumberFieldInfo*>(pFieldInfo));

...I don't believe it has reached the constructor yet; it's just allocating memory before jumping to the constructor. It has also executed this code dozens of times by the time it crashes, usually in a consistent (but otherwise non-suspicious) location.

The problem goes away when compiling with /MTd or /MDd (debug runtime), and comes back when using /MT or /MD.

The NULL is loaded from the stack, and I can see it in memory view. _RtlAllocateHeap@12 + 0x26916 bytes seems like a huge offset, like an incorrect jump has been made.

I've tried _HAS_ITERATOR_DEBUGGING in a debug build and that hasn't brought up anything suspicious.

Dropping a HeapValidate at the beginning and end of Record::addField shows an OK heap right up to when it crashes.

This used to work -- I'm not entirely sure what changed between now and the last time we compiled the tool (probably years ago, maybe under an older VS). We've tried an older version of boost (1.36 vs 1.38).

Before dropping back to manual investigation of the code or feeding this to PC-Lint and combing through its output, any suggestions on how to effectively debug this?

[I'll be happy to update the question with more info, if you request info in the comments.]

BIBD
leander
  • I once had the joy of semi-reverse engineering the RtlHeap stuff for a similar bug. Don't be confused by the huge offsets - that's normal. The debug symbols (which you seem to be using) are apparently missing some private functions (probably declared "static" in the source file), and getting huge offsets for RtlAllocateHeap or RtlAllocateHeapSlowly just means that was the closest symbol it found. – Chris Walton May 14 '10 at 00:19
  • @arke: yeah, I realized that shortly after posting the question. (Should have gone back to edit it.) Long before this, I had written a lookup tool that parsed Codewarrior-generated xMAP files at work, and ran into the same thing occasionally -- just didn't cross my mind that I wouldn't necessarily have all the symbols here, for some reason. – leander May 14 '10 at 00:29

3 Answers


One little-known difference between running with and without the debugger attached is the OS debug heap (see also "Why does my code run slowly when I have debugger attached?"). You can turn the debug heap off by setting the environment variable _NO_DEBUG_HEAP. You can set it either in your computer properties or in the Project Settings in Visual Studio.

Once you turn the debug heap off, you should see the same crash even with debugger attached.

That said, be aware that memory corruption can be hard to debug, because the real cause (such as a buffer overrun) is often very far from where you see the symptom (the crash).
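A quick way to confirm that the variable actually reached the process is to print it at startup. This is only a minimal sketch: `envValueOrEmpty` and `reportNoDebugHeap` are hypothetical helper names, not part of any library, and the code only reports what the process sees; it does not by itself change which heap is used.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: returns the variable's value, or "" when it is unset.
std::string envValueOrEmpty(const char* name)
{
    const char* value = std::getenv(name);
    return value != NULL ? std::string(value) : std::string();
}

// Prints what this process sees for _NO_DEBUG_HEAP, so that a run started
// from the debugger and a run started from the command line can be compared.
void reportNoDebugHeap()
{
    const std::string value = envValueOrEmpty("_NO_DEBUG_HEAP");
    if (value.empty())
        std::cerr << "_NO_DEBUG_HEAP is not set in this process\n";
    else
        std::cerr << "_NO_DEBUG_HEAP=" << value << "\n";
}
```

Calling `reportNoDebugHeap()` as the first statement of `main` should be enough to see whether the setting from the project or system properties took effect.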

Suma
  • +1: Thanks, didn't know about _NO_DEBUG_HEAP. Trying that now. (I've had the fun experience of tracing down memory corruptions that only occurred on retail embedded hardware without a debugger attached, so I hear you on the "may be very far from symptoms" part.) – leander May 01 '09 at 16:10
  • Yep, that did it -- got the crash in the debugger. =) Wish me luck... – leander May 01 '09 at 16:17
  • Hmm - the debug heap's hiding the corruption? Now that's bad luck... – Michael Burr May 01 '09 at 18:48
  • @Michael: yeah, it was a one-character buffer overflow. I guess the debug heap didn't exhibit it due to different padding... – leander May 01 '09 at 19:45

Crashing inside new or malloc is usually a hint that the internal bookkeeping of the malloc implementation has been corrupted. Most often this is caused by writing past the end of an earlier allocation (a buffer overflow). The next call to new or malloc then crashes because the internal structures now contain invalid data.

Check whether you are writing past the end of any previously allocated block.
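One way to check a suspect allocation by hand is to put guard bytes after the buffer and test them later. The sketch below is only an illustration of that idea, assuming you can route the suspect allocation through a wrapper; `guardedAlloc`, `guardIntact`, `kGuardByte` and `kGuardSize` are hypothetical names, and this is not how the CRT or the Windows heap is actually implemented.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical guard settings; the fill value is arbitrary.
static const unsigned char kGuardByte = 0xFD;
static const std::size_t kGuardSize = 4;

// Allocates `size` usable bytes followed by kGuardSize guard bytes.
// Release the block with std::free.
void* guardedAlloc(std::size_t size)
{
    unsigned char* block =
        static_cast<unsigned char*>(std::malloc(size + kGuardSize));
    if (block == NULL)
        return NULL;
    std::memset(block + size, kGuardByte, kGuardSize);
    return block;
}

// Returns true if the guard bytes that follow the `size` usable bytes are
// untouched, false if something wrote past the end of the buffer.
bool guardIntact(const void* block, std::size_t size)
{
    const unsigned char* guard =
        static_cast<const unsigned char*>(block) + size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (guard[i] != kGuardByte)
            return false;
    }
    return true;
}
```

Checking `guardIntact` right after each write to the buffer narrows an overrun down to the statement that caused it, rather than to the unrelated allocation that later crashes.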

If your application is portable you may try to build it on Linux and run it under Valgrind.

lothar
  • Yeah, that's my guess too. Time to dig out electricfence or dmalloc, especially now that the _NO_DEBUG_HEAP is allowing me to crash inside the debugger. – leander May 01 '09 at 18:25
  • Yeah, I was thinking of porting it to linux just for valgrind earlier! =) The memcheck module is great, I've even used it to debug MMORPG servers in the past. Application Verifier seems to cover a lot of the same bases in Windows, fortunately, glad I found that. – leander May 01 '09 at 20:06

Application Verifier was super-useful for solving this once I had _NO_DEBUG_HEAP=1 in the environment; see the accepted answer here: Finding where memory was last freed?

It's probably also worth mentioning pageheap, which I found while looking at Application Verifier. Looks like it covers some similar ground.

(FYI, it was a one-character buffer overflow:

m_pEnumName = (char*)malloc(strlen(data) /* missing +1 here */);
strcpy(m_pEnumName, data);

...yet another ridiculously good argument to not use strcpy directly.)
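For completeness, here is a minimal sketch of the corrected copy. `duplicateCString` is a hypothetical helper name, not something from the original tool; the only substantive change is the `+ 1` for the terminating `'\0'`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'd copy of `data`, including its
// terminating '\0', or NULL if the allocation fails.
// The caller releases the copy with std::free.
char* duplicateCString(const char* data)
{
    const std::size_t bytes = std::strlen(data) + 1;  // +1 for the '\0'
    char* copy = static_cast<char*>(std::malloc(bytes));
    if (copy != NULL)
        std::memcpy(copy, data, bytes);  // copies the terminator as well
    return copy;
}
```

With this in place the assignment would read `m_pEnumName = duplicateCString(data);`. If the member can be changed to a `std::string`, the manual size arithmetic goes away entirely.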

leander