Program crashes only in Release mode outside debugger

Question

I have a fairly large program (more than 10k lines of C++). It works perfectly in Debug mode, and in Release mode when launched from within Visual Studio, but the Release binary usually (not always!) crashes when launched manually from the command line.

The line with delete causes the crash:

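#include <cstddef>   // NULL
#include <string>
using std::wstring;

bool Save(const short* data, unsigned int width, unsigned int height,
          const wstring* implicit_path, const wstring* name = NULL,
          bool enable_overlay = false)
{
    char* buf = new char[17];  // len + 1, with len == 16 in my test case
    delete [] buf;             // the crash happens on this line
    return true;               // a bool function must return a value
}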

EDIT: Expanded the example on request.

`len` is 16 in my test case, hence the 17-byte allocation. Whether or not I do anything with buf, it crashes on the delete.

EDIT: The application works fine without the delete [] line, but I suppose it leaks memory then (since the block is never freed). buf is never used after the delete line. It also seems not to crash with any element type other than char. Now I am really confused.

The crash message is very unspecific (the typical Windows "xyz.exe has stopped working"). When I click the "Debug the program" option, it opens VS, which reports "Access violation writing location xxxxxxxx". It cannot locate where the error happened, though: "No symbols were loaded for any stack frame".

I suspect a fairly serious case of heap corruption, but how do I debug this? What should I look for?

Thanks for help.

Check that you are using the correct runtime libraries, release builds of dependent libraries, etc. It is difficult to say what the exact reason is. Check whether the pointer `buf` is deallocated in some other context (leading to a double free), or whether you invoke UB somewhere before reaching the `delete []` call (e.g. an out-of-bounds index). – dirkgently, Feb 27 '10 at 14:02
This crashes even without me touching the buf pointer. I just allocate the space, then immediately delete it, and it crashes. buf is not touched after it is deleted. – Matěj Zábský, Feb 27 '10 at 14:09
No. It also does not crash if I comment out the second line. — Matěj Zábský, Feb 27 '10 at 14:15
What compiler/linker are you using? GCC, Visual Studio (2005/2008?)? Depending on the compiler, there are a few compile-time options that may help you find the code that corrupts the heap. – NTDLS, Feb 27 '10 at 14:21
Only one "mbstowcs is unsafe, use mbstowcs_s instead" warning, plus these:
1>LINK : warning LNK4224: /OPT:NOWIN98 is no longer supported; ignored
1>ggen.obj : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/OPT:ICF' specification
1>LINK : /LTCG specified but no code generation required; remove /LTCG from the link command line to improve linker performance
– Matěj Zábský, Feb 27 '10 at 15:42

score 11 · Accepted Answer · answered Feb 27 '10 at 14:19


Have you checked for memory errors (such as buffer overruns) elsewhere?

Weird delete behavior is usually caused by the heap being corrupted at one point; much later, the damage becomes apparent during some other heap operation.

The difference between debug and release can be caused by the way Windows lays out the heap in each context. For example, in debug the heap can be very sparse, so the corruption doesn't affect anything right away.
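As a hedged illustration (not the asker's code): an off-by-one write into a raw new[] block is not reported where it happens. If the buffer is a std::vector and the writes go through the bounds-checked at(), the same mistake throws std::out_of_range at the faulting write instead. The helper name fill_checked is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes `len` characters plus a terminating NUL, which
// needs len + 1 elements. With a raw new char[len] buffer the final write would
// silently overrun the block; with at() the overrun throws immediately.
bool fill_checked(std::vector<char>& buf, std::size_t len) {
    try {
        for (std::size_t i = 0; i < len; ++i) {
            buf.at(i) = 'x';
        }
        buf.at(len) = '\0';
        return true;
    } catch (const std::out_of_range&) {
        return false;  // the off-by-one is reported where it happens
    }
}
```

With a 17-element vector, fill_checked(buf, 16) succeeds; with a 16-element vector it returns false. at() adds a bounds check to every access, so it is typically used on suspicious code paths rather than everywhere.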

answered Feb 27 '10 at 14:19

Eric


In the end, it was exactly this case. I missed array bounds by ONE in one place and the program was crashing like 5000 lines of code later. – Matěj Zábský Feb 28 '10 at 16:40

So how did you find it in the end? – rogerdpack Sep 20 '12 at 23:18

@rogerdpack Binary search. Delete/disable parts of the code and gradually home in on the guilty line. – Matěj Zábský Jun 25 '14 at 08:26
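The bisection described in that comment can be framed as a binary search over an ordered list of code regions, assuming the failure is monotone (once enabling the first k regions reproduces the crash, enabling more still does). A hypothetical sketch, where still_works(k) stands for "build and run with only the first k regions enabled":

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the smallest k in [1, n] for which still_works(k) is false, i.e. the
// region whose inclusion introduces the failure.
// Assumes still_works(0) is true and still_works(n) is false.
std::size_t first_bad_region(std::size_t n,
                             const std::function<bool(std::size_t)>& still_works) {
    std::size_t lo = 0;  // invariant: still_works(lo) is true
    std::size_t hi = n;  // invariant: still_works(hi) is false
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (still_works(mid)) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return hi;
}
```

Each probe is a rebuild and rerun, so the search needs about log2(n) runs: if every region were a single line, 5000 lines would take about 13 probes.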

score 5 · Answer 2 · edited Sep 20 '12 at 23:20


The biggest difference between launching in the debugger and launching on its own is that when an application is launched from the debugger, Windows provides a "debug heap" that is filled with the 0xBAADF00D pattern; note that this is not the debug heap provided by the CRT, which is filled with the 0xCD pattern instead (IIRC).

Here is one of the few mentions that Microsoft makes about this feature, and here you can find some links about it.

That link also mentions that "starting a program and attaching to it with a debugger does NOT cause it to use the 'special debug heap'."
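Knowing the fill patterns helps when reading an address like the one in "Access violation writing location xxxxxxxx": if the address equals a pattern, a pointer was most likely read from memory in that state. A small, hypothetical helper that labels a 32-bit address by the patterns mentioned here (0xBAADF00D, 0xCD) plus a few other fill values commonly documented for the Windows and MSVC debug heaps (0xFEEEFEEE, 0xDD, 0xFD); the mapping reflects general knowledge of those heaps, not anything specific to this program:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Labels an address by the debug-heap fill pattern it matches. The patterns
// appear only while the corresponding debug heap is active; in a plain release
// run, uninitialized heap memory has no guaranteed content.
std::string classify_fill_pattern(std::uint32_t addr) {
    switch (addr) {
    case 0xBAADF00Du: return "uninitialized HeapAlloc memory (Windows debug heap)";
    case 0xFEEEFEEEu: return "memory already freed by HeapFree (Windows debug heap)";
    case 0xCDCDCDCDu: return "uninitialized new/malloc memory (CRT debug heap)";
    case 0xDDDDDDDDu: return "memory already freed (CRT debug heap)";
    case 0xFDFDFDFDu: return "guard bytes around a block (CRT debug heap): likely an overrun";
    default:          return "no known fill pattern";
    }
}
```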

edited Sep 20 '12 at 23:20

rogerdpack


answered Feb 27 '10 at 14:17

Matteo Italia


Now it crashes inside the debugger as well, but it is still unable to "Load symbols for any stack frames", so I can't debug it effectively. Thanks, at least that's some progress. – Matěj Zábský Feb 27 '10 at 14:39
Strange, usually it loads the symbols correctly. Try this: launch it without debugging from Visual Studio, then use the "Attach to process" command to connect the VS debugger to your application's process. This way VS should load your application's symbols correctly. If the crash happens inside an API call, trace it back to your code using the Call Stack window; in that case you may get some additional information about what is going on inside the OS by installing the Windows debugging symbols. – Matteo Italia Feb 27 '10 at 15:48
I guess the problem is that the Release build uses /MT; it doesn't crash with /MTd. – Matěj Zábský Feb 27 '10 at 16:42
The multi-thread *debug* CRT (/MTd) masks the problem because, like Windows does with processes spawned by a debugger, it gives your program a debug heap, which is initialized to the 0xCD pattern. Probably somewhere you use some uninitialized heap memory as a pointer and dereference it; with the two debug heaps you get away with it for some reason (maybe because there happens to be valid allocated memory at 0xbaadf00d and 0xcdcdcdcd), but with the "normal" heap (which is often initialized to 0) you get an access violation, because you dereference a NULL pointer. – Matteo Italia Feb 27 '10 at 17:18

score 2 · Answer 3 · answered Feb 27 '10 at 14:36

You probably have a memory overwrite somewhere and the delete[] is simply the first time it causes a problem. But the overwrite itself can be located in a totally different part of your program. The difficulty is finding the overwrite.

Add the following function and macro:

#include <malloc.h>   // _heapchk (Microsoft CRT)
#include <stdio.h>    // printf

#define CHKHEAP()  (check_heap(__FILE__, __LINE__))

void check_heap(const char *file, int line)
{
    static const char *lastOkFile = "here";
    static int lastOkLine = 0;
    static int heapOK = 1;

    if (!heapOK) return;          // report only the first failure

    if (_heapchk() == _HEAPOK)
    {
        lastOkFile = file;        // remember the last checkpoint that passed
        lastOkLine = line;
        return;
    }

    heapOK = 0;
    printf("Heap corruption detected at %s (%d)\n", file, line);
    printf("Last OK at %s (%d)\n", lastOkFile, lastOkLine);
}

Now call CHKHEAP() frequently throughout your program and run it again. It should report the first checkpoint at which the heap is found corrupted and the last checkpoint at which it was still OK; the overwrite happened somewhere between the two.
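_heapchk is specific to the Microsoft CRT. As a portable illustration of the same idea (detecting an overwrite at a chosen checkpoint instead of at the eventual crash), here is a hypothetical guard-byte buffer; GuardedBuffer and its 0xFD fill are assumptions modeled loosely on the CRT debug heap's guard bytes, and it checks only this one buffer, not the whole heap:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, portable guard-byte buffer: the usable bytes are surrounded by
// known values that intact() verifies. Only this buffer is checked, not the heap.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t n)
        : storage_(n + 2 * kGuard, static_cast<char>(kFill)), size_(n) {}

    char* data() { return storage_.data() + kGuard; }
    std::size_t size() const { return size_; }

    // True if no byte in either guard zone has been changed.
    bool intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (static_cast<unsigned char>(storage_[i]) != kFill) return false;
            if (static_cast<unsigned char>(storage_[kGuard + size_ + i]) != kFill) return false;
        }
        return true;
    }

private:
    static constexpr std::size_t kGuard = 4;
    static constexpr unsigned char kFill = 0xFD;
    std::vector<char> storage_;  // [guard][payload of size_ bytes][guard]
    std::size_t size_;
};
```

Calling intact() at the same places one would put CHKHEAP() narrows down which code writes past the buffer; for whole-process coverage the platform tools remain the better choice.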

This returns OK when called just before the crashing line, so it seems the heap is OK. — Matěj Zábský, Feb 27 '10 at 14:51
This should be the accepted answer. This function is such a great idea! Thank you Leo, you saved my code! — Simeon, Jun 18 '15 at 08:44

score 1 · Answer 4 · answered Feb 27 '10 at 14:22


There are many possible causes of crashes. It's always difficult to locate them, especially when they differ from debug to release mode.

On the other hand, since you are using C++, you could avoid the problem by using a std::string instead of a manually allocated buffer; there is a reason RAII exists ;)

answered Feb 27 '10 at 14:22

Matthieu M.


I use std::wstring wherever possible, but here I need to pass a non-Unicode char array to a third-party function. – Matěj Zábský Feb 27 '10 at 14:30
Are you sure that the third-party function does not `delete` in some cases? Also, `std::string` has a `data()` member function that returns a pointer to its characters (`const char*` before C++17; a non-const `char*` overload was added in C++17). – Matthieu M. Feb 28 '10 at 12:39
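A minimal sketch of the RAII approach when a C-style API needs a writable char*, assuming the text has already been converted to a narrow std::string; third_party_consume is a hypothetical placeholder for the real third-party function. std::vector<char> is used here because its data() returns a writable char* in every C++ version since C++11:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the third-party function: it takes a writable,
// NUL-terminated char array (here it only measures it).
std::size_t third_party_consume(char* text) {
    return std::strlen(text);
}

// RAII replacement for new char[len + 1] / delete [] buf: the vector owns the
// bytes and frees them on every exit path, including exceptions.
std::size_t call_with_buffer(const std::string& narrow) {
    std::vector<char> buf(narrow.begin(), narrow.end());
    buf.push_back('\0');  // the terminator is appended explicitly
    return third_party_consume(buf.data());
}
```

This only removes the manual delete [] pairing; if the third-party function itself frees or keeps the pointer, the vector approach is wrong and its documentation decides ownership.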

score 1 · Answer 5 · answered Feb 27 '10 at 15:12

It sounds like you have an uninitialised variable somewhere in the code.

In debug mode, newly allocated memory is filled with a known pattern, so you get consistent behavior.

In release mode, memory is not initialised unless you explicitly initialise it.

Compile with the warning level set as high as possible (for example /W4 in Visual Studio).
Then make sure your code compiles with no warnings.
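One concrete instance of this, as a hedged sketch: new does not initialize a char array unless asked to. Adding () value-initializes the elements to zero, so code that accidentally reads the initial contents at least starts from the same state in Debug and Release (std::make_unique<char[]>(n), available since C++14, does the same):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// new char[n] leaves the bytes indeterminate in a release build (the debug CRT
// merely fills them with 0xCD); new char[n]() sets every element to zero.
std::unique_ptr<char[]> make_zeroed(std::size_t n) {
    return std::unique_ptr<char[]>(new char[n]());
}
```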

score 0 · Answer 6 · answered Mar 21 '13 at 17:28

One problem I had with this symptom: a multi-process program crashed when run from the shell, but ran flawlessly when started from valgrind or gdb. I discovered (much to my embarrassment) that a few stray processes of the same program were still running on the system, causing an mq_send() call to return an error. Those stray processes had also been given the message queue handle by the kernel, so the mq_send() in my newly spawned process(es) failed, but nondeterministically (depending on kernel scheduling).

Like I said, trivial, but until you find it out, you'll tear your hair out!

I learned this the hard way, and these days my Makefile has all the commands needed to create a new build and clean up the old environment (including tearing down old message queues, shared memory, semaphores and such). This way, I don't forget a step and get heartburn over a seemingly difficult (but actually trivially solvable) problem. Here is a cut-and-paste from my latest project:

[Makefile]
all:
      ...
...

obj:
      ...
clean:
      ...
prep:
  @echo "\n!! ATTENTION !!!\n\n"
  @echo "First: Create and mount mqueues onto /dev/mqueue (Change for non ubuntu)"
  rm -rf /run/shm/*Pool /run/shm/sem.*;
  rm -rf /dev/mqueue/Test;
  rm -rf /dev/mqueue/*Task;
  killall multiProcessProject || true;
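The stray-process problem surfaced only indirectly because the failing mq_send() was not noticed at the call site. A small, hypothetical sketch of checking the return value of such POSIX calls so the error is reported immediately (check_posix is an illustrative name, not part of any library):

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Turns the POSIX "return -1 and set errno" convention into an exception at
// the call site, e.g. check_posix(mq_send(q, msg, len, 0), "mq_send"), so a
// failed call is reported where it happens instead of as a later crash.
int check_posix(int rc, const char* what) {
    if (rc == -1) {
        throw std::runtime_error(std::string(what) + " failed: " + std::strerror(errno));
    }
    return rc;
}
```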

score 0 · Answer 7 · answered Feb 27 '10 at 14:30

You wrote: "These two are the first two lines in their function."

If you really mean it the way I interpret it, then the first line declares a local variable buf in one function, but the delete deletes some different buf declared outside the second function.

Maybe you should show the two functions.

score 0 · Answer 8 · answered Feb 27 '10 at 14:33

Have you tried simply isolating this with the same build file but code based just on what you've put above? Something like:

int main(int argc, char* argv[] )
{
    const int len( 16 );
    char* buf = new char[len + 1]; 

    delete [] buf;
}

The code you've given is absolutely fine and, on its own, should run without problems in both debug and optimised builds. So if the problem isn't down to specifics of your code, it must be down to specifics of the project (i.e. compilation/linkage).

Have you tried creating a brand new project and placing the 10K+ lines of C++ into it? Might not take too long to prove the point. Especially if the existing project has either been imported in or heavily altered.

Just a thought, but have you tried placing some debug output before and after the delete? From what you say, you've identified the delete as the source of the problem, but the error message is unclear about where the error actually happens. It may be that the delete itself is fine but something then attempts to access that memory after the delete. It's also generally good practice to set buf to 0 after deleting it, to prevent double-delete problems and to make it easy to test whether the pointer is valid. – Component 10, Feb 27 '10 at 20:49
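A minimal sketch of the "set it to 0 after deleting" habit from that comment, using a hypothetical helper name. It relies on the fact that delete [] on a null pointer does nothing; note that it nulls only the variable passed in, so other copies of the same pointer still dangle:

```cpp
#include <cassert>

// Deletes an array allocated with new[] and nulls the caller's pointer.
// A second call on the same variable is harmless; other copies still dangle.
template <typename T>
void delete_array_and_null(T*& p) {
    delete [] p;
    p = nullptr;
}
```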

score 0 · Answer 9 · answered Sep 17 '11 at 21:17

I was having the same issue, and I figured out that my program was only crashing when I went to delete[] char pointers with a string length of 1.

void DeleteCharArray(char* array){
 if(strlen(array)>1){delete [] array;}
 else{delete array;}
}

This fixed the issue, although it is still error-prone and could be improved. Anyhow, I suspect the reason this happens is that, to C++, char* str=new char[1] and char* str=new char; are the same thing, which means that deleting such a pointer with delete[], which is meant for arrays only, gives unexpected and often fatal results.

I assume you checked all the code that executes before the delete for out-of-bounds writes (writing past the end of an array)? These errors are usually the result of that. Deleting char arrays of length 1 is just fine. – Matěj Zábský, Sep 18 '11 at 14:58
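For reference, a sketch of the standard pairing rule, consistent with that comment: memory from new[] is released with delete[] whatever its length (including new char[1]), and memory from new with delete. std::unique_ptr<char[]> always uses delete[], so the pairing is automatic; copy_cstring is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Copies a C string into a new[] allocation owned by unique_ptr<char[]>,
// which releases it with delete[] regardless of the string's length.
std::unique_ptr<char[]> copy_cstring(const char* src) {
    std::size_t n = std::strlen(src) + 1;  // room for the terminator
    std::unique_ptr<char[]> out(new char[n]);
    std::memcpy(out.get(), src, n);
    return out;
}
```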
