
I get this problem in several C projects when using gdb. If I run my program without gdb, it consistently crashes at a given event, probably because of an invalid memory read. When I try to debug it with gdb, however, the crash never seems to occur!

Any idea why this could happen?

I'm using the MinGW toolchain on Windows.

Giann
    Yes, it sounds like a race condition, heap corruption, or something else that typically causes Heisenbugs. Your code is most likely incorrect somewhere, but the debugger has to keep working even when the debugged application does funny things. Try Valgrind on the application. Since you are using MinGW, chances are that your application will also compile in an environment where Valgrind can run. – 0xC0000022L Dec 01 '11 at 09:51
  • @STATUS_ACCESS_DENIED How can he try Valgrind on Windows? – Employed Russian Dec 01 '11 at 17:25
  • @EmployedRussian You can't, but as said, compiling a MinGW project under Linux is pretty easy if you don't use system-specific libraries. – Giann Dec 01 '11 at 18:22
  • @Giann: Indeed :) If this applies to your case, I can also turn this into a formal answer. – 0xC0000022L Dec 01 '11 at 19:54
  • @STATUS_ACCESS_DENIED Please do – Giann Dec 02 '11 at 08:24

4 Answers


Any idea why this could happen?

There are several common reasons:

  1. Your application has multiple threads and a race condition, and running under GDB changes the timing in such a way that the crash no longer happens.
  2. Your application has a bug that depends on memory layout (often a read of uninitialized memory), and the layout changes when running under GDB.
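To make the second case concrete, here is a minimal, hypothetical sketch (the names `find_index_buggy` and `find_index` are illustrative, not from the question). The buggy version returns an uninitialized local when the key is missing, so its result is whatever was left in that stack slot, which is exactly the kind of value that differs between a normal run and a run under GDB.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * BUG: `index` is never assigned when `key` is absent, so the return value
 * is whatever happened to be in that stack slot. A different memory layout
 * (e.g. under a debugger) changes that garbage, so the crash moves or vanishes. */
int find_index_buggy(const int *values, size_t count, int key)
{
    int index;                  /* indeterminate until assigned */
    for (size_t i = 0; i < count; i++) {
        if (values[i] == key) {
            index = (int)i;
            break;
        }
    }
    return index;               /* undefined behavior if key was not found */
}

/* Fixed version: every path returns a defined value. */
int find_index(const int *values, size_t count, int key)
{
    int index = -1;             /* -1 means "not found" */
    for (size_t i = 0; i < count; i++) {
        if (values[i] == key) {
            index = (int)i;
            break;
        }
    }
    return index;
}
```

Compiling with `-Wall` lets GCC warn about some, but not all, such paths.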

One way to approach this is to have the application trap whatever unhandled exception is killing it, print a message, and spin forever. Once it is in that state, you should be able to attach GDB to the process and debug from there.
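A minimal sketch of that "trap, print, spin" idea, using only standard C `signal()` so that it also builds outside MinGW. On Windows, the Win32 `SetUnhandledExceptionFilter` function is the more native hook. The names `install_crash_trap`, `on_fatal_signal`, and `keep_waiting` are hypothetical. Call `install_crash_trap()` early in `main`. When the message appears, attach with `gdb -p PID` (or the `attach` command), inspect the stack, and optionally `set var keep_waiting = 0` to let the process die normally.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical flag: clear it from the debugger to let the process continue. */
static volatile sig_atomic_t keep_waiting = 1;

/* Handler for fatal signals. ISO C does not guarantee that fputs() or raise()
 * are safe inside a signal handler (POSIX does allow raise()); they are
 * tolerated here only as a last-resort debugging aid. */
void on_fatal_signal(int sig)
{
    fputs("fatal signal caught; attach a debugger to this process now\n", stderr);
    while (keep_waiting) {
        /* spin until a debugger attaches and clears keep_waiting */
    }
    signal(sig, SIG_DFL);   /* restore the default action... */
    raise(sig);             /* ...and re-raise so the process still terminates */
}

/* Install the trap for the usual "crashed" signals.
 * Returns 1 on success, 0 if any signal() call failed. */
int install_crash_trap(void)
{
    int ok = 1;
    if (signal(SIGSEGV, on_fatal_signal) == SIG_ERR) ok = 0;
    if (signal(SIGILL, on_fatal_signal) == SIG_ERR) ok = 0;
    if (signal(SIGFPE, on_fatal_signal) == SIG_ERR) ok = 0;
    if (signal(SIGABRT, on_fatal_signal) == SIG_ERR) ok = 0;
    return ok;
}
```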

Employed Russian

Yes, it sounds like a race condition, heap corruption, or something else that typically causes Heisenbugs. Your code is most likely incorrect somewhere, but a debugger has to keep working even when the debugged application does funny things, so the application runs in a somewhat different environment. As a result, problems tend to disappear under the debugger. Race conditions in particular often won't appear in the first place, because some debuggers can only handle one thread at a time, and all debuggers make the code run slower, which alone may be enough to make a race go away.
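As a hypothetical illustration of a timing-dependent race, here is a classic lost-update bug next to a fixed version. It assumes a pthreads implementation (MinGW-w64 toolchains usually ship winpthreads) and C11 `<stdatomic.h>`. All names are made up for the example. Whether `racy_worker` actually loses updates depends on scheduling, which is exactly what a debugger perturbs. Helgrind or DRD would flag it either way.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 1000000

static long plain_counter = 0;          /* shared, unsynchronized */
static atomic_long safe_counter = 0;    /* shared, atomic */

/* BUG: plain_counter++ is a read-modify-write, so two threads can read the
 * same old value and one increment is lost (a data race, i.e. undefined behavior). */
void *racy_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        plain_counter++;
    return NULL;
}

/* Fixed: each increment is a single atomic operation. */
void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(&safe_counter, 1);
    return NULL;
}

/* Run `worker` on two threads and wait for both. Returns 0 on success. */
int run_two_threads(void *(*worker)(void *))
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

With older glibc versions you may need to link with `-pthread`.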

Try Valgrind on the application. Since you are using MinGW, chances are that your application will also compile in an environment where Valgrind can run (even though Valgrind doesn't run directly on Windows). I've been using Valgrind for about three years now, and it has quickly solved a lot of mysteries. Whenever I get a crash report for the code I work on (which runs on AIX, Solaris, the BSDs, Linux, and Windows), the first thing I do is a test run under Valgrind on both x64 and x86 Linux.

Valgrind, and in your particular case its default tool Memcheck, runs your code on a simulated CPU. Whenever you allocate memory, it marks every byte of that memory as "tainted" until you explicitly initialize it. The tainted status is inherited when you memcpy uninitialized memory, and Valgrind reports it as soon as an uninitialized byte is used to make a decision (if, for, while, ...). It also keeps track of orphaned memory blocks and reports leaks at the end of the run. And that's not all: the Valgrind family includes more tools that test other aspects of your code, including race conditions between threads (Helgrind, DRD).
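A small, hypothetical function (`count_nonzero_copy` is a made-up name) that shows the behavior described above. The `memcpy` of uninitialized heap bytes is not reported on its own. The `if` that branches on the copied bytes is where Memcheck reports "Conditional jump or move depends on uninitialised value(s)". Unsigned char buffers are used so that merely reading the indeterminate bytes is not itself undefined behavior.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_LEN 16

/* Copies a fresh heap buffer and counts its non-zero bytes.
 * With initialize == 0 the malloc'd bytes are never written, so under Memcheck:
 *  - the memcpy() is NOT reported (the undefined status is simply copied),
 *  - the `if` in the loop IS reported, because it makes a decision on them.
 * Returns -1 if allocation fails. */
int count_nonzero_copy(int initialize)
{
    unsigned char *src = malloc(BUF_LEN);
    unsigned char *dst = malloc(BUF_LEN);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    if (initialize)
        memset(src, 0, BUF_LEN);
    memcpy(dst, src, BUF_LEN);      /* definedness propagates, no report yet */

    int nonzero = 0;
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (dst[i] != 0)            /* decision on possibly uninitialized bytes */
            nonzero++;
    }
    free(src);
    free(dst);
    return nonzero;
}
```

Calling `count_nonzero_copy(0)` from a test program and running it as `valgrind --track-origins=yes ./a.out` should produce the report, with `--track-origins=yes` pointing back at the `malloc` that created the uninitialized bytes.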

Assuming Linux from here on: make sure you have the debug symbols of your supporting libraries installed. These usually come in separate debug packages (named *-dbg, *-dbgsym, or *-debuginfo, depending on the distribution). Also, make sure to turn off optimization in your own code and include debug symbols. For GCC that's -ggdb -g3 -O0.
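If you want a debug build to notice when it was accidentally compiled with optimization, one option is GCC's predefined `__OPTIMIZE__` macro, which GCC documents as defined in all optimizing compilations and which is therefore left undefined at -O0. The function name below is hypothetical. You could, for example, call it at startup and print a reminder when it returns 0.

```c
/* Returns 1 if this translation unit was compiled without optimization
 * (e.g. with -O0), 0 otherwise. Relies on GCC's __OPTIMIZE__ macro. */
int built_without_optimization(void)
{
#ifdef __OPTIMIZE__
    return 0;
#else
    return 1;
#endif
}
```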

Another hint: I have had pointer aliasing cause some grief. Although Valgrind helped me track it down, I had to take the last step myself and inspect the disassembly of the generated code. It turned out that at -O3 the GCC optimizer got ahead of itself and turned a loop that copied bytes into a sequence of instructions that copied 8 bytes at once, assuming alignment. That assumption was the problem: it was wrong. Ever since, we've built at -O2, which, as this Gentoo Wiki article explains, is not the worst idea. To quote the relevant part:

-O3: This is the highest level of optimization possible, and also the riskiest. It will take a longer time to compile your code with this option, and in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.

Since you are using GCC in MinGW, I reckon this could well apply to your case too.
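This is a general sketch, not a reconstruction of the author's code: one common way to avoid giving the optimizer a false alignment assumption is to never cast a byte pointer to a wider type. Load through `memcpy` or byte-by-byte instead. The names `load_u64` and `load_le64` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* RISKY pattern (shown only in this comment):
 *     uint64_t v = *(const uint64_t *)p;
 * If p is not suitably aligned this is undefined behavior, and it also breaks
 * strict aliasing, so the optimizer may emit code that assumes alignment. */

/* Alignment-safe native-endian load: memcpy makes no alignment assumption,
 * and GCC typically turns it into a single load where that is legal. */
uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Alignment- and endianness-independent little-endian load:
 * p[0] becomes the least significant byte. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```

Both functions accept a deliberately misaligned pointer such as `buf + 1`.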

0xC0000022L

Although it's a bit late: the answer to this question explains how to set up a system to capture a core dump without running the program under gdb. You can then load the core file with

gdb <path_to_executable_file> <path_to_core_file>

and then issue

thread apply all bt

in gdb.

This shows stack traces for all threads that were running when the application crashed, so you may be able to locate the last function called and the thread that made the illegal access.
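On POSIX systems, a process can also enable core dumps for itself, as an in-process equivalent of `ulimit -c` in the shell. This sketch assumes Linux or another POSIX system. `<sys/resource.h>` is not available in MinGW, and `enable_core_dumps` is a hypothetical name. It raises the soft `RLIMIT_CORE` limit to the hard limit, which an unprivileged process is always allowed to do. Where the core file ends up is system-dependent (on Linux, see `/proc/sys/kernel/core_pattern`).

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;    /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```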

Alex C

Your application is probably receiving signals, and gdb might not pass them on, depending on its configuration. You can check this with the info signals (or info handle) command. It might also help to post a stack trace of the crashed process. The crashed process should generate a core file (unless that has been disabled), which can be analyzed with gdb.

steve