I have a multithreaded C program, which consistently generates a segmentation fault at a specific point in the program. When I run it with gdb, no fault is shown. Can you think of any reason why the fault might occur only when not using the debugger? It's pretty annoying not being able to use it to find the problem!

Benubird
  • This kind of bug is called ["Heisenbug"](http://en.wikipedia.org/wiki/Heisenbug#Heisenbug) and can have many causes. – Sven Marnach Jan 07 '11 at 17:47
  • Does the error happen to be related to window management and/or User32.dll? – user541686 Jan 07 '11 at 17:48
  • I had an issue like this, except my program only crashed under GDB. The issue was an uninitialized class member variable that still got the value 0 when I ran my program normally, but under GDB it held some huge value that segfaulted when I used it as an array index. – GWW Jan 07 '11 at 17:50
  • Not windows related - I'm using Linux 2.6.32-24-generic #43-Ubuntu. – Benubird Jan 07 '11 at 17:53
  • Have you tried arranging a core dump? Run `ulimit -c unlimited` before you start the program outside the debugger, then `gdb myprogram core` after it dumps core. gdb will then let you do a post-mortem analysis of the segfault. – Robie Basak Jan 07 '11 at 18:03
  • What is happening at that specific point in the program? Are you attempting to dereference a pointer? Call another function? What? – John Bode Jan 07 '11 at 19:23

5 Answers

Classic Heisenbug. From Wikipedia:

> Time can also be a factor in heisenbugs. Executing a program under control of a debugger can change the execution timing of the program as compared to normal execution. Time-sensitive bugs such as race conditions may not reproduce when the program is slowed down by single-stepping source lines in the debugger. This is particularly true when the behavior involves interaction with an entity not under the control of a debugger, such as when debugging network packet processing between two machines and only one is under debugger control.

The debugger may be changing timing, and hiding a race condition.

On Linux, GDB also disables address space randomization, and your crash may be specific to the address space layout. Try `(gdb) set disable-randomization off`.

Finally, `ulimit -c unlimited` and post-mortem debugging (already suggested by Robie) may work.

Employed Russian
user541686

Perhaps under gdb, memory is laid out so that your buffer overflow/underflow doesn't trample on anything whose corruption causes a crash. Or it could be a race condition that is no longer being triggered. Although it sounds counterintuitive, you should be happy your program was nice enough to crash on you.

Some suggestions

  1. Try a static code analyzer such as the free cppcheck
  2. Try a malloc() debugger like libefence
  3. Try running it through valgrind
SiegeX

By debugging it you are changing the environment that it runs in. It sounds like you are dealing with some sort of race condition, and under the debugger things are scheduled slightly differently, so you don't encounter the issue. Alternatively, memory may be laid out slightly differently, so the problem doesn't occur. Are you able to put some debugging output in the code to help figure out the problem? That may have less of an impact on timing and allow you to find your issue.

Mark Loeser

I have totally had this problem before! It was a race condition, and when I was stepping through the code with a debugger, the thread I was in was slow enough not to trigger the race condition. Pretty awful.

rook

If you're using gcc, try using the -Wall option to get all warnings. If you use an IDE like Eclipse, it would do that automatically.

Funny Geeks