
My program uses a third-party library that throws a segmentation fault at some point. I tried compiling the library with debug symbols and without compiler optimization, and the crash went away. My suspicion is that compiler optimizations revealed this bug. What are best practices for debugging cases like this?

EDIT - (corrected the statement above: "revealed" instead of "caused")

I think I was misunderstood. I didn't intend to blame the compiler or anything like that. I only asked for best practices for finding a bug in a situation like this, where I don't have debug symbols for the third-party library (the crash backtrace leads into the third-party library).

Steve Townsend
Michael Spector
  • Bad assumption - typically >99% of the time the problem will be a latent bug in your code which only shows up when optimisation is enabled. Compiler bugs are relatively rare in comparison to bugs in user code. – Paul R Jun 23 '11 at 14:07
  • Why do you specifically believe this is compiler optimization? A floating pointer error or other memory access problem, for example, is likely to manifest itself in highly different ways when you change the flags you use to compile/link the program. – Mike Ryan Jun 23 '11 at 14:08
  • The error is **NOT** in the third party library. If I make a program and it crashes when I `printf`, my first thought is not that `printf` has an error. – pmg Jun 23 '11 at 14:10
  • If you're using C or Objective-C, it's worth using the [LLVM/Clang's static analyzer](http://clang-analyzer.llvm.org/) to uncover undefined behavior such as using uninitialized variables and the like. – DarkDust Jun 23 '11 at 14:21
  • @Paul spektom said the bug was *in* a C program and caused *by* a compiler optimization... I think. – Ben Jun 23 '11 at 14:21
  • @Ben: it appears that the question has now been edited in response to the above comments – Paul R Jun 23 '11 at 15:13
  • @pmg: I have found real bugs in `printf` before, SCO's `printf` but a real vendor supplied `printf` nonetheless. Yes, libc bugs are pretty rare but they do happen. – mu is too short Jun 23 '11 at 15:52
  • You have tagged the question [gdb]. Did you only do a debug build and the crash is gone, or is the crash gone _when running in gdb_? That is an important difference. If the crash is gone _when running in gdb_ then chances are high that it's a non-initialized pointer. The debugger initializes everything including pointers to zero, which causes this bug to "magically disappear" because it's caught in one of the common `if(ptr != 0)` clauses. – Damon Jun 23 '11 at 18:07

7 Answers


Your suspicion is that optimization caused a bug. My suspicion is that your code has constructs that lead to Undefined Behavior, and when the optimizer is on, this Undefined Behavior manifests itself as erroneous behavior or crash. Don't blame the optimizer. Find UB in your code... might be tricky, though. Possible culprits:

  • Out-of-bounds index
  • Returning the address of a temporary
  • A zillion other things
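The first two culprits can be sketched in C. The buggy pattern is kept in a comment (executing it is undefined behavior); the fixes shown are one common approach, not the only one: let the caller own the storage, and check an index before using it. Function names `make_label` and `get_checked` are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Buggy pattern (do not use): returns the address of a local array
 * whose lifetime ends when the function returns.
 *
 *   const char *make_label_bad(int id) {
 *       char buf[32];
 *       snprintf(buf, sizeof buf, "item-%d", id);
 *       return buf;          // dangling pointer: undefined behavior
 *   }
 */

/* Fixed: the caller owns the storage and passes its size. */
int make_label(char *buf, size_t cap, int id)
{
    int n = snprintf(buf, cap, "item-%d", id);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;   /* -1 on truncation */
}

/* Bounds-checked read instead of a raw arr[i] that may run off the end. */
int get_checked(const int *arr, size_t n, size_t i, int *out)
{
    if (i >= n)
        return -1;            /* out-of-bounds index rejected, not read */
    *out = arr[i];
    return 0;
}
```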
Armen Tsirunyan
  • It always baffles me when new programmers think they uncovered a bug in a compiler that's been constantly improved for over 15 years with their 3 `printf` statements... – Blindy Jun 23 '11 at 14:12
  • @spektom: The **best** practice is to write safe code with defined behavior in the first place. I can't give you an algorithm of finding errors in your code, sorry – Armen Tsirunyan Jun 23 '11 at 14:36
  • Best practices is not an algorithm. Best practices can include: tools, compilation settings, etc. Even suggesting "put a printf" statement is a best practice of some sort. – Michael Spector Jun 23 '11 at 14:38
  • @spektom: Look at all array indices and make sure they are safe. Look at all functions that return pointers and refs and make sure they're safe. Can't think of anything else right now – Armen Tsirunyan Jun 23 '11 at 14:41

What you describe is quite common, and it's almost never a bug in the compiler's optimizer. Optimization does a lot of things to your code: variables get reordered, optimized away, etc. If you have a buffer overflow, it might overwrite memory that doesn't matter in the debug build but is very important in the optimized build.

Use valgrind to track down memory errors - they're almost always the cause of the symptoms you see.
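A typical example of the kind of memory error meant here is an off-by-one heap overflow. The buggy line is shown in a comment; Valgrind's Memcheck reports it as an "Invalid write" at the copy. The corrected version below is a minimal sketch; `copy_string` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Classic off-by-one:
 *
 *   char *p = malloc(strlen(s));   // no room for the terminating '\0'
 *   strcpy(p, s);                  // writes one byte past the block
 *
 * The stray byte often lands in allocator padding and goes unnoticed,
 * which is how such bugs survive until something (such as a different
 * build) changes the memory layout. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;      /* +1 for the terminator */
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);           /* copies the '\0' as well */
    return p;                        /* caller must free() */
}
```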

nos
  • +1 for valgrind. Use it before going crazy debugging by hand. Also if the code is segfaulting, executing the application from within GDB should give a reference to where in the code the segfault is occurring. Even without debugging symbols you can deduce the location from the function + assembly offset. – gravitron Jun 23 '11 at 15:47

Compile with debug symbols and compiler optimization; it will "hopefully" fail as well. Allow the system to generate a core file (`ulimit -c unlimited`, then re-run the program). Load the core file into gdb to see what happened.
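If you cannot easily change the shell's limits (for example when the program is started by another process), the program can raise its own core-file limit with `getrlimit`/`setrlimit` and `RLIMIT_CORE`. This is a sketch under the assumption of a POSIX system; it raises the soft limit only as far as the hard limit allows, and whether and where a core file is actually written still depends on system configuration (on Linux, `/proc/sys/kernel/core_pattern`). The name `enable_core_dumps` is hypothetical.

```c
#include <sys/resource.h>

/* In-process counterpart of `ulimit -c`: raise the soft core-file size
 * limit to the hard limit, so a later segfault can leave a core file
 * for gdb. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;       /* soft limit := hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Call it early in `main`, then inspect the resulting core with `gdb ./prog core` and `bt`.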

Another powerful tool is valgrind: run your program under valgrind with the option --db-attach=yes, and it will stop and start the debugger as soon as it detects an invalid read or write. (Newer Valgrind releases have removed --db-attach; there, use --vgdb=yes --vgdb-error=0 and connect from gdb with `target remote | vgdb`.) Invalid reads/writes are likely to provoke a segfault, and even if they don't, they should be fixed anyway.

Good luck,

Ben
  • Yes - debugging symbols and optimisation are not mutually exclusive (although debugging optimised builds can still be a challenge). The presence or absence of debugging symbols does *not* change the code generation. – caf Jun 24 '11 at 06:07

Keep putting debug statements or message boxes at the place you think the code is crashing. The crash will occur between two of them, which helps you locate the faulty code, as long as the code wasn't changed too much.
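In a console C program, a message box can be replaced by a trace line. One detail matters when bracketing a crash: stdout is typically fully buffered when redirected to a file or pipe, so `printf` output written just before a segfault can be lost. The sketch below writes to stderr (unbuffered by default) and flushes explicitly; `trace_point` and `TRACE` are hypothetical names.

```c
#include <stdio.h>

/* Print file, line and a message to stderr and flush immediately, so
 * the last line seen before a crash really is the last point reached. */
int trace_point(const char *file, int line, const char *msg)
{
    int n = fprintf(stderr, "TRACE %s:%d: %s\n", file, line, msg);
    fflush(stderr);
    return n;   /* characters written, negative on error */
}

#define TRACE(msg) trace_point(__FILE__, __LINE__, (msg))
```

Placing `TRACE("before library call");` and `TRACE("after library call");` around a suspect call shows which side of it the crash happens on.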

Also, comment out blocks of code until the crash stops. Then comment them back in until the crash returns. What you last commented back in must be causing the crash, directly or indirectly.

Both of these methods are useful for general debugging and half your work is already done if you are able to reliably reproduce the crash.

I did not give specific advice for debugging compiler optimisations because it's highly unlikely the crash is caused by that. The optimisations are generally tested very robustly to ensure they do not change the function or semantics of the code in any way.

Mike Kwan

If the backtrace leads to the third-party library, use gdb to break before the library call. Verify that the parameters you're passing to the library are valid (i.e., aren't uninitialized pointers, aren't pointers to freed memory, aren't out of range, etc.).
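Some of these checks can also be made permanent in a thin wrapper around the library call, next to the breakpoint approach (gdb's `break <function>` on the library entry point). A minimal sketch follows; `thirdparty_process` is a hypothetical stand-in defined here only so the example compiles, and `checked_process` is likewise hypothetical. Note that a wrapper can only catch the cheap cases (NULL, length larger than the buffer); dangling or uninitialized pointers still need gdb or valgrind.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real third-party entry point. */
static int thirdparty_process(const char *buf, size_t len)
{
    size_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)buf[i];
    return (int)(sum & 0x7fff);
}

/* Reject arguments that are detectably invalid before the library
 * ever sees them. Returns -1 for bad arguments. */
int checked_process(const char *buf, size_t len, size_t buf_capacity)
{
    if (buf == NULL || len > buf_capacity)
        return -1;
    return thirdparty_process(buf, len);
}
```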

Can you use strace to trace the system calls and then try to determine the execution path in the third-party library? Make a write to stderr (for example `fprintf(stderr, ...)`, which is unbuffered and so issues a write system call immediately) or some other system call just before the failing library call, so you have a starting point in the strace output.
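To make that marker unmistakable, it can bypass stdio entirely and call `write(2)` directly; strace then shows it as a `write(2, "...", N)` line right before the library's own system calls. A sketch assuming a POSIX system; `strace_marker` is a hypothetical helper name.

```c
#include <string.h>
#include <unistd.h>

/* Emit a marker straight through the write(2) system call, with no
 * stdio buffering in between, so it appears in strace output exactly
 * where it was called. Returns the number of bytes written or -1. */
ssize_t strace_marker(const char *text)
{
    return write(STDERR_FILENO, text, strlen(text));
}
```

Running the program as `strace -o trace.txt ./prog` and searching `trace.txt` for the marker text gives the starting point.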

If you really think it's a bug in the third-party library, you'll have to compile it with optimizations on so you can reproduce the failure. Are you saying that your compiler can only include debug symbols for non-optimized builds? gdb should still work for optimized builds.

tomlogic

Well, going through the compiled binary isn't going to help.

So that leaves going through your code to find out what part is causing the segfault. I would just work through your code manually and start commenting things out. Once you find what's causing the error, then you can determine what to do with it. It might be worth adding printfs in select locations to see exactly where the program fails.

Think of it as doing a binary search for the error ;)
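The binary-search idea can be written down directly, assuming the failure is monotonic: once the first k steps of the program reproduce the crash, every longer prefix does too. `is_bad(k)` is a hypothetical predicate meaning "running only the first k steps crashes"; in practice each probe is a rebuild with code commented out. `demo_is_bad` is a toy predicate for illustration only.

```c
/* Find the smallest k in [lo, hi] with is_bad(k) true, assuming
 * is_bad(hi) is true and is_bad is monotonic (false...false true...true). */
int first_bad_step(int lo, int hi, int (*is_bad)(int))
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (is_bad(mid))
            hi = mid;          /* failure already present by step mid */
        else
            lo = mid + 1;      /* failure needs more steps than mid */
    }
    return lo;
}

/* Toy predicate: pretend the crash appears once step 37 is included. */
static int demo_is_bad(int k)
{
    return k >= 37;
}
```

With 100 steps this needs about seven probes instead of up to a hundred.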

tskuzzy
  • At my work we ask a similar question on interviews and your answer is common, but considered a warning flag. It's not that your solution wouldn't eventually work, but there are much quicker ways to find a segfault (valgrind and execution within a debugger). – gravitron Jun 23 '11 at 15:49
  • It certainly depends on the size and complexity of your program and how good you are at debugging things manually. To me, it's the difference between using an abacus and a calculator. The calculator user will always get you the right answer pretty quickly but will be beaten by an experienced abacus user 99% of the time :P – tskuzzy Jun 23 '11 at 16:07
  • Have you debugged a segfault with gdb? gdb myprog; run; ; bt <-- it prints the line of your error. There is no possible way commenting/printf solution would be quicker. – gravitron Jun 23 '11 at 16:49

If it only blows up when you turn on optimization, then that's a strong hint you've invoked undefined behavior somewhere. Unfortunately, that UB may be nowhere near the code that actually generated the segfault (as I've discovered several times in the past).

Every time this has happened to me (which hasn't been that often), the cause was a buffer overflow somewhere else in the code. I never developed a repeatable, generally applicable technique for finding the problem, though (unless you want to call hours stepping through a debugger and swearing a generally applicable technique).
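One technique (not from this answer) for narrowing down an overflow that surfaces far from where it happened is to put guard words around a suspect buffer and check them at strategic points; the first failed check brackets the corrupting code. This is only a heuristic: it catches overflows that happen to hit a guard, and valgrind remains more thorough. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu

/* A buffer with a known value on each side of it. */
struct guarded_buf {
    uint32_t front;
    char data[64];
    uint32_t back;
};

void guarded_init(struct guarded_buf *g)
{
    g->front = GUARD_WORD;
    memset(g->data, 0, sizeof g->data);
    g->back = GUARD_WORD;
}

/* Returns 1 if both guards are intact, 0 if something overwrote them. */
int check_guards(const struct guarded_buf *g)
{
    return g->front == GUARD_WORD && g->back == GUARD_WORD;
}
```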

John Bode