Following the advice of this question, I added #include <fenv.h> and feenableexcept(FE_ALL_EXCEPT & ~FE_INEXACT); to my main source and compiled it using g++ -O0 -Wall -Wextra -Werror -g main.cpp -o main.o. All files in the project are compiled this way using make, and everything is linked with the same -O0 and -g; feenableexcept was only added to main. I then run the executable as gdb a.out. The debugger gives me the expected SIGFPE, but the backtrace doesn't provide the usual useful information. Instead I get a listing like this:

Program received signal SIGFPE, Arithmetic exception.
0x00002aaaabe19c0d in _amd_handle_error () from /lib64/libm.so.6
(gdb) bt                                                        
#0  0x00002aaaabe19c0d in _amd_handle_error () from /lib64/libm.so.6
#1  0x00002aaaabe19cca in _pow_special () from /lib64/libm.so.6     
#2  0x00002aaaabdf6c12 in pow () from /lib64/libm.so.6              
#3  0xbfc971b779361ea1 in ?? ()                                     
#4  0x3fe85bf701f137d7 in ?? ()                                     
#5  0x3fe914100cd71275 in ?? ()                                     
#6  0x00007fffffffa420 in ?? ()                                     
#7  0x00007fffffff9960 in ?? ()                                     
#8  0x3fe7c514ca0e522d in ?? ()                                     
#9  0x0000000000000000 in ?? () 

If I try to look at the frames I get nothing useful. In this case the function pow is used many times in the area of code with the floating point error. Knowing how I got to pow is the missing information here.

Normally when I use -O0 and -g I get functions, files and line numbers associated with the trace. If I set a breakpoint using the same executable and backtrace from the breakpoint I get useful information.

Breakpoint 1, SurfaceModel::ProcessGroups (this=0x7fffffff9f30) at source/SurfaceModel.cpp:398
398             vector<Group>::iterator it;
(gdb) bt
#0  SurfaceModel::ProcessGroups (this=0x7fffffff9f30) at source/SurfaceModel.cpp:398
#1  0x00000000006768e6 in MainLoop (logFile=...) at source/main.cpp:94
#2  0x0000000000676337 in main (argc=1, argv=0x7fffffffbe18) at source/main.cpp:41

I then added a *(int*)0=0; to the code to force a seg fault, to see whether the problem was related to signals in general. I got useful information there as well.

Program received signal SIGSEGV, Segmentation fault.
0x000000000061f42a in SurfaceModel::ClearGroupHeatRates (this=0x7fffffff9f30) at source/SurfaceModel.cpp:456
456             *(int*)0=0; // force a seg fault
(gdb) bt
#0  0x000000000061f42a in SurfaceModel::ClearGroupHeatRates (this=0x7fffffff9f30) at source/SurfaceModel.cpp:456
#1  0x0000000000676ba5 in MainLoop (logFile=...) at source/pilager.cpp:137
#2  0x0000000000676341 in main (argc=1, argv=0x7fffffffbe18) at source/pilager.cpp:41

This seems to be related only to what I did with floating point control. I am running GDB 7.12 and this was compiled with GCC 5.3.0. Is there a way to preserve the trace information with SIGFPE?
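
For reference, the floating point setup described at the top amounts to roughly the following sketch. It assumes glibc on Linux, where feenableexcept is a GNU extension; the helper name enable_fp_traps is hypothetical and only there to make the snippet self-contained.

```cpp
// Assumption: glibc on Linux. feenableexcept is a GNU extension and is
// only declared when _GNU_SOURCE is defined (g++ defines it by default).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fenv.h>

// Hypothetical helper: unmask every floating point exception except
// FE_INEXACT, so that e.g. an invalid pow() call raises SIGFPE.
// Returns the previously enabled mask, or -1 if traps cannot be enabled.
int enable_fp_traps() {
    return feenableexcept(FE_ALL_EXCEPT & ~FE_INEXACT);
}
```

In my program the equivalent call sits at the start of main, before any of the numerical code runs.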

Matt

1 Answer

Is there a way to preserve the trace information with SIGFPE?

The trace info has ~nothing to do with which signal is being raised, and ~everything to do with the function it is raised in.

Somehow your pow is missing an unwind descriptor (which is what GDB uses to unwind the stack).

This often happens with assembly-level implementations (where the developer neglects to put in the appropriate .cfi directives), or when building code with broken compilers.

A broken compiler seems unlikely, and I can't find any recent version of GLIBC that implements pow in assembly.

To recover the stack, the following techniques may work:

  1. Use a reverse debugger (such as rr) and go backwards from the SIGFPE. This is the best solution, but I doubt rr is available for your (apparently quite old) system.
  2. Count the number of times pow is called before the crash:

    (gdb) break pow
    (gdb) commands 1
    silent
    cont
    end
    (gdb) run
    # run until SIGFPE
    (gdb) info break

    You will now know how many times pow was called before the crash.

    Run the program again, ignoring the breakpoint $N-1 times (you'll need to remove the commands from the breakpoint first, then use the GDB command ignore 1 $N-1). You should now be stopped just before the crash, and since you are not yet inside pow, GDB should have no trouble showing you the stack trace.

    This approach only works if your program is deterministic.
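
If changing the source is an option, the same counting idea can be sketched at the source level. This is a different technique from the GDB breakpoint above, not a replacement for it: it assumes traps are left disabled (no feenableexcept), so pow only sets the sticky status flags, which are then inspected in a frame that does have debug info. The names checked_pow, CHECKED_POW, g_pow_calls and g_last_pow_flags are hypothetical.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

// Hypothetical bookkeeping: how many times checked_pow has been called,
// and which FP flags (other than FE_INEXACT) the last call raised.
static unsigned long g_pow_calls = 0;
static int g_last_pow_flags = 0;

// Hypothetical wrapper around std::pow. Assumes FP traps are NOT enabled,
// so a bad call only sets status flags instead of raising SIGFPE.
double checked_pow(double base, double exponent, const char* file, int line) {
    ++g_pow_calls;
    std::feclearexcept(FE_ALL_EXCEPT);          // start from clean flags
    volatile double result = std::pow(base, exponent);
    g_last_pow_flags = std::fetestexcept(FE_ALL_EXCEPT & ~FE_INEXACT);
    if (g_last_pow_flags != 0) {
        // A breakpoint on this line gives a normal backtrace, because
        // this frame is compiled with -g and is outside libm.
        std::fprintf(stderr, "pow call #%lu at %s:%d raised FP flags 0x%x\n",
                     g_pow_calls, file, line, g_last_pow_flags);
    }
    return result;
}

// Records the call site automatically.
#define CHECKED_POW(b, e) checked_pow((b), (e), __FILE__, __LINE__)
```

Replacing the suspicious pow(x, y) calls with CHECKED_POW(x, y) and setting a breakpoint on the fprintf line should stop in your own code with the offending arguments still in scope. Unlike the breakpoint count, this does not depend on the program being deterministic.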

Employed Russian