I tried: valgrind, _GLIBCXX_DEBUG, -fno-strict-aliasing; how do I debug this error?

Question

I have a really strange error that I've spent several days trying to figure out, so now I want to see if anybody has any comments to help me understand what's happening.

Some background. I'm working on a software project which involves adding C++ extensions to Python 2.7.1 using Boost 1.45, so all my code is being run through the Python interpreter. Recently, I made a change to the code which broke one of our regression tests. This regression test is probably too sensitive to numerical fluctuations (e.g. different machines), so I should fix that. However, since this regression is breaking on the same machine/compiler that produced the original regression results, I traced the difference in results to this snippet of numerical code (which is verifiably unrelated to the code I changed):

c[3] = 0.25 * (-3 * df[i-1] - 23 * df[i] - 13 * df[i+1] - df[i+2]
               - 12 * f[i-1] - 12 * f[i] + 20 * f[i+1] + 4 * f[i+2]);
printf("%2li %23a : %23a %23a %23a %23a : %23a %23a %23a %23a\n",i,
       c[3],
       df[i-1],df[i],df[i+1],df[i+2],f[i-1],f[i],f[i+1],f[i+2]);

which constructs some numerical tables. Note that:

%a prints an exact ASCII (hexadecimal) representation of a floating-point value
The left hand side (lhs) is c[3], and the rhs is the other 8 values.
The output below was for values of i that were far from the boundaries of f, df
This code exists within a loop over i, which is itself nested several layers deep (so I'm unable to provide an isolated case to reproduce this).

So I cloned my source tree, and the only difference between the two executables I compile is that the clone includes some extra code which isn't even executed in this test. This makes me suspect that it must be a memory problem, since the only difference should be where the code exists in memory... Anyway, when I run the two executables, here's the difference in what they produce:

diff new.out old.out 
655,656c655,656
<  6  -0x1.7c2a5a75fc046p-10 :                  0x0p+0                  0x0p+0                  0x0p+0   -0x1.75eee7aa9b8ddp-7 :    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.1eaea08b55205p-4
<  7   -0x1.a18f0b3a3eb8p-10 :                  0x0p+0                  0x0p+0   -0x1.75eee7aa9b8ddp-7   -0x1.a4acc49fef001p-6 :    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.1eaea08b55205p-4    0x1.9f6a9bc4559cdp-5
---
>  6  -0x1.7c2a5a75fc006p-10 :                  0x0p+0                  0x0p+0                  0x0p+0   -0x1.75eee7aa9b8ddp-7 :    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.1eaea08b55205p-4
>  7  -0x1.a18f0b3a3ec5cp-10 :                  0x0p+0                  0x0p+0   -0x1.75eee7aa9b8ddp-7   -0x1.a4acc49fef001p-6 :    0x1.304ec13281eccp-4    0x1.304ec13281eccp-4    0x1.1eaea08b55205p-4    0x1.9f6a9bc4559cdp-5
<more output truncated>

You can see that the value in c[3] is subtly different, while none of the rhs values are different. So somehow identical input is giving rise to different output. I tried simplifying the rhs expression, but any change I make eliminates the difference. If I print &c[3], then the difference goes away. If I run on the two other machines (linux, osx) I have access to, there's no difference. Here's what I've already tried:

valgrind (reported numerous problems in python, but nothing in my code, and nothing that looked serious)
-D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_ASSERT -D_GLIBCXX_DEBUG_PEDASSERT -D_GLIBCXX_DEBUG_VERIFY (but nothing asserts)
-fno-strict-aliasing (but I do get aliasing compile warnings out of the boost code)

I tried switching from gcc 4.1.2 to gcc 4.5.2 on the machine that has the problem, and this specific, isolated difference goes away (but the regression still fails, so let's assume that's a different problem).

Is there anything I can do to isolate the problem further? For future reference, is there any way to analyze or understand this kind of problem quicker? For example, given my description of lhs changing even though rhs is not, what would you conclude?

EDIT: The problem was entirely due to -ffast-math.

I'm sorry, I sort of got lost: what exactly is the problem? It's not an "error" as in a crash, right? It's that some floating point expression sometimes produces a slightly different result? And what exactly were the two cases that led to a different result? — Owen, Jul 21 '11 at 02:11
Have you compared the generated code? Maybe it's doing the calculation in a slightly different order giving a different rounding error. — Eelke, Jul 21 '11 at 06:04
Did you diff the source code and review the changes you made to see what might be the cause? — Captain Obvlious, Jul 21 '11 at 10:57
owen -- there's no crash, just a float expression which evaluates to something different, depending on a code change in a separate free function in a separate file. ezpz -- since nothing is crashing, i'm not sure how to use gdb. eelke -- I'm doubtful that gcc is reordering the math... the difference goes away if I, for example, print the address of the variable that's changing. chet -- the differences in the code are unconnected to this output difference in any way except possibly both files get included in the same .cc file. the function that was changed isn't even executed. — amos, Jul 21 '11 at 22:59

osgx · Accepted Answer · 2011-07-22T07:28:47.877

You can change the floating-point type used by your program. If you use float, you can switch to double; if c, f, and df are double, you can switch to long double (80-bit on Intel; 128-bit on SPARC). With gcc 4.5.2 you can even try __float128, a 128-bit software-simulated type.

The rounding error will be smaller with a wider floating-point type.

Why does adding some code (even unexecuted code) change the result? gcc may compile the program differently if the code size changes. There are a lot of heuristics inside GCC, and some of them are based on function sizes, so gcc may compile your function in a different way.

Also, try compiling your project with the flags -mfpmath=sse -msse2, because of how x87 arithmetic (the default fpmath for older gcc) behaves; see http://gcc.gnu.org/wiki/x87note:

by default x87 arithmetic is not true 64/32 bit IEEE

PS: you should not use -ffast-math and similar options when you are interested in stable numeric results: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Optimize-Options.html

-ffast-math Sets -fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.

This option causes the preprocessor macro __FAST_MATH__ to be defined.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

This part of -ffast-math may change results:

-funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.

This part hides traps and NaN-like errors from the user (sometimes the user wants to see every trap in order to debug the code):

-fno-trapping-math Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on “non-stop” IEEE arithmetic, for example.

This part of fast-math says that the compiler can assume the default rounding mode everywhere (which can be false for some programs):

-fno-rounding-math Enable transformations and optimizations that assume default floating point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. ... This option enables constant folding of floating point expressions at compile-time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes.

I don't believe the problem is rounding error because presumably the code will always round an expression in the same way. — amos, Jul 21 '11 at 23:01
@amos: You might want to try it just in case. x87 is sort of odd. [This](http://stackoverflow.com/questions/3206101/extended-80-bit-double-floating-point-in-x87-not-sse2-we-dont-miss-it) SO question seems to have a decent discussion of the issues. Can't hurt to check. — user786653, Jul 21 '11 at 23:22
@user786653 I cleaned and recompiled my code with: `"g++" -ftemplate-depth-128 -O3 -finline-functions -Wno-inline -Wall -g -pthread -fPIC -DTRIAD_MPI -Wno-unknown-pragmas -Wstrict-aliasing=2 -ffast-math -fno-strict-aliasing -mfpmath=sse -msse2 -DBOOST_PYTHON_MAX_ARITY=30 -I"../tools/boost" -I"../tools/python/include/python2.7" -I"/opt/openmpi/include" -c -o "pymodule/bin/gcc-4.1.2/release-triad/threading-multi/FourPlusD_BP.o" "pymodule/FourPlusD_BP.cc"` but differences persist. — amos, Jul 22 '11 at 00:02
@amos: `-ffast-math` might interfere, but I'm not sure. I'd try explicitly setting the fpu control words before doing the calculation to see if that changes anything, and/or look at the generated assembly code. — user786653, Jul 22 '11 at 00:15
@user786653 -- the problem was entirely due to `-ffast-math` (i.e. i took out `-mfpmath=sse -msse2`). I took it out and the differences disappeared. Of course my regression still doesn't match, because the regression was computed with `-ffast-math`, but I think my problem is resolved. Who knew that `-ffast-math`'s optimization depended (for 1 of my 4 compilers) on code size? I'm just happy I can say I learned something from my 3 days of effort! Thanks!!! — amos, Jul 22 '11 at 01:15
@user786653 -- Do you have a link(s) where I can read about the relationship between `-ffast-math` and code size? — amos, Jul 22 '11 at 02:14
@amos: Check what it enables e.g. [here](http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Optimize-Options.html) and see which switch(es) is/are affecting your results. Your issue is/was probably never code size, but that after changing the code the rounding mode (or whatever) was slightly different this time when calling the function. — user786653, Jul 22 '11 at 02:38
@user786653 -- So I was talking about this with some of my coworkers and we're not sure we're convinced of the explanation that `-ffast-math` was the problem. Since there were other things I could do (like printing &c[3]) to make the differences disappear, **perhaps** `-ffast-math` was affecting the code similarly. — amos, Jul 22 '11 at 19:32
@amos, you should compare the assembler output (`-S -fverbose-asm -o asm_out.s`) from the different variants (with fast-math and without it; with the printf and without it). If you can find the corresponding asm code, you will see the difference. The printf() changes the AST and the code scheduling. Code scheduling and code transformations may depend on the fast-math flag. — osgx, Jul 23 '11 at 21:19
