wildly different behaviour between O2 and O3 optimized FP code

Question

I am working on some floating-point numerical code. What frustrates me is that the code compiled with -O3 (GCC 4.8.3) produces very different results from the -O2 build (the stable one), and, predictably, it ends in numerical disaster.

I looked at this thread, which may be relevant, but the answer there doesn't fix my problem. As far as I know, what -O3 adds on top of -O2 is mainly inlining and loop unrolling. I am quite sure the cause lies in the floating-point part, because the results look fine once I explicitly compile that part with -O2:

#pragma GCC push_options
#pragma GCC optimize ("O2")

/* FP computation code (double precision) */

#pragma GCC pop_options

So my question is: which of the optimizations that -O3 adds on top of -O2 could make such a large difference specifically for floating-point calculations?

Does it make use of undefined behavior? Do you have some more details? — JVApen, Aug 17 '16 at 19:47
If `-fno-fast-math` doesn't make the problem go away, post an example of the code that is affected, please. (I don't *think* `-O3` turns on `-ffast-math`, but I could be wrong. `-ffast-math` is a euphemism; it enables a broad spectrum of *incorrect* optimizations on floating-point code.) — zwol, Aug 17 '16 at 19:47
Also, if you are using a 32-bit x86 for your code, try `-mfpmath=sse`. (This is the default for x86-64.) — zwol, Aug 17 '16 at 19:49
Right, and it also does stuff that's valid for mathematical real numbers but incorrect for floating-point in general (not just incorrect per the letter of IEEE 754). Which is why I'm telling you to make sure it's turned off. — zwol, Aug 17 '16 at 19:50
Post your code. `-O3` does **not** include unsafe math optimizations. `-Ofast` does. — EOF, Aug 17 '16 at 19:52
If you have specific requirements, write code using tools that are guaranteed to meet them. — David Schwartz, Aug 17 '16 at 20:22

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

From the GCC manual:

-O3

Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

None of these optimizations is particularly unsafe. The only one I can see changing the result is -ftree-vectorize. In some cases, using vector instructions can change the result compared to x87 FPU instructions. For example, the x87 FPU by default uses 80-bit internal precision for intermediate results, while SIMD instructions operate on 64-bit doubles. The implementation of some math functions (like sqrt) may also differ.

You would have a much better chance of getting help if you posted your code, the exact compiler flags and information about your hardware (which SIMD instruction sets your CPU supports).

You can also directly compare the assembly code generated in the two cases.

PS. In my experience, though, the most likely cause is undefined behavior in the program: typically an uninitialized variable, division by zero, etc. Make sure you compile with a high warning level (-Wall -Wextra -Wpedantic), and use UB Sanitizer.
