I am working on some floating-point numerical code. What frustrates me is that the -O3 build (GCC 4.8.3) produces very different results from the -O2 build (the stable one), and, as expected, it ends in numerical disaster.
I looked at this thread, which may be relevant, but the answer there doesn't fix my problem. I understand that what -O3 adds on top of -O2 is mainly inlining and loop unrolling. I am fairly sure the cause lies in the floating-point part of the code, because when I explicitly compile just that part at -O2, as below, the results look fine:
#pragma GCC push_options
#pragma GCC optimize ("O2")
/* FP computation code (double precision) */
#pragma GCC pop_options
So my question is: which of the optimizations that -O3 performs could really make such a huge difference specifically for floating-point calculations?