What kind of GCC optimizations may change a double based on whether it is printed or not?

Question

I am debugging code that implements an algorithm whose main loop terminates when a statement à la s >= u || s <= l is true, where s, u and l are doubles that are updated in the main loop. In this example, all three variables are always between 0.5 and 1.5. I am not including the code here, as it is not written by me and extracting a MWE is hard. I am puzzled by the code behaving differently on different architectures, and I'm hoping the clues below can help me narrow down the error in the algorithm.

Some floating point rounding seems to be the root cause of the bug. Here is what I have ascertained so far:

The algorithm terminates correctly on all optimization levels on x86-64.
The algorithm terminates correctly with -O3 (other opt levels were not tried) on arm64, mips64 and ppc64.
The algorithm terminates correctly with -O0 on i686.
The algorithm loops indefinitely with -O1, -O2 and -O3 on i686.
Main point of question: In the cases when the algorithm loops indefinitely, it can be made to terminate correctly if s is printed (std::cout << s << std::endl) before it is compared to l and u.

What kind of compiler optimizations could be relevant here?

All behaviors above were observed on a GNU/Linux system and reproduced with GCC 6.4, 7.3 and 8.1.

If changing just the optimisation level affects the program, suspect undefined behaviour first. — Richard Critten, Jun 11 '18 at 11:18
"What kind of compiler optimizations could be relevant here?" -- Almost certainly values being kept in registers (with higher precision) vs. in memory. — , Jun 11 '18 at 11:18
@RichardCritten In this case, I doubt that there is undefined rather than just unspecified behaviour. — , Jun 11 '18 at 11:19
@Ron: Well, sadly it is rather hard to extract. The code that updates `s`, `l` and `u` is a bit complex and I don't have a full overview of its workings. That's why I'm focusing on the fact that printing changes the outcome. — gspr, Jun 11 '18 at 11:19
Wow, that's some quick downvotes. I'm happy to delete the question if people prefer. I am aware that MWEs are preferable, but I thought the printing observation would make the question concrete enough. Sorry. — gspr, Jun 11 '18 at 11:21
Compiler optimizations are not allowed to change the *observable behavior* of a program, or they would break the "as-if" rule. Your code likely has undefined behavior. — Cory Kramer, Jun 11 '18 at 11:22
@CoryKramer Compiler optimisations *are* allowed to change the observable behaviour if the behaviour is unspecified. — , Jun 11 '18 at 11:22
@gspr In case you have not found them, here are two excellent articles on the topic: [1](https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/), [2](https://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html). I hope they will help you, as this comment section tried hard to be unhelpful. — Max Langhof, Jun 11 '18 at 11:45

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

Since you say your code works as intended on x86-64 and other instruction sets but breaks on i686, and only at some optimisation levels, the likely culprit is x86 extended precision.

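On 32-bit x86, GCC by default uses the x87 floating point unit, whose registers hold values in 80-bit extended precision, which is more than a 64-bit double keeps when it is stored in memory. Therefore, when the compiler can re-use a value already held in a register, the result may differ from when it has to save the value to memory and re-load it. Printing a value may require saving and re-loading it, which rounds it to double precision. This is consistent with your other observations: at -O0 variables are written back to memory after every statement, and on x86-64 GCC uses SSE registers, which hold plain doubles.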
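
To illustrate the effect without needing an i686 build, here is a minimal sketch that uses long double as a stand-in for a value still sitting in an extended-precision register. The helper names are made up for this example and are not taken from your code; the sketch assumes long double is wider than double, which holds with GCC on x86 Linux but not on every platform.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the comparison as performed on a value that still
// carries extra precision (long double stands in for an x87 register).
bool above_lower_bound_in_register(long double s, double l) {
    return s > l;
}

// Hypothetical helper: the same comparison after s has been written to
// memory as a double. The volatile forces a real store, which rounds away
// the extra bits.
bool above_lower_bound_after_store(long double s, double l) {
    volatile double stored = static_cast<double>(s);
    return stored > l;
}

// A value slightly above 1.0: the extra 2^-60 fits in a wider significand
// but is lost when rounding to a double's 53 bits.
long double slightly_above_one() {
    return 1.0L + std::ldexp(1.0L, -60);
}

// True when long double really is wider than double on this platform.
bool long_double_is_wider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

With `s = slightly_above_one()` and `l = 1.0`, the stored comparison reports that `s` is not above `l`, while the in-register comparison reports that it is whenever long double is wider than double. A condition like `s <= l` can flip in the same way depending on whether `s` was spilled to memory, for example by a print statement.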

This is a well-known non-bug in GCC.

GCC provides a -ffloat-store command-line option which may be of help:

-ffloat-store

Do not store floating-point variables in registers, and inhibit other options that might change whether a floating-point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.

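As the documentation notes, though, this option does not automatically make your code behave the same as on other instruction sets. It only affects variables, so excess precision can still occur inside a single expression, and you may need to modify your code to explicitly store intermediate results in variables.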
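
As a rough sketch of what "explicitly store results in variables" could look like for a termination test of the form `s >= u || s <= l`: the function names below are hypothetical, and the use of volatile to force the store is my own choice for this example (it works with or without -ffloat-store), not something your code base necessarily needs.

```cpp
// Hypothetical helper: writing through a volatile forces a real store to
// memory, so the value read back has been rounded to double precision.
double round_to_double(double x) {
    volatile double stored = x;
    return stored;
}

// Hypothetical termination test modelled on the condition in the question.
// All three operands are rounded to double before they are compared, so the
// outcome no longer depends on what happened to be kept in a register.
bool should_terminate(double s, double l, double u) {
    const double s_rounded = round_to_double(s);
    return s_rounded >= round_to_double(u) ||
           s_rounded <= round_to_double(l);
}
```

The main loop would then call `should_terminate(s, l, u)` instead of comparing the values directly. Another route that is often used on i686 is to compile with -msse2 -mfpmath=sse, so that double arithmetic avoids the x87 registers altogether, provided the target CPUs support SSE2.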
