I haven't yet written a program to test whether GCC needs a flag passed, but when I do, I'd like to know how to enable strict floating-point mode so that results are reproducible between runs and between computers. Thanks.
-
Note that due to bugs in hardware, even forcing strict mode might not give you reproducible results. And those really happen: http://www.cs.earlham.edu/~dusko/cs63/fpu.html, http://lwn.net/Articles/89586/, http://www.reghardware.com/2006/04/28/amd_opteron_fpu_bug/... – liori Sep 03 '11 at 21:11
-
@liori By "bug", do you mean "offering only higher precision than IEEE 754 double-precision, in a very well-known and documented way"? The hardware had no bugs. It does not let compilers get the clean IEEE 754 double-precision semantics they want, granted, but that's not a bug, just a misfeature. – Pascal Cuoq Sep 03 '11 at 21:15
-
@liori I must point out that I wrote my comment before you added in the links. Interesting. I withdraw the "the hardware has no bug" part of my comment. – Pascal Cuoq Sep 03 '11 at 21:15
-
@Pascal Cuoq: yes, I actually added the links because I thought someone could think this way :-) – liori Sep 03 '11 at 21:20
-
Even if there are hardware bugs, I don't see how that supports a "floating point does not give reproducible results" position. It just means some hardware is broken and should not be used. Would you claim the x86 lock prefix does not give reproducible results because of the f00f bug? No, you'd just call CPUs that exhibit the bug defective... – R.. GitHub STOP HELPING ICE Sep 03 '11 at 21:30
-
@liori And what about the original Pentium FDIV bug? http://en.wikipedia.org/wiki/Pentium_FDIV_bug – xanatos Sep 03 '11 at 21:30
-
The only way to get reproducible results across different machines is to force software floating point, which I'm fairly sure you don't want. – David Heffernan Sep 03 '11 at 22:22
3 Answers
Compiling with `-msse2` on an Intel/AMD processor that supports it will get you almost there. Do not let any library put the FPU in FTZ/DAZ (flush-to-zero / denormals-are-zero) mode, and you will be mostly set (processor bugs notwithstanding).
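As an aside not in the original answer, here is a minimal sketch of how one might verify at runtime that no library has switched the SSE unit into FTZ/DAZ mode, using the standard Intel intrinsics headers:

```c
/* Sketch: check that flush-to-zero (FTZ) and denormals-are-zero (DAZ)
   are off, so subnormal values keep their IEEE 754 semantics.
   Compile with e.g.: gcc -std=c99 -msse2 -mfpmath=sse check_ftz.c */
#include <stdio.h>
#include <xmmintrin.h>   /* _MM_GET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_GET_DENORMALS_ZERO_MODE */

int main(void)
{
    if (_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON)
        puts("warning: FTZ is on; subnormal results are flushed to zero");
    if (_MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON)
        puts("warning: DAZ is on; subnormal inputs are treated as zero");
    return 0;
}
```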
For other architectures, the answer would be different. Architectures that do not offer any convenient way to get exact IEEE 754 semantics (for instance, pre-SSE2 IA32 CPUs) would require the use of a floating-point emulation library to get the result you want, at a very high performance penalty.
If your target architecture supports the `fmadd` instruction (multiplication and addition without intermediate rounding), make sure your compiler does not use it when you have explicit multiplications and additions in the source code. GCC is not supposed to do this unless you use the `-ffast-math` option.
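If contraction is a concern, one way to forbid it explicitly is sketched below; both GCC's `-ffp-contract=off` option and the standard `FP_CONTRACT` pragma exist for this purpose, though how completely a given GCC version honors the pragma varies:

```c
/* Sketch: keep a*x + y as two individually rounded operations
   instead of one fused multiply-add.
   Alternatively, compile with: gcc -ffp-contract=off ... */
#pragma STDC FP_CONTRACT OFF

double axpy(double a, double x, double y)
{
    /* With contraction off, the product is rounded to double
       before the addition, matching strict IEEE 754 semantics. */
    return a * x + y;
}
```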

-
Can I find this effect of `-msse2` in the GCC documentation or somewhere else? It is far from obvious to me, and what does 'almost there' mean in this context? – highsciguy Nov 23 '12 at 11:12
-
@highsciguy I said “almost there” because the side-effect of -msse2 is to make it harder, but not impossible, for the compiler to generate non-strictly-IEEE 754 code than to generate strictly IEEE 754 code. There is still the possibility of the compiler working hard to break floating-point semantics. To take a caricatural example, a GCC developer may implement the optimization “replace division by a constant with multiplication by the reciprocal” in some version, and then later someone may complain and this particular optimization may be moved to the `-ffast-math` option. – Pascal Cuoq Nov 23 '12 at 12:14
-
@highsciguy To summarize, compiler makers typically do not care enough to put this kind of guarantee in writing, but in the best case they will listen if you report that a change they made broke the strict IEEE 754 semantics. Results may depend on the precise compiler version one is using. Your best bet is to use `-S -msse2` and to read the assembly to check for incorrect transformations. With the historical instruction set, it is somewhat hopeless: some details can be found in http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html – Pascal Cuoq Nov 23 '12 at 12:17
-
You may also want to look at the value your compiler sets the macro `FLT_EVAL_METHOD` to. This macro is set by the compiler. See also `FP_CONTRACT` (set by the programmer). – Pascal Cuoq Nov 23 '12 at 12:33
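As an illustrative aside (not part of the original comment), `FLT_EVAL_METHOD` can be inspected from C via `<float.h>`:

```c
/* Sketch: report how this compiler evaluates floating-point expressions.
   0 = in the declared type (what strict IEEE 754 code wants),
   2 = in long double precision (typical of x87 code generation). */
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}
```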
-
@PascalCuoq: I wish languages would include different strict and non-strict operators, since many algorithms require a few key steps to be performed with strict semantics, but could tolerate looser semantics in other steps [e.g. because a +/- 1lsb error on one step would get cancelled out in the next step]. If 25% of the steps in an algorithm need to be precise, it would be better to call out those 25% and let the other operations run faster than to have everything run slowly. – supercat Jun 04 '14 at 17:25
-
@supercat As someone with little interest in floating-point who for the first time last month wrote a numerically stable algorithm where he actually wished the compiler to replace multiplications and additions with FMAs, I wholeheartedly agree. As far as I know, GCC still does not allow selecting one mode or the other at a scale lower than the compilation unit. – Pascal Cuoq Jun 04 '14 at 18:01
-
@PascalCuoq: I found it incredibly disheartening to read that Java decided that the performance of `Math.Sin` should be massively degraded so as to make `Math.Sin(3.1415926535897932384626433832795)` yield a value slightly *greater* than what it had yielded in the previous version. Unless the caller knows that the argument to `sin` has been rounded down by a little bit and is planning to compensate, the "improved" code is slower and *less* accurate than the original. – supercat Jun 04 '14 at 18:13
You can also use GCC's option `-mpc64` on an i386/ia32 target to force double-precision computation even on the x87 FPU. See the GCC manual.
You can also modify the x87 FPU behavior at runtime; see Deterministic cross-platform floating point arithmetics and also An Introduction to GCC.
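A hedged sketch of the runtime approach on glibc systems follows, using the `<fpu_control.h>` macros to set the x87 precision-control field to 53-bit (double) precision. Note this narrows the significand but not the exponent range, so results can still differ from pure SSE2 double arithmetic in corner cases:

```c
/* Sketch: switch the x87 FPU from 80-bit extended to 53-bit (double)
   precision at runtime. glibc-specific; not portable to all systems. */
#include <fpu_control.h>

void set_x87_double_precision(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* clear PC bits, set double */
    _FPU_SETCW(cw);
}
```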

-
The "deterministic cross-platform" article in particular is great. Thanks! – Peter M Nov 18 '13 at 18:48
If you use `-ffloat-store` and always store intermediate values to variables or apply (explicit) casts to the desired type/precision, you should be at least 90% of the way to your goal, and maybe more. I'd welcome comments on whether there are cases this approach still misses. Note that I claim this works even without any SSE options.
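A sketch of the discipline this answer describes (the function and variable names are illustrative only): every intermediate result is assigned to a named `double`, so that with `-ffloat-store` each one is rounded to double precision instead of lingering in an 80-bit x87 register:

```c
/* Sketch: compile with gcc -ffloat-store. Each named intermediate is
   stored to memory, forcing it to true double precision on x87. */
double dot3(const double a[3], const double b[3])
{
    double p0 = a[0] * b[0];   /* stored, hence rounded to double */
    double p1 = a[1] * b[1];
    double p2 = a[2] * b[2];
    double s  = p0 + p1;       /* intermediate sum stored as well */
    double r  = s + p2;
    return r;
}
```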

-
Two words: "double rounding" http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/ – Pascal Cuoq Sep 03 '11 at 21:31
-
Nice point. Personally when I need exact floating point behavior I just use `long double` for everything... – R.. GitHub STOP HELPING ICE Sep 03 '11 at 21:43
-
That's a good solution, but then you are no longer portable across computers that do not have 80-bit `long double`s. Now that I think of it, I wonder whether the OP, when he says "computers", really means across architectures, or just across processors within one given architecture (in which case he should be safe). – Pascal Cuoq Sep 03 '11 at 21:54
-
Another way to avoid double rounding issues is to use a different rounding mode, e.g. `fesetround(FE_TOWARDZERO)`... :-) – R.. GitHub STOP HELPING ICE Sep 04 '11 at 00:07
-
For what it's worth, I found that `-ffloat-store` with GCC agreed with Microsoft in `/fp:precise` mode. Which is a nice thing. – Peter M Nov 18 '13 at 18:47