
The x87 FPU is notable for using an internal 80-bit precision mode, which often leads to unexpected and unreproducible results across compilers and machines. In my search for reproducible floating-point math on .NET, I discovered that both major implementations of .NET (Microsoft's and Mono) emit SSE instructions rather than x87 in 64-bit mode.

SSE(2) computes on 32-bit floats with strictly 32-bit precision and on 64-bit floats with strictly 64-bit precision; intermediate results are never widened. Denormals can optionally be flushed to zero by setting the appropriate bits in the MXCSR control register.

It would therefore appear that SSE does not suffer from the precision-related issues of x87, and that the only variable is the denormal behavior, which can be controlled.

Leaving aside the matter of transcendental functions (which are not natively provided by SSE unlike x87), does using SSE guarantee reproducible results across machines and compilers? Could compiler optimizations, for instance, translate into different results? I found some conflicting opinions:

If you have SSE2, use it and live happily ever after. SSE2 supports both 32b and 64b operations and the intermediate results are of the size of the operands. - Yossi Kreinin, http://www.yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

...

The SSE2 instructions (...) are fully IEEE754-1985 compliant, and they permit better reproducibility (thanks to the static rounding precision) and portability with other platforms. - Muller et al., Handbook of Floating-Point Arithmetic, p. 107

however:

Also, you can't use SSE or SSE2 for floating point, because it's too under-specified to be deterministic. - John Watte http://www.gamedev.net/topic/499435-floating-point-determinism/#entry4259411

Asik

  • I'm pretty sure that if there are two conflicting opinions on the web you'll get an argument here (and probably at least a 3rd opinion too) – KevinDTimm Feb 28 '13 at 22:51
  • @KevinDTimm that doesn't make this question subjective though. SSE is either reproducible or it's not. – Asik Feb 28 '13 at 22:55
  • "SSE or SSE2 [is] too under-specified to be deterministic". I do not claim to be an expert on these matters, but this sounds like BS to me. In the link there's talk about library functions for transcendentals, and of course there could be bugs in those on one platform and not another, as indeed there could be (in fact, probably are) in any compiler's optimizer, but that does not say anything about SSE/SSE2 per se. Does he have an example of what he means? – 500 - Internal Server Error Mar 01 '13 at 00:02
  • I can't think of any other example where people argue that using *less* bits produces a *better* result. Programmers are hopelessly addicted to getting consistently wrong results. If the result changes when you run the Release build then it is the programmer's problem. If the result is less accurate then it is somebody else's problem. – Hans Passant Mar 01 '13 at 01:48
  • @HansPassant: without predictability, rigorous engineering is impossible. The behavior of high-level language source expressions is unpredictable in the face of compiler optimization when extended precision is used. When non-extended precision is combined with strict compiler settings, the behavior is predictable. For most programmers most of the time, extended precision is a useful crutch. For experts, it is frequently an extreme inconvenience. – Stephen Canon Mar 01 '13 at 06:21
  • @StephenCanon GCC and other compilers have given C99's `FLT_EVAL_METHOD==2` mode a bad name by spilling intermediate computations to a `double` slot on the stack in an uncontrollable way. However, implemented according to this interpretation, it is predictable and independent of optimization (for a given `long double` representation): http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html Summary: all computations are `long double` computations. If intermediate results need to be spilt, they are spilt as `long double`. I am not arguing it isn't still an inconvenience. – Pascal Cuoq Mar 01 '13 at 12:57
  • @HansPassant for multiplayer simulations, it matters less what the results are than that they are the same across computers. Scientific computing faces similar challenges. Also, it's not just a matter of a few bits: extended precision means the same computation may give either a real value or Infinity, for instance. – Asik Mar 01 '13 at 15:15

2 Answers


SSE is fully specified*. Muller is an expert in floating point arithmetic; who are you going to trust, him or some guy on a gamedev forum?

(*) There are actually a few exceptions for non-IEEE-754 operations like rsqrtss, where Intel never fully specified the behavior, but that doesn't affect the IEEE-754 basic operations. More importantly, the behavior of these instructions can't actually change at this point, because doing so would break binary compatibility for too much existing software, so they're as good as specified.

Stephen Canon
  • Related: re: getting the compiler to generate equivalent asm from the same source: [Does any floating point-intensive code produce bit-exact results in any x86-based architecture?](//stackoverflow.com/q/27149894). (Also x87 gotchas; that question wasn't SSE specific.) – Peter Cordes Apr 22 '18 at 06:40

As Stephen noted, results produced by a given piece of SSE assembly code will be reproducible; you feed the same code the same input and you get the same output at the end. (That is, John Watte's quote is flat-out wrong.)

You threw the word "compilers" in there, though. That's a different ball game entirely. Many compilers are still quite bad at preserving the correctness of floating-point code. (The ATLAS errata page mentions that clang "fails to produce correct code for some operations.") If you use special functions in your code, you're also, to some extent, at the mercy of whoever implemented your math library.

tmyklebu