
In this question, the discussion concerns possible discrepancies between Debug and Release mode when dealing with floating-point operations. In particular, in this answer we see that a seemingly innocent piece of code,

Single f1 = 0.00000000002f;
Single f2 = 1 / f1;
Double d = f2;
Console.WriteLine(d);

can give rise to different results in x86 Debug and Release mode, where it prints 49999998976 and 50000000199,7901 respectively. From the point of view of the assembly, this happens because the store-then-load pair

fstp        dword ptr [ebp-44h]  ; pop ST(0) and store it as a 32-bit Single, rounding away the extra precision
fld         dword ptr [ebp-44h]  ; reload the now single-precision value onto the x87 stack

has been optimized away in Release mode, whereas in Debug mode the round trip of the value from the 80-bit x87 register through a 32-bit dword and back is enough to kill off the extra precision.

Now, what I'm wondering is if this is in fact in accordance with specification. The C# specification states that

Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations.

However, as I read it, this applies only to individual operations, and since in our case we are explicit that the result of the division should be stored in a Single and only then converted to a Double, given only this piece of the specification I would expect the resulting Double to always be representable as a single-precision floating-point number. On the other hand, as also spelled out in the accepted answer to the other question, the runtime specification goes on to state that

Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either float32 or float64, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented.

So my question becomes whether I'm reading and understanding this correctly:

Is the above discrepancy between Debug and Release mode due to the fact that, since we are dealing with local variables, Single should in fact always be read as "Single or more precise"? And that, in particular, the compiler is allowed to optimize as if all locals were doubles, which would explain the discrepancy, since using only Double throughout does give a different result?
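
To spell out what I mean by using only Double throughout, here is a minimal sketch (my own variant of the snippet above); since nothing is ever narrowed to Single, I would expect it to print something close to the Release-mode value rather than 49999998976, in both configurations:

Double f1 = 0.00000000002f; // the Single literal is merely widened to Double here
Double f2 = 1 / f1;         // division performed in (at least) double precision
Double d = f2;
Console.WriteLine(d);       // expected: roughly the Release-mode value, not 49999998976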

I realize that this can be answered with a simple "yes" or "no", but any further insights would be appreciated as well.

Notably, if we change all of f1, f2, and d to be fields of a class, Debug and Release mode results agree, which would support the above interpretation.
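
For completeness, a sketch of that variant (the class and field layout are mine); here f1, f2, and d are fixed-size storage locations in the sense of the quote above, so the quotient has to be narrowed to Single when it is stored into f2, and the two modes have no room to disagree:

using System;

class Program
{
    // Fields are fixed-size storage locations (float32/float64 in CLI terms),
    // so the division result must be rounded to Single when stored into f2.
    static Single f1 = 0.00000000002f;
    static Single f2;
    static Double d;

    static void Main()
    {
        f2 = 1 / f1;
        d = f2;
        Console.WriteLine(d); // Debug and Release print the same value here
    }
}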

  • My guess would be that in release mode the compiler employs constant folding and simply compiles the equivalent of `float f2 = 1/0.00000000002f` which won't go through a separate store/load before the division. – Joey Mar 28 '19 at 13:32
  • This gets more complicated than the link suggests. There were errors in the floating-point unit on some microprocessors, so some computers have patches to compensate for the errors, and some patches got installed on machines with good FPUs. So I suspect the assembly code posted may have the patches, or the patches may be in the Release code and not in the Debug code. People were getting different answers on different PCs, so it is not just a precision difference. – jdweng Mar 28 '19 at 13:44
  • @jdweng: Sounds fun. Do you have a reference for further reading on that? – fuglede Mar 28 '19 at 13:47
  • See: https://en.wikipedia.org/wiki/Pentium_FDIV_bug and https://hardware.slashdot.org/story/14/10/10/193217/where-intel-processors-fail-at-math-again – jdweng Mar 28 '19 at 14:13
  • @Joey: Yep, from the linked answer (whose assembly I should have probably duplicated here), it's clear that that's what ends up making the difference. What I'm wondering is if this optimization is actually in accordance with specification. – fuglede Mar 29 '19 at 23:03
  • @HansPassant: That *looks* like a different question. Is the point of the dupe marking that making an explicit cast, i.e. what would amount to `Double d = (Single)f2;`, would make the results consistent? (If so, that doesn't really help me with my confusion about whether or not the C# spec is somehow overruled by the CLR one.) – fuglede Apr 09 '19 at 18:08
  • Perhaps interestingly, Visual Studio considers that cast redundant, even though it suffices to ensure consistency; a sketch of the cast variant follows below. – fuglede Apr 09 '19 at 18:27
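
For reference, a minimal sketch of the cast variant discussed in the last two comments; per those comments, the explicit narrowing is enough to make the two modes agree, even though Visual Studio flags the cast as redundant:

Single f1 = 0.00000000002f;
Single f2 = 1 / f1;
Double d = (Single)f2;  // the explicit cast forces the value to be rounded to Single
Console.WriteLine(d);   // expected: both modes agree (presumably on the Debug-mode value)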

0 Answers