
Short example:

#include <iostream>
#include <iomanip>
#include <string>      // std::string
#include <string_view>
#include <cstdlib>     // std::strtof

#define PRINTVAR(x) printVar(#x, (x) )

void printVar( const std::string_view name, const float value )
{
    std::cout 
        << std::setw( 16 )
        << name 
        << " = " << std::setw( 12 ) 
        << value << std::endl;
}

int main()
{
    std::cout << std::hexfloat;
    
    const float x = []() -> float
    {
        std::string str;
        std::cin >> str; // read at run time so the compiler
                         // cannot constant-fold x away
        return std::strtof(str.c_str(), nullptr);
    }();

    const float a = 0x1.bac178p-5;
    const float b = 0x1.bb7276p-5;

    const float x_1 = (1 - x);

    PRINTVAR( x );
    PRINTVAR( x_1 );
    PRINTVAR( a );
    PRINTVAR( b );

    PRINTVAR( a * x_1 + b * x );
    return 0;
}

this code on godbolt

This code produces different output on different platforms/compilers/optimizations:

X = 0x1.bafb7cp-5  // a float in std::hexfloat notation
Y = 0x1.bafb7ep-5

The input value is always the same: 0x1.4fab12p-2

compiler    optimization    x86_64    aarch64
GCC-12.2    -O0             X         X
GCC-12.2    -O2             X         Y
Clang-14    -O0             X         Y
Clang-14    -O2             X         Y

As we can see, Clang gives identical results at -O0 and -O2 within the same architecture, but GCC does not.

The question is: should we expect identical results with -O0 and -O2 on the same platform?

gbg
  • I expect that if you use `static_cast(a) * static_cast(x_1) + static_cast(b) * static_cast(x)` as the last expression instead while leaving _everything_ else the same, you will get the same result in all cases? (I can't run it myself on compiler explorer for ARM.) – user17732522 Oct 21 '22 at 08:41
  • No. The optimizer is allowed to reorder expressions and this reordering can cause differences. – Richard Critten Oct 21 '22 at 08:42
  • @RichardCritten No, standard-conforming (meaning without a flag like `-Ofast`) the compiler may not perform reordering that can affect the value. – user17732522 Oct 21 '22 at 08:42
  • There was a time when the C standard said little about floating point. Does the latest C standard require IEEE conformance? [This](https://gcc.gnu.org/wiki/FloatingPointMath) may be useful. –  Oct 21 '22 at 08:46
  • IIRC precision of intermediate floating point type is not specified. You can get different results depending on if compiler leaves temporary value in 80-bit FP register or copy it to 64-bit one. `sin(x) == sin(x)` can and did return `false` even for valid values of x. – Revolver_Ocelot Oct 21 '22 at 08:50
  • @DanielMGessel There was a time when the C standard said little about iostream, std::cout and []() also. Oh wait, that was today. :) – Kaz Oct 21 '22 at 08:50
  • @user17732522 "The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type". This effectively gives the license to reorder. – n. m. could be an AI Oct 21 '22 at 08:52
  • @Kaz The C++ standard says even less about floating-point arithmetic. If we don't reference C at least, then there is basically nothing specified about the operators, except for the exception on representable values in expressions mentioned in the comment above yours. – user17732522 Oct 21 '22 at 08:53
  • @Kaz Fair point - lol. I saw "gcc" and think C, my brain still needs "g++". But I think the same applies to C++. And order of evaluation is pretty loose... I just assume you need an option to control it. –  Oct 21 '22 at 08:54
  • @n.1.8e9-where's-my-sharem. I see the point. But this applies only if there are subexpressions with intermediate results (as here the case) and I guess the requirement for "greater precision and range" also limits this to some degree. – user17732522 Oct 21 '22 at 08:55
  • For reference on some of the comments above, see https://timsong-cpp.github.io/cppwp/n4868/expr#pre-6. – user17732522 Oct 21 '22 at 08:57
  • Regarding `Should -O0 and -O2 produce identical results?` [gcc doc mentioned above](https://gcc.gnu.org/wiki/FloatingPointMath) says it can differ at least when built-ins are involved - `However, as library functions may not be correctly rounded for all arguments, function results may differ depending on whether they are evaluated at compile time or at run time.` – dewaffled Oct 21 '22 at 09:00
  • Also, even with `-frounding-math` documentation [says](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html): `-frounding-math - ... This option disables constant folding of floating-point expressions at compile time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes.... This option is experimental and does not currently guarantee to disable all GCC optimizations that are affected by rounding mode.` - there is no 100% guarantee. – dewaffled Oct 21 '22 at 09:07
  • @user17732522 Yes the license is limited of course. – n. m. could be an AI Oct 21 '22 at 09:10
  • It seems like HW designers like FMAs. I remember problems when they were enabled on PPC, then later on GPUs. IIRC, where I worked it took the compiler team a while to convince the HW architects that the more accurate answer the HW gave meant we couldn't always use all those flops... –  Oct 21 '22 at 09:21
  • @user17732522, one can reproduce it on x86 as well by enabling FMA-instruction. Online demo: https://godbolt.org/z/Tqso5nf58 – Fedor Oct 21 '22 at 11:49
  • Note that even extremely subtle changes to the way the code is compiled can change the result, such as [((a+b)+c) giving a different result to (a+(b+c))](https://stackoverflow.com/questions/10371857/is-floating-point-addition-and-multiplication-associative). It's unlikely a C++ compiler would preserve details like that, amidst all the other optimizations it's doing. – Boann Oct 21 '22 at 13:53
  • @Boann Floating-point addition is famously non-associative. A "detail" like that must be preserved, otherwise the language is unsuitable for numeric computing. For that matter, if the operands are integers, the addition also cannot be rearranged, unless the type has reversible overflow (e.g. two's complement wraparound). – Kaz Oct 21 '22 at 16:02

1 Answer


The question is: should we expect identical results with -O0 and -O2 on the same platform?

No, not in general.

C++ 2020 draft N4849 7.1 [expr.pre] 6 says:

The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.51

Footnote 51 says:

The cast and assignment operators must still perform their specific conversions as described in 7.6.1.3, 7.6.3, 7.6.1.8 and 7.6.19.

This means that while evaluating a * x_1 + b * x, the C++ implementation may use the nominal float type of the operands or it may use any “superset” format with greater precision and/or range. That could be double or long double or an unnamed format. Once the evaluation is complete, and the result is assigned to a variable (including, in your example, a function parameter), the result calculated with extended precision must be converted to a value representable in the float type. So you will always see a float result, but it may be a different result than if the arithmetic were performed entirely with the float type.

The C++ standard does not require the C++ implementation to make the same choice about what precision it uses in all instances. Even if it did, each combination of command-line switches to the compiler may be regarded as a different C++ implementation (at least for the switches that may affect program behavior). Thus the C++ implementation obtained with -O0 may use float arithmetic throughout while the C++ implementation obtained with -O2 may use extended precision.

Note that the extended precision used in the calculation may come not only from machine instructions for a wider type, such as instructions that operate on double values rather than float values, but also from instructions such as a fused multiply-add, which computes a*b + c as if the product and sum were evaluated with infinite precision and then rounded once to the nominal type. This avoids the rounding error that would occur if a*b were computed first, rounded to a float result, and then added to c.

Eric Postpischil
  • Note that C specifies it in a nicer way, the programmer can check the `FLT_EVAL_METHOD` macro, e.g. if it has value 2, then intermediate values are `long double`; FMA contraction can be disabled via the `STDC FP_CONTRACT` pragma (in theory) or on the compiler command line (in practice). – amonakov Oct 21 '22 at 16:41