Why doesn't the same generated assembler code lead to the same output?

Question

Sample code (t0.c):

#include <stdio.h>

float f(float a, float b, float c) __attribute__((noinline));
float f(float a, float b, float c)
{
    return a * c + b * c;
}

int main(void)
{
    void* p = V;
    printf("%a\n", f(4476.0f, 20439.0f, 4915.0f));
    return 0;
}

Invocation & execution (via godbolt.org):

# icc 2021.1.2 on Linux on x86-64
$ icc t0.c -fp-model=fast -O3 -DV=f
0x1.d32322p+26
$ icc t0.c -fp-model=fast -O3 -DV=0
0x1.d32324p+26

Generated assembler code is the same: https://godbolt.org/z/osra5jfYY.

Why doesn't the same generated assembler code lead to the same output?

Why does void* p = f; matter?

What in tarnation... are you sure the executables are identical? What does `diff` say? — Marco Bonelli, Nov 27 '21 at 00:01
I haven't yet compared executables. As I understand, godbolt.org doesn't (yet) allow to download (or compare online) the executables. — pmor, Nov 27 '21 at 00:09
Use the debugger - step over the assembly code. See what registers **really** contain before the call to `printf` depending on the other initialization code you do not see here. Fast math often gives "interesting" results because of that. — 0___________, Nov 27 '21 at 00:16
Yeah, I would wonder whether the assembly shown actually matches the code being executed. For instance, maybe link-time optimization is happening? — Nate Eldredge, Nov 27 '21 at 01:30
Ah, check out when you select "Compile to binary". The `-DV=0` version has reduced `f` to just returning a constant - presumably interprocedural constant propagation, done once the linker can see there are no other calls to `f`. Taking the address of `f` probably fools it. — Nate Eldredge, Nov 27 '21 at 01:34

Nate Eldredge · Accepted Answer · 2021-11-27T06:03:56.063

Godbolt shows you the assembly emitted by running the compiler with -S. But in this case, that's not the code that actually gets run, because further optimizations can be done at link time.

Try checking the "Compile to binary" box instead (https://godbolt.org/z/ETznv9qP4), which will actually compile and link the binary and then disassemble it. We see that in your -DV=f version, the code for f is:

 addss  xmm0,xmm1
 mulss  xmm0,xmm2
 ret

just as before. But with -DV=0, we have:

 movss  xmm0,DWORD PTR [rip+0x2d88]
 ret

So f has been converted to a function which simply returns a constant loaded from memory. At link time, the compiler was able to see that f was only ever called with a particular set of constant arguments, and so it could perform interprocedural constant propagation and have f merely return the precomputed result.

Having an additional reference to f evidently defeats this. Probably the compiler or linker sees that f has had its address taken and does not notice that nothing is ever done with the address. So it assumes that f might be called elsewhere in the program, and therefore it has to emit code that gives the correct result for arbitrary arguments.

As to why the results are different: The precomputation is done strictly, evaluating both a*c and b*c as float and then adding them. So its result of 122457232 is the "right" one by the rules of C, and it is also what you get when compiling with -O0 or -fp-model=strict. The runtime version has been optimized to (a+b)*c, which is actually more accurate because it avoids an extra rounding; it yields 122457224, which is closer to the exact value of 122457225.

edited Nov 27 '21 at 06:03

answered Nov 27 '21 at 01:51

Nate Eldredge

Constant propagation might have been done with `double`, introducing two separate rounding steps, perhaps? No, that wouldn't explain it; as doubles those operations are all exact since the numbers aren't too large. – Peter Cordes Nov 27 '21 at 04:11
@PeterCordes: Sorted it out. The constant-propagated version evaluates `(a*c)+(b*c)` strictly, incurring all the rounding errors in doing all operations as `float`. The runtime version optimizes to `(a+b)*c` which is faster and more accurate but not strictly correct by C evaluation rules. It looks like icc effectively does `-ffast-math` by default. – Nate Eldredge Nov 27 '21 at 04:49
Yes, it does to some extent (the default is `-fp-model fast=1`), although the Godbolt link in the question used `-fp-model=fast` (i.e. fast=2) to make it about as aggressive as GCC `-ffast-math`. https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/floating-point-options/fp-model-fp.html. I'm not clear on exactly what is/isn't allowed at the default `fast=1`, but FP associative (and distributive) math assumptions are allowed, so it can auto-vectorize and look good when people benchmark it. – Peter Cordes Nov 27 '21 at 04:54
@PeterCordes: Ah, okay, and `-fp-model=strict` gets us back to the strict version. (Amusingly, before I checked the docs, my first guess was to do `-fp-model fast=0` and that ICEs: https://godbolt.org/z/YW96vYKj9) – Nate Eldredge Nov 27 '21 at 04:58
Re: _but not strictly correct by C evaluation rules_: then `__STDC_IEC_559__` cannot be `1` ([as it is now](https://stackoverflow.com/questions/70115688/why-dont-non-strict-floating-point-models-change-the-value-1-of-stdc-iec-559)). – pmor Nov 29 '21 at 14:56
@pmor: Welllll... in the fine print, the ICC manual does say that [`-fp-model=precise` (which `=strict` implies) is required for "strict ANSI conformance"](https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/floating-point-options/fp-model-fp.html). As such, `icc` *without* this option is not claiming to be a conforming implementation, and therefore nobody can stop them from defining any macros they like. – Nate Eldredge Nov 29 '21 at 15:18
FYI: MSVC has [a similar scenario](https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-170): _`__STDC__` defined as 1 only when compiled as C and if the `/Za` compiler option is specified. Otherwise, undefined._ – pmor Nov 29 '21 at 17:55
Can you answer the previously mentioned [question](https://stackoverflow.com/questions/70115688/why-dont-non-strict-floating-point-models-change-the-value-1-of-stdc-iec-559) along those lines: "not claiming ..., and therefore nobody can stop ..."? – pmor Nov 29 '21 at 17:58
