
I plugged this into Godbolt and was pleasantly surprised to find that the two functions `a()` and `b()` compile to identical code at anything other than `-O0` (on most major compilers):

#include <cmath>

struct A {
    int a,b,c;
    float bar() {
        return sqrt(a + b + c);
    }
};

struct B {
    int a[3];
    float bar() {
        int ret{0};
        for (int i = 0; i<3; ++i) {
            ret += a[i];
        }
        return sqrt(ret);
    }
};

float a() {
    A a{55,67,12};
    return a.bar();
}

float b() {
    B b{55,67,12};
    return b.bar();
}

The Godbolt output is:

a():
        movss   xmm0, DWORD PTR .LC0[rip]
        ret
b():
        movss   xmm0, DWORD PTR .LC0[rip]
        ret
.LC0:
        .long   1094268577

I am no assembly expert, but I'm wondering whether this can actually be true: are they really doing identical work? I can't even see where in this assembly there is a call to `sqrt`, or what that `.long` "constant" (?) is doing in there.

Peter Cordes
johnbakers
  • I don't understand the question. You need not be an assembly expert to see that the assembly for `a()` and `b()` is the same. Are you actually asking *why* the compiler can optimize the two functions to do the same? – 463035818_is_not_an_ai Oct 18 '21 at 09:05
  • The compiler computes the result during compilation and uses the computed constant in the assembly. `1094268577` should be bit-wise interpreted as a floating point number (IEEE-754): `1094268577` = `0x413936A1` = `11.5758371353`. Try yourself [here](https://www.h-schmidt.net/FloatConverter/IEEE754.html). – Evg Oct 18 '21 at 09:08
  • Short answer: Both functions have the very same observable behavior. – Daniel Langr Oct 18 '21 at 09:31
  • It's not going to do a sqrt at run-time when the inlining + constant-propagation can make the input a compile-time constant; it evals sqrt at compile time. – Peter Cordes Oct 18 '21 at 10:00
  • a+b+c = a+b+c, yes. You didn't use any globals or `volatile` or anything else to prevent the elimination of dead code, so it eliminated it. – old_timer Oct 18 '21 at 18:03
  • If you don't want it to optimize out the sqrt, then do not put the call with the hardcoded values in the same source file. In this case, pass the three values into a() and b(), and do not include the calls to a() or b() in the same source file; then compile to an object file and disassemble. The two should remain the same, since the code is functionally identical, but now it will do both the addition and the sqrt. – old_timer Oct 18 '21 at 18:04
  • @old_timer: terminology nitpick: there's no "dead" code here. All of it is involved in computations (evaluated at compile time) that lead up to the returned value. There are no always-false branches, no unused variables or results, just constant folding. – Peter Cordes Oct 19 '21 at 04:42
  • Voting to re-open since the question does not appear to be a duplicate of the other questions linked. To be honest, I'm not sure what any of those questions have to do with this one. – Catskul Apr 05 '23 at 21:06
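
old_timer's suggestion above can be sketched as follows (a minimal sketch with assumed signatures, not code from the thread). With the constant-argument call site living in a different translation unit, the compiler can no longer constant-fold, so both the addition and the `sqrt` survive in the generated code:

```cpp
#include <cmath>

// Parameterized versions of a() and b(): the inputs are no longer
// compile-time constants within this translation unit, so the sum
// and the sqrt must be computed at run time.
float a(int x, int y, int z) {
    return std::sqrt(static_cast<float>(x + y + z));
}

float b(const int (&v)[3]) {
    int ret{0};
    for (int i = 0; i < 3; ++i) {
        ret += v[i];
    }
    return std::sqrt(static_cast<float>(ret));
}
```

Compiling this file alone to an object file and disassembling should show near-identical bodies for both functions, each containing a real square-root instruction or call.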

1 Answer


This function:

float a() {
    A a{55,67,12};
    return a.bar();
}

Has exactly the same observable behavior as this one:

float a() {
    return sqrt(55+67+12);
}

The same is true for b(). Further, sqrt(55+67+12) == sqrt(134) ≈ 11.5758369028.

The binary representation of the IEEE-754 single-precision value 11.5758369028 is 01000001001110010011011010100001, and that bit pattern, read as an integer, is 1094268577.

The compiler applied the so-called *as-if* rule to replace both functions with assembly that has the exact same observable behavior as the original code: both functions return a float with the value 11.5758369028.

ecm
463035818_is_not_an_ai
  • ah, thanks! I didn't realize this would be done at compile time. I have changed the code a bit (not posted here) to parameterize my functions, and now I see a bona fide `sqrt` and a lot more assembly, yet the two functions are still the same, so I guess loop unrolling and other goodies are still happening. Neat to see. – johnbakers Oct 18 '21 at 10:08
  • @johnbakers: yeah, short loops with a constant iteration count are often convenient in source code, and we wouldn't want compilers to make worse asm just because we didn't manually unroll/peel them. e.g. [Why is a simple loop optimized when the limit is 959 but not 960?](https://stackoverflow.com/q/42159460) is pushing the limit of how large a loop GCC will attempt to fully unroll (aka peel). That case is interesting because after fully unrolling, constant-propagation across iterations can reduce the final result to a constant, rather than 959 copies of the loop body. – Peter Cordes Oct 18 '21 at 21:54
  • But for more normal cases, like using multiple FP accumulators (or SIMD vectors of accumulators) to hide FP instruction latency in the sum of a float array, it works to write short loops over an array of `__m128 sums[]` instead of manually writing `sum0`, `sum1` etc, and compilers will reliably optimize away the loop. – Peter Cordes Oct 18 '21 at 21:56
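
The multiple-accumulator idiom Peter Cordes describes can be sketched like this (a scalar sketch with an assumed function name; the same shape works with `__m128` vectors of sums). The four partial sums are independent, which hides FP-add latency, and the short fixed-count inner loop is the kind compilers reliably unroll:

```cpp
#include <cstddef>

// Sum a float array using four independent accumulators.
// Each sums[j] forms its own dependency chain, so the adds can
// overlap in the pipeline; compilers fully unroll the inner loop.
float sum_array(const float* data, std::size_t n) {
    float sums[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int j = 0; j < 4; ++j) {   // short constant-count loop
            sums[j] += data[i + j];
        }
    }
    float total = sums[0] + sums[1] + sums[2] + sums[3];
    for (; i < n; ++i) {                // handle leftover elements
        total += data[i];
    }
    return total;
}
```

Note that reassociating FP additions this way can change rounding versus a strict left-to-right sum, which is why compilers only do it themselves under flags like `-ffast-math`.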