Consider this function, which I found in this question:
void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}
As `x` and `dest` might point to the same memory, the compiler is not allowed to optimize this into a single qword move (each line might change the value of `x`).
So far so good.
But if you pass `x` by value instead, this argument no longer holds.
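To make the aliasing argument concrete, here is a minimal sketch. The names `to_bytes_ref`, `to_bytes_val` and `aliasing_changes_result` are made up for illustration, the stores are written as a loop instead of eight lines, and the sketch assumes `uint8_t` is an alias for `unsigned char` (true on mainstream platforms), which is what makes writing into the bytes of a `uint64_t` legal. With `dest` overlapping `x` at a one-byte offset, the by-reference version produces different output than the by-value version, so the compiler may not merge the stores without first proving that there is no overlap.

```cpp
#include <cstdint>
#include <cstring>

// By-reference version: same stores as in the question, written as a loop.
void to_bytes_ref(uint64_t const& x, uint8_t* dest) {
    for (int i = 7; i >= 0; --i)
        dest[i] = uint8_t(x >> 8 * i);  // x is re-read after every store
}

// By-value variant: x is a private copy, so stores through dest cannot change it.
void to_bytes_val(uint64_t x, uint8_t* dest) {
    for (int i = 7; i >= 0; --i)
        dest[i] = uint8_t(x >> 8 * i);
}

// Returns true if overlapping x and dest changes the bytes that get written.
bool aliasing_changes_result() {
    const uint64_t value = 0x0102030405060708ULL;

    // dest starts one byte before buf_ref[1], so dest[1..7] overlap x.
    uint64_t buf_ref[2] = {0, value};
    uint8_t* dest_ref = reinterpret_cast<uint8_t*>(buf_ref) + sizeof(uint64_t) - 1;
    to_bytes_ref(buf_ref[1], dest_ref);
    uint8_t via_ref[8];
    std::memcpy(via_ref, dest_ref, sizeof via_ref);

    // Same layout, but x is copied at the call, so the overlap is harmless.
    uint64_t buf_val[2] = {0, value};
    uint8_t* dest_val = reinterpret_cast<uint8_t*>(buf_val) + sizeof(uint64_t) - 1;
    to_bytes_val(buf_val[1], dest_val);
    uint8_t via_val[8];
    std::memcpy(via_val, dest_val, sizeof via_val);

    return std::memcmp(via_ref, via_val, sizeof via_ref) != 0;
}
```

Note that if `dest` points exactly at `x` on a little-endian machine, every store writes back the byte that is already there, which is why the sketch uses an offset overlap instead.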
And indeed, GCC optimizes this to a simple `mov` instruction, as expected: https://godbolt.org/z/iYj1or
However, clang does not: https://godbolt.org/z/Hgg5z9
I'm assuming that, since it is not even guaranteed that `x` occupies any stack memory at all, any attempt to make `dest` point to `x` before the function is called would result in undefined behavior, and thus the compiler can assume that this never happens. That would mean clang is missing an optimization opportunity here. But I'm not sure. Can somebody clarify?
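The same reasoning can be applied to the by-reference signature by reading `x` once into a local. This is only a sketch: `to_bytes_local` is a hypothetical name, and whether a particular compiler actually merges the eight stores into one is not guaranteed. The point is that the address of `v` never leaves the function, so no valid `dest` can refer to it.

```cpp
#include <cstdint>

// Hypothetical variant: keeps the reference parameter but copies x up front.
void to_bytes_local(uint64_t const& x, uint8_t* dest) {
    uint64_t const v = x;  // single read of x; v's address never escapes
    for (int i = 7; i >= 0; --i)
        dest[i] = uint8_t(v >> 8 * i);  // stores through dest cannot modify v
}
```

If `dest` does overlap `x`, this variant gives a different result than the original, because it sees the value of `x` as it was on entry.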