Consider this function which I found in this question:

void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

As x and dest might point to the same memory, the compiler is not allowed to optimize this into a single qword move (each line might change the value of x).
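
To make the aliasing concern concrete, here is a minimal sketch. The helper `aliasing_changes_result` and the two-element buffer are made up for illustration, and it assumes `uint8_t` is an alias for `unsigned char` (true on mainstream platforms), so writing through it into a `uint64_t` is allowed. `dest` is placed one byte below `x`, so each store overwrites a byte of `x` that a later line still reads:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The by-reference function from the question, unchanged.
void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

// Hypothetical helper: returns true if an overlapping call produces bytes
// that differ from "take a snapshot of x, then store 8 bytes".
bool aliasing_changes_result() {
    uint64_t buf[2] = {0, 0x0102030405060708ULL};
    uint8_t* bytes = reinterpret_cast<uint8_t*>(buf);

    // Reference result: serialize a copy of buf[1] into a separate buffer,
    // then place those 8 bytes at offset 7 of a copy of the memory.
    uint8_t expected[sizeof buf];
    std::memcpy(expected, buf, sizeof buf);
    uint64_t snapshot = buf[1];
    uint8_t tmp[8];
    to_bytes(snapshot, tmp);            // no overlap here
    std::memcpy(expected + 7, tmp, sizeof tmp);

    // Overlapping call: x is buf[1] (bytes 8..15), dest covers bytes 7..14.
    to_bytes(buf[1], bytes + 7);

    return std::memcmp(bytes, expected, sizeof buf) != 0;
}
```

If the helper returns true, a single 8-byte store would have been the wrong translation for this call, which is why the compiler has to keep the eight separate byte stores in the by-reference version.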

So far so good.

But if you pass x by value instead, this argument no longer holds. And indeed, GCC optimizes this to a simple mov instruction, as expected: https://godbolt.org/z/iYj1or

However, clang does not: https://godbolt.org/z/Hgg5z9

I'm assuming that, as it is not even guaranteed that x occupies any stack memory at all, any attempt to make dest point to x before the function is called would result in undefined behavior and thus the compiler can assume that this just never happens. That would mean clang is missing some opportunity here. But I'm not sure. Can somebody clarify?
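
For illustration, here is a sketch of the by-value variant together with a small check that an overlapping `dest` cannot influence it. The names `to_bytes_by_value` and `by_value_ignores_overlap` are made up for this example, and it again assumes `uint8_t` is `unsigned char`:

```cpp
#include <cassert>
#include <cstdint>

// Same body as the question's function, but x is a private copy.
void to_bytes_by_value(uint64_t x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

// Hypothetical helper: dest overlaps the caller's variable, but not the
// parameter x, which was copied before the first store happens.
bool by_value_ignores_overlap() {
    uint64_t buf[2] = {0, 0x0102030405060708ULL};
    uint8_t* bytes = reinterpret_cast<uint8_t*>(buf);

    to_bytes_by_value(buf[1], bytes + 7);

    // The output must be the little-endian bytes of the original value:
    // 0x08 at dest[0] down to 0x01 at dest[7].
    for (int i = 0; i < 8; ++i) {
        if (bytes[7 + i] != uint8_t(0x08 - i)) return false;
    }
    return true;
}
```

Since the stores can only modify the caller's object and never the parameter, the eight stores here are equivalent to one 8-byte little-endian store of `x`.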

sebrockm
  • Alignment comes to mind. Not sure if you can use a misaligned qword mov. – nwp May 13 '19 at 12:01
  • @Someprogrammerdude I'm not trying to make the compiler not optimize this. I'm just wondering if there is a valid reason for clang to not optimize it the same way gcc does – sebrockm May 13 '19 at 12:03
  • `dest` cannot point to the local variable as it didn't exist yet. Maybe report a missed optimization bug – M.M May 13 '19 at 12:06
  • @M.M yes, this was the same thought I had. Will file a report soon – sebrockm May 13 '19 at 12:12
  • Are you asking why clang does not do some/all optimizations that gcc does? Maybe because it is not required to. – Öö Tiib May 13 '19 at 12:19
  • @sebrockm: similar bug for C was already submitted. See: https://bugs.llvm.org/show_bug.cgi?id=39944 – P.W May 13 '19 at 12:24
  • @ÖöTiib I'm asking if Clang is just missing this opportunity or if I am missing some edge case where `dest` could validly point to `x` and thus clang would be absolutely right not to optimize this (which conversely would mean gcc has a bug). – sebrockm May 13 '19 at 12:24
  • For C and pre-C++17, `dest[...]` and the `x` in `x >> ...` are unsequenced, so it would be UB if `dest[...]` and `x` referred to the same object. So before C++17 the optimization should also happen with `x` being a reference – Oliv May 13 '19 at 12:30

1 Answer

The code you've given is way overcomplicated. You can replace it with:

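#include <cstdint>
#include <cstring>   // std::memcpy
#include <endian.h>  // htole64 (glibc/Linux, not standard C++)

void to_bytes(uint64_t x, uint8_t* dest) {
    x = htole64(x);                    // reorder to little-endian (no-op on LE hosts)
    std::memcpy(dest, &x, sizeof(x));  // copy the 8 bytes out
}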

Yes, this uses the Linux-ism htole64(), but if you're on another platform you can easily reimplement that.
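
As a rough sketch of such a reimplementation, the following stand-in avoids any endianness detection by building the little-endian byte image explicitly. The names `portable_htole64` and `to_bytes_portable` are hypothetical and not part of any library; whether a given compiler folds this down to a plain move or byte swap is something I'd check on the target rather than assume:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical replacement for htole64(): returns a value whose in-memory
// representation is the little-endian byte sequence of x, on any host.
inline uint64_t portable_htole64(uint64_t x) {
    uint8_t bytes[8];
    for (int i = 0; i < 8; ++i) {
        bytes[i] = uint8_t(x >> (8 * i));  // byte i = bits 8i..8i+7 of x
    }
    uint64_t out;
    std::memcpy(&out, bytes, sizeof out);  // reinterpret the bytes as a value
    return out;
}

// Same shape as the function above, using the stand-in.
void to_bytes_portable(uint64_t x, uint8_t* dest) {
    x = portable_htole64(x);
    std::memcpy(dest, &x, sizeof(x));
}
```

Calling `to_bytes_portable(0x0102030405060708, out)` should leave `0x08` in `out[0]` and `0x01` in `out[7]`, regardless of the host's byte order.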

Clang and GCC optimize this perfectly, on both little- and big-endian platforms.

John Zwinck