
GCC documentation says this about the __builtin_ctz family:

Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

Until now, I've been assuming that "undefined" here means "it can return any number, even non-deterministically", but not that it can, for example, crash. In other words, it's not "undefined behavior" in the C++ sense. Comments on this question seem to confirm this.

However, both GCC and clang compile the following code

#include <bit>
#include <cstdint>

int clipped(std::uint64_t a) {
  return __builtin_ctzll(a) & 0x7f;
}

to just

bsf rax, rdi
ret

This can return any value, even though the source code suggests that the return value should be between 0 and 127: the & 0x7f has been optimized away entirely. I encountered this in real code where I used the result of this function as a lookup index and got a segmentation fault.

Is this expected or a bug?

Compiler options used: -O3 -march=x86-64 --std=c++2a
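One way to keep the result bounded no matter how the builtin's zero case is interpreted is to never pass 0 to it. A minimal sketch; `clipped_guarded` is a hypothetical name, and returning 64 for a zero input (matching TZCNT and `std::countr_zero`) is an assumed choice. Whether a particular GCC or clang version compiles this to a single TZCNT under `-march=skylake` is not verified here:

```cpp
#include <cstdint>

// Hypothetical guarded variant of clipped(). The builtin is only ever
// evaluated with a nonzero argument, so every path has a defined result.
constexpr int clipped_guarded(std::uint64_t a) {
    // 64 for a zero input mirrors what TZCNT and std::countr_zero produce.
    int tz = (a != 0) ? __builtin_ctzll(a) : 64;
    return tz & 0x7f;  // a no-op for 0..64, kept to mirror the original code
}

static_assert(clipped_guarded(0) == 64);
static_assert(clipped_guarded(1) == 0);
static_assert(clipped_guarded(std::uint64_t{1} << 40) == 40);
```

Because the zero case is handled in source, the result is in 0..64 by construction, so indexing a 128-entry table with it stays in bounds.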

  • My understanding / assumption has always been that the result *exists* and has some value of the right type; you just don't know what it is. Note that it says "result is undefined", not "behaviour is undefined". But your test shows that GCC and clang don't agree with that reading, or are both buggy (e.g. they incorrectly assume the result is still in the right value range). – Peter Cordes Jan 19 '22 at 03:26
  • BTW, your code on https://godbolt.org/z/ao9eEP9rh shows that GCC still has a performance bug: with `-march=skylake`, it xor-zeroes RAX to break a false dependency for TZCNT that Skylake doesn't have (it was fixed in SKL and only exists on Haswell/BDW), but it doesn't do so for BSF, which has a *true* dependency on its destination because of the possibility of a zero input. Amusingly, without `-march=skylake` we actually get a safe implementation of your function: xor-zero the dst, then `rep bsf` into it (it runs as TZCNT, or as BSF on older CPUs), so the result for an input of 0 is 64 (TZCNT) or 0 (BSF leaves the zeroed dst unchanged), either way in range of your masking. – Peter Cordes Jan 19 '22 at 03:31
  • To provide a bit of context: my original motivation was to have C++ code that compiles to something as fast as possible with `-march=skylake` (just TZCNT) but produces a bounded result (0-127) with any `-march`. – user1020406 Jan 19 '22 at 03:47
  • The reason for `0x7f` and not `0x3f` is that I wanted this to compile to just a single instruction (TZCNT) on skylake and newer processors. TZCNT can return 0-64, so `& 0x3f` is not a no-op, but `& 0x7f` is. – user1020406 Jan 19 '22 at 03:50
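The mask arithmetic in the last comment can be checked at compile time: TZCNT on a 64-bit operand returns 0..64, and 64 is `0b1000000`, which `0x7f` keeps but `0x3f` clears. A minimal sketch; `mask_is_noop_on_tzcnt_range` is a hypothetical helper name:

```cpp
// Hypothetical helper: does "r & mask" leave every possible 64-bit TZCNT
// result r (0..64 inclusive) unchanged?
constexpr bool mask_is_noop_on_tzcnt_range(unsigned mask) {
    for (unsigned r = 0; r <= 64; ++r) {
        if ((r & mask) != r) return false;
    }
    return true;
}

static_assert(mask_is_noop_on_tzcnt_range(0x7f));   // 64 & 0x7f == 64
static_assert(!mask_is_noop_on_tzcnt_range(0x3f));  // 64 & 0x3f == 0
```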
