
The Rust compiler and LLVM can be remarkably smart. I used `x = x & (x - 1)` to clear the lowest set bit. The compiler recognized the expression and translated it to the `blsr` instruction, which gave me a big speedup, all without my having to write platform-specific code or call the intrinsic explicitly.
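For reference, here is a minimal sketch of that idiom as a standalone function. The names `clear_lowest_set_bit` and `count_set_bits` are made up for illustration, and I use `wrapping_sub` as an assumption of my own, because a plain `x - 1` panics in debug builds when `x` is 0. Whether this actually lowers to `blsr` depends on the target having BMI1 enabled.

```rust
/// Clears the lowest set bit of `x` (hypothetical helper name).
/// `wrapping_sub` keeps `x == 0` well defined: 0 & u64::MAX == 0.
pub fn clear_lowest_set_bit(x: u64) -> u64 {
    x & x.wrapping_sub(1)
}

/// Counts set bits by clearing the lowest one until none remain,
/// a typical loop where the idiom shows up.
pub fn count_set_bits(mut x: u64) -> u32 {
    let mut n = 0;
    while x != 0 {
        x = clear_lowest_set_bit(x);
        n += 1;
    }
    n
}
```

Nothing here names a target feature or an intrinsic, so the same source builds on any architecture and the instruction choice is left to the backend.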

I want it to do the same for the `bzhi` instruction, which zeros the high bits starting at a given bit index. The canonical expression for this is `src & (1 << inx) - 1`, but unfortunately Rust does not recognize it and emits five instructions where one would do. The compiler knows about the instruction, but doesn't recognize the equivalence.
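Written out as a function, the expression looks roughly like this. This is only a sketch: the name `zero_high_bits` and the `u64`/`u32` types are assumptions, and it is only valid for `inx < 64`, since the shift overflows at `inx == 64`.

```rust
/// Keeps the low `inx` bits of `src` and zeros the rest
/// (hypothetical helper name).
/// Precondition (assumed here): `inx < 64`. With `inx == 64` the shift
/// overflows, which panics when overflow checks are enabled.
pub fn zero_high_bits(src: u64, inx: u32) -> u64 {
    debug_assert!(inx < 64, "shift amount must be below 64");
    // In Rust, `-` binds tighter than `&`, so the extra parentheses
    // only make the grouping of the original expression explicit.
    src & ((1u64 << inx) - 1)
}
```

With the precondition in place this produces 64 distinct masks, from all-zero at `inx == 0` up to everything except the top bit at `inx == 63`.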

How can I encourage the Rust compiler to emit the `bzhi` instruction without resorting to platform-specific code?

Rust 1.66.1, -C opt-level=3 -C target-cpu=native

Peter Cordes
  • 1
    nit: `target-cpu=native` isn't really helpful since we don't know what cpu you have. – cafce25 Jan 20 '23 at 01:31
  • 3
    Hitting the "clear cache + recompile" button got `and sil, 63` / `bzhi`. (Use `-C overflow-checks` and note that there's integer overflow on the shift with inx=64). Maybe when you tried, the AWS instance ran on a system where `target-cpu=native` didn't include BMI2? Or where the tuning settings made it decide not to use it? (Unwise, it's fast enough even on the AMD CPUs you sometimes get, and `-C target-cpu=znver2` does use BZHI). Anyway, generally don't use `native` on Godbolt. – Peter Cordes Jan 20 '23 at 05:09
  • 1
    The Wikipedia expression isn't safe in C either; it compiled to BZHI, but relies on undefined behaviour for inx=64. gcc/clang `-fsanitize=undefined` catch it in a C version. https://godbolt.org/z/4Tr4qhKvK – Peter Cordes Jan 20 '23 at 05:10
  • 1
    So there's still a non-trivial question here, of how to write something that compiles *without* the `and` so we can get all 65 different possible results, not the 64 possible results of a 64-bit shift on x86-64. And which still optimizes to `bzhi`. Perhaps a special case like `inx>63 ? src : shift_and`? – Peter Cordes Jan 20 '23 at 05:42
  • 1
    `if(inx>63) { src } else { ... }` hits the missed-optimization you saw, even when the simpler function (that's unsafe for inx=64) compiles to bzhi. https://godbolt.org/z/3xf4heaWo . GCC compiles a C++ version of that conditional to a branch + bzhi, so that's not great either. https://godbolt.org/z/Y5119ejco . (Even with `unsigned char inx`; BZHI only looks at the low 8 bits of the bit-index operand so compilers have to know that and/or be allowed to assume a smaller value-range. https://www.felixcloutier.com/x86/bzhi) – Peter Cordes Jan 20 '23 at 09:22
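For readers following the comments, the special-cased variant discussed above can be sketched as below. The function name and types are assumptions. It returns all 65 possible results, but per the comments it did not compile to a lone `bzhi` when tested, so treat it as a portable reference for the intended semantics rather than a fix for the codegen.

```rust
/// Keeps the low `inx` bits of `src`; any `inx` above 63 returns `src`
/// unchanged (hypothetical helper name).
/// Note: the hardware instruction only looks at the low 8 bits of its
/// index operand, so this matches it only for `inx` in 0..=255.
pub fn zero_high_bits_full(src: u64, inx: u32) -> u64 {
    if inx > 63 {
        // Nothing to clear, and this avoids the overflowing shift.
        src
    } else {
        src & ((1u64 << inx) - 1)
    }
}
```

The branch removes the shift overflow at `inx == 64`, so the function behaves the same with and without `-C overflow-checks`.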

0 Answers