The Rust compiler and LLVM are sometimes remarkably smart. I used x = x & (x - 1)
to clear the least significant set bit. The compiler recognized this expression and translated it to the blsr
instruction, which gave me a big speedup, without my having to write any platform-specific code or call the intrinsic explicitly.
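
For reference, here is a minimal sketch of the pattern in question. The function name is hypothetical, and wrapping_sub stands in for x - 1 so that x == 0 does not trigger an overflow panic in debug builds. Whether blsr is actually emitted depends on the compiler version and the enabled target features, so check the generated assembly.

```rust
/// Clears the least significant set bit of `x`.
/// (Hypothetical helper name, used only for illustration.)
pub fn clear_lowest_set_bit(x: u64) -> u64 {
    // For x == 0, wrapping_sub yields u64::MAX and the AND gives 0,
    // instead of panicking on overflow in a debug build.
    x & x.wrapping_sub(1)
}
```

For example, clear_lowest_set_bit(0b1011_0000) returns 0b1010_0000.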
I want to get it to do the same thing for the bzhi
instruction, which zeros the high bits starting at a given bit index. The canonical expression for this is src & (1 << inx) - 1
but unfortunately the compiler does not recognize it and emits five instructions where one would do. It knows about the instruction, but doesn't recognize the equivalence.
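
A minimal sketch of the portable form, with explicit parentheses and a fixed width. The function name and the choice of u64 are assumptions for illustration. It is only meaningful for index < 64, since a larger shift amount overflows, and nothing here guarantees that bzhi is emitted.

```rust
/// Keeps the low `index` bits of `src` and zeros bit `index` and above.
/// (Hypothetical helper name; assumes index < 64.)
pub fn zero_high_bits(src: u64, index: u32) -> u64 {
    // (1 << index) - 1 builds a mask with the low `index` bits set.
    // In Rust, `-` binds tighter than `&`, so the parentheses are for clarity only.
    src & ((1u64 << index) - 1)
}
```

For example, zero_high_bits(0b1111_1111, 4) returns 0b0000_1111.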
How can I encourage the Rust compiler to emit the bzhi
instruction without explicitly resorting to platform-specific code?
Rust 1.66.1, -C opt-level=3 -C target-cpu=native