How old is your compiler? As I noted in the linked question, the UB-safe variable-count rotate idiom (with extra & masking of the count) confuses old compilers, like gcc before 4.9. Since you're not masking the shift count, it should be recognized with even older gcc.
Your big expression is maybe confusing the compiler. Write an inline function for rotate, and call it, like
value = rotr32(y & mask, 1) ^ rotr32(z & mask, n);
Much more readable, and may help stop the compiler from trying to do things in the wrong order and breaking the idiom before recognizing it as a rotate.
Maybe those are best practices in C++, and not in C?
My answer on the linked question clearly says that it's the best practice for C as well as C++. They are different languages, but they overlap completely for this, according to my testing.
Here's a version of the Godbolt link using -xc
to compile as C, not C++. I had a couple C++isms in the link in the original question, for experimenting with integer types for the rotate count.
Like the original linked from the best-practices answer, it has a version that uses x86 intrinsics if available. clang doesn't seem to provide any in x86intrin.h
, but other compilers have _rotl
/ _rotr
for 32-bit rotates, with other sizes available.
Actually, I talked about rotate intrinsics at length in the answer on the best-practices question, not just in the godbolt link. Did you even read the answer there, apart from the code block? (If you did, your question doesn't reflect it.)
Using intrinsics, or the idiom in your own inline function, is much better than using inline asm. Asm defeats constant-propagation, among other things. Also, compilers can use BMI2 rorx dst, src, imm8
to copy-and-rotate with one instruction, if you compile with -march=haswell
or -mbmi2
. It's a lot harder to write an inline-asm rotate that can use rorx
for immediate-count rotates but ror r32, cl
for variable-count rotates. You could try with _builtin_constant_p()
, but clang evaluates that before inlining, so it's basically useless for meta-programming style choice of which code to use. It works with gcc though. But it's still much better not to use inline asm unless you've exhausted all other avenues (like asking on SO) to avoid it. https://gcc.gnu.org/wiki/DontUseInlineAsm
Fun fact: the rotate functions in gcc's x86intrin.h
are just pure C using the rotate idiom that gcc recognizes. Except for 16-bit rotates, where they use __builtin_ia32_rolhi
.