If you use -masm=intel, it activates .intel_syntax noprefix. Immediates no longer take $ prefixes. (But for the address of a symbol as an immediate, you need OFFSET symbol.) Anyway, don't use the $.
Obviously, if you just want a shift, you should write it in C instead of inline asm: https://gcc.gnu.org/wiki/DontUseInlineAsm. (You can mask the shift count to avoid UB with counts that are too large, the same way as for rotates: see Best practices for circular shift (rotate) operations in C++.)
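As a minimal sketch of that pure-C approach (the names shr_masked and rotr64 are hypothetical, chosen for illustration), masking the count with & 63 keeps every C shift count in the range 0..63, so neither function can hit undefined behaviour. GCC and clang typically recognize both patterns and emit a single shift or rotate instruction.

```c
#include <stdint.h>

/* Logical right shift with the count masked to 0..63, so the C shift is
   always well-defined. On x86-64 the mask is free: SHR and SHRX already
   ignore the high bits of a 64-bit shift count. */
static inline uint64_t shr_masked(uint64_t v, unsigned c)
{
    return v >> (c & 63);
}

/* Rotate right by c (mod 64). Both shift counts stay in 0..63: when
   c & 63 == 0, (-c) & 63 is also 0, so we never shift by 64. */
static inline uint64_t rotr64(uint64_t v, unsigned c)
{
    c &= 63;
    return (v >> c) | (v << ((-c) & 63));
}
```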
But if you need the shift as part of something that has to be inline asm anyway, you can do it this way to allow the shift count to be either a variable (in cl) or a constant (immediate) from C. I used a "cJ" constraint to allow either a 0-63 immediate operand (J) or a register operand in rcx/ecx/cx/cl (c constraint). Specifically it ends up in cl, because I cast the count to (uint8_t).
Also, I used a b operand modifier to override the size, in case you wanted to use the whole rcx as a named input for something else before you get to the shift. (See 6.45.2.8 x86 Operand Modifiers in the gcc docs.)
See also the inline-assembly tag wiki for some guides.
I used GCC's {att | intel} dialect alternatives (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates) so that this compiles and assembles correctly in either AT&T or Intel syntax mode.
On the Godbolt compiler explorer, you can see that the shr() code below works with gcc, but clang (at least at the time of writing) doesn't handle -masm=intel correctly for inline asm: it still substitutes in %rdi instead of rdi, so the output fails to assemble.
#include <stdint.h>

static inline uint64_t shr(uint64_t v, unsigned c)
{
    // %b[c] is cl even if %[c] is ecx or whatever.
    asm ("shr {%b[c],%[v] | %[v],%b[c]}"
         : [v] "+r" (v)
         : [c] "cJ" ((uint8_t)c));   // the cast gets this to use cl
    return v;
}
uint64_t shr_variable(uint64_t v, int c) {
    return shr(v, c);
}
gcc output:
    mov rax, rdi
    mov ecx, esi
    shr rax,cl
    ret
uint64_t shr_const(uint64_t v) {
    return shr(v, 13);
}
gcc output:
    mov rax, rdi
    shr rax,13
    ret
Compare this with pure C, compiled with -march=haswell:
// With BMI2 available, gcc can use SHRX, and it can optimize pure C much better than inline asm.
uint64_t shr_variable_purec(uint64_t v, unsigned c) {
    //c &= 63;   // optional: makes counts >= 64 well-defined, and compiles to zero instructions on x86 because shr and shrx already mask the count.
    return v >> c;
}
gcc output:
    shrx rax, rdi, rsi
    ret