2

According to both this reference and this reference, the shr instruction supports shifting by 1, by the CL register, or by an immediate value. However, I cannot get the immediate form to work. When I compile this code:

#include <stdint.h>

int main() {
  uint64_t v = 15;
  asm ("shr %[v], $0x04\t\n"
       : [v] "+r" (v)
       :
       : "cc"
       );
  return v;
}

I get this error message:

$ gcc -masm=intel foo.c
foo.c: Assembler messages:
foo.c:5: Error: operand size mismatch for `shr'

How can I pass an immediate value to shr without loading it into CL? I care about this because I'm optimizing for the register pressure bottleneck.

Jason Gross
  • The operands are the wrong way round. Also, why are you writing inline assembly? – fuz Sep 12 '17 at 21:46
  • See [Working with Big Numbers Using x86 Instructions](http://x86asm.net/articles/working-with-big-numbers-using-x86-instructions/). – Remy Lebeau Sep 12 '17 at 21:50
  • Minor points: `#include `, and `int main()` is returning `uint64_t`. – Weather Vane Sep 12 '17 at 21:54
  • @WeatherVane: returning an `uint64_t` is not a problem, it gets implicitly converted to `int` on return, so, as long as it doesn't overflow the range of `int` it's all fine. – Matteo Italia Sep 12 '17 at 22:13
  • @MatteoItalia yes all is fine unless... I now regret pointing out a problem which you say isn't a problem. Unless it overflows the range of `int`. – Weather Vane Sep 12 '17 at 22:27
  • @WeatherVane Ah, apparently you can't use `#include` in a `<pre>` block (without mangling code?), because it'll treat the library as an html tag – Jason Gross Sep 12 '17 at 23:05
  • @fuz The inputs are not the wrong way around; I've passed `-masm=intel`. I've updated the question title to make this more clear. – Jason Gross Sep 12 '17 at 23:06
  • And, @fuz, I'm writing inline assembly because I'm working on a straight-line optimizing C compiler specifically for low-level cryptographic primitives, and our current benchmarking framework drops in the parts of the crypto operations that we haven't synthesized yet from curve25519-donna by agl, in C. gcc isn't good enough to get the performance we're aiming to match. (It's also really bad at handling carry bits from `_addcarryx_u64` and co.) – Jason Gross Sep 12 '17 at 23:22

2 Answers

6

You're using guides written for Intel assembly syntax. The GNU assembler (GAS) uses AT&T syntax by default, which reverses the operand order. Swapping the operands seems to work:

uint64_t v = 0xffff;
asm ("shr $0x04, %[v]\n"
   : [v] "+r" (v)
   :
   : "cc"
   );
printf("%llx\n", (unsigned long long)v);  // prints fff

(You can also replace shr with shrq to make the 64-bit operand size explicit.)

If you still want to use Intel syntax, as you do with -masm=intel, you have to drop the dollar sign from the immediate value:

asm ("shr %[v], 4\n"
     ...)
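
As a rough, self-contained sketch of the fix above: the helper name `shr4` is hypothetical, and the `{att | intel}` dialect-alternative form is used here only so that the same source assembles with or without -masm=intel. The plain C fallback for non-x86-64 targets is likewise an assumption added so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: shift v right by 4 using an immediate operand.
 * The braces select the AT&T form ("shr $4, %rax") by default and the
 * Intel form ("shr rax, 4") under -masm=intel. */
static inline uint64_t shr4(uint64_t v)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("shr {$4, %[v] | %[v], 4}"
             : [v] "+r" (v)   /* read-write register operand */
             :
             : "cc");         /* shr modifies the flags */
#else
    v >>= 4;                  /* assumption: portable fallback, not inline asm */
#endif
    return v;
}
```

If the code only ever needs to build with -masm=intel, the single-dialect form `"shr %[v], 4"` shown above is enough.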
myaut
  • The documentation you link says, of `-masm`, "Also affects which dialect is used for basic asm (see Basic Asm) and extended asm (see Extended Asm).". In any case, swapping the order only works if I drop `-masm=intel`, which I can't do unless it's possible to refer to literal registers in AT&T syntax, which I haven't figured out how to do yet (see https://stackoverflow.com/questions/46186592/how-do-i-refer-to-literal-registers-in-gcc-inline-assembly-in-att-syntax). I've updated the question title here to explicitly ask about Intel syntax. – Jason Gross Sep 12 '17 at 23:18
  • @JasonGross: You seem to be right; I've updated my answer. Personally, I'd advise against using Intel syntax, to avoid confusion. – myaut Sep 12 '17 at 23:31
  • Yup, it's usually a bad plan to use `-masm=intel` with inline-asm, but it works if you're consistent. If you do write your asm for it, then you have to require it. @JasonGross: You can write [alternatives](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates) for ATT/Intel, like `asm("{shr $4,%[v] | shr %[v], 4}\n" : ...);` so your inline-asm works both ways, but that just gets into a huge mess and isn't worth it unless you're writing something for a header file to be used in multiple projects. (And then, https://gcc.gnu.org/wiki/DontUseInlineAsm) – Peter Cordes Sep 12 '17 at 23:56
  • Added an answer with that, and showing how to have the shift-count come from a C variable as an immediate or `cl`. – Peter Cordes Sep 13 '17 at 00:32
2

If you use -masm=intel, it activates .intel_syntax noprefix, so immediates no longer take a $ prefix. (To use a symbol's address as an immediate, though, you need OFFSET symbol.) In short, don't use the $.

Obviously, if you just want a shift, you should do it in C instead of inline asm (see https://gcc.gnu.org/wiki/DontUseInlineAsm). You can mask the shift count to avoid UB with shift counts that are too high, as for rotates: see Best practices for circular shift (rotate) operations in C++.
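
A minimal sketch of that masked shift in plain C (the helper name `shr_masked` is hypothetical). Masking with 63 keeps the count in the range 0 to 63, so the C shift is always defined, and on x86-64 the mask matches what shr already does with a 64-bit operand:

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift with the count reduced
 * modulo 64, so counts of 64 or more cannot trigger undefined behavior. */
static inline uint64_t shr_masked(uint64_t v, unsigned c)
{
    return v >> (c & 63u);
}
```

Note that a count of 64 wraps to 0 here, so the value comes back unchanged rather than becoming 0; whether that is the behavior you want depends on the caller.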


But if you want to use it as part of something that needs to be inline asm, then you can do it this way to allow the shift count to be a variable (in cl) or a constant (immediate) from C. I used a "cJ" constraint to allow a 0-63 immediate operand (J) or a register operand in rcx/ecx/cx/cl (the c constraint); it ends up in cl specifically because I cast to (uint8_t).

Also, I used a b modifier to override the size, in case you wanted to use the whole rcx as a named input for something else before you get to the shift. (See 6.45.2.8 x86 Operand Modifiers in the gcc docs).

See also the tag wiki for some guides.

I used https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates to let this compile and assemble correctly with AT&T or Intel syntax mode.

On the Godbolt compiler explorer, you can see this works with gcc, but clang doesn't work correctly with -masm=intel for inline-asm. It still substitutes in %rdi instead of rdi and fails to assemble.

static inline uint64_t shr (uint64_t v, unsigned c)
{
    // %b[c] is cl even if %[c] is ecx or whatever.
    asm ("shr  {%b[c],%[v] | %[v],%b[c]}"
         : [v] "+r" (v) 
         : [c] "cJ" ((uint8_t)c));  // the cast gets this to use cl
    return v;
}

uint64_t shr_variable(uint64_t v, int c) {
    return shr(v, c);
}

    mov     rax, rdi
    mov     ecx, esi
    shr   rax,cl
    ret


uint64_t shr_const(uint64_t v) {
    return shr(v, 13);
}

    mov     rax, rdi
    shr   rax,13
    ret

Compare this with pure C, with -march=haswell:

// can use SHRX with BMI2 available.  And can optimize much better
uint64_t shr_variable_purec(uint64_t v, unsigned c) {
    //c &= 63;  // optional, compiles to zero instructions on x86 because shr and shrx already do this.
    return v >> c;
}

    shrx    rax, rdi, rsi
    ret
Peter Cordes