I am playing around with asmjit and generating assembly. While doing so, I noticed that one cannot use 64-bit constants as immediate operands for instructions (except mov, which makes sense).

Because of that, I store 64-bit constants on the stack and use them through a memory operand instead of as an immediate operand. Several resources say it is fine to use a memory operand with the and instruction (e.g., [1], [2]).

However, I noticed that the and instruction does not work as expected. I will give you an example from my code:

mov r14, qword ptr [r15+32]   ; r14 holds a masked pointer now
mov qword ptr [rsp], 281474976710655    ; 0xFFFFFFFFFFFF is the mask for the pointer
and r14, [rsp]                ; Using pointer&mask I want to unmask the pointer

After that and instruction, the value in r14 remains as before:

  • r14 before and: 421609184805440
  • r14 after and: 421609184805440

When using a register instead, everything works as expected:

mov r14, qword ptr [r15+32]   ; r14 holds a masked pointer now
mov r13, 281474976710655      ; 0xFFFFFFFFFFFF is the mask for the pointer
and r14, r13                  ; Using pointer&mask I want to unmask the pointer

  • r14 before and: 421723605418560
  • r14 after and: 140248628707904

Of course, I could use a register instead of accessing the stack, but I would be interested in why this behaves differently.

jagemue
  • `mov qword ptr [rsp], 281474976710655` is impossible. What did it assemble as, maybe `mov qword ptr [rsp], -1`? – harold Jan 16 '22 at 11:36
  • AsmJit should return an error in that case - there is a specific error code for this. I would recommend using ErrorHandler and Logger - these tools help a lot and make JIT more fun :) – Petr Jan 17 '22 at 16:58

1 Answer

Looks like you didn't check for asmjit errors. The docs say there's a kErrorInvalidImmediate - Invalid immediate (out of bounds on X86 and invalid pattern on ARM).

The only x86-64 instruction that can use a 64-bit immediate is mov-immediate to register: the special no-ModRM opcode that gives us the 5-byte mov eax, 12345 or the 10-byte mov rax, 0x0123456789abcdef, where a REX.W prefix changes that opcode to look for a 64-bit immediate. See https://www.felixcloutier.com/x86/mov and the related Q&A "why we can't move a 64-bit immediate value to memory?".
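
As a rough illustration of the rule above, a JIT front end could test whether a constant fits the sign-extended 32-bit immediate that most instructions accept before deciding how to encode it. The helper name `fitsSignExtendedImm32` is hypothetical and not part of asmjit; this is only a sketch in plain C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (NOT an asmjit API): true if `value` can be encoded as
// a 32-bit immediate that the CPU sign-extends to 64 bits.
constexpr bool fitsSignExtendedImm32(std::int64_t value) {
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Usage sketch: decide how a 64-bit constant has to reach an instruction.
inline void checkMaskEncodings() {
    assert(fitsSignExtendedImm32(-1));                 // usable as an immediate
    assert(!fitsSignExtendedImm32(0xFFFFFFFFFFFFLL));  // needs mov reg, imm64 first
}
```

A constant that fails this check has to go through a register (or memory written from a register) before it can be an operand of and.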


Your title is a red herring. It's nothing to do with having an m64 operand for and, it's the constant that's the problem. You can verify that by single-stepping the asm with a debugger and checking both operands before the and, including the one in memory. (It's probably -1 from 0xFFFFFFFF as an immediate for mov m64, sign_extended_imm32, which would explain AND not changing the value in R14).
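
To make that guess concrete, here is a small C++ model of the arithmetic. It assumes (this is not confirmed by the question) that only the low 32 bits of the mask ended up in the encoded immediate; the function and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models what `mov qword ptr [mem], imm32` stores: the 32-bit immediate
// sign-extended to 64 bits.
constexpr std::uint64_t signExtendImm32(std::uint32_t imm32) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(imm32)));
}

constexpr std::uint64_t kIntendedMask = 0xFFFFFFFFFFFFull;
// ASSUMPTION: the assembler kept only the low 32 bits of the 48-bit mask.
constexpr std::uint64_t kStoredMask =
    signExtendImm32(static_cast<std::uint32_t>(kIntendedMask));

inline void showWhyAndDidNothing() {
    const std::uint64_t tagged = 421609184805440ull;  // r14 from the question
    assert(kStoredMask == ~0ull);                     // memory holds -1
    assert((tagged & kStoredMask) == tagged);         // AND with -1 changes nothing
    assert((tagged & kIntendedMask) == 140134208094784ull);  // what was wanted
}
```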

Also disassembly of the JITed machine code should show you what immediate is actually encoded; again a debugger could provide that as you single-step through it.


Use your temporary register for the constant (like mov r14, 0xFFFFFFFFFFFF), then and reg,mem to load-and-mask.

Or better, if the target machine you're JITing for has BMI1 andn, construct the inverted constant once outside a loop with mov r13, ~0xFFFFFFFFFFFF, then inside the loop use andn r14, r13, [r15+32], which does a load+and without destroying the mask, all in one instruction that can decode to a single uop on Intel/AMD CPUs.
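
The andn trick relies on the instruction inverting its first source operand. A minimal C++ model of that semantics (plain functions with illustrative names, not the real intrinsic and not asmjit code) looks like this:

```cpp
#include <cassert>
#include <cstdint>

// Models BMI1 `andn dst, src1, src2`: dst = ~src1 & src2.
constexpr std::uint64_t andn(std::uint64_t src1, std::uint64_t src2) {
    return ~src1 & src2;
}

constexpr std::uint64_t kPointerMask  = 0xFFFFFFFFFFFFull;
// What `mov r13, ~0xFFFFFFFFFFFF` would leave in r13, set up once.
constexpr std::uint64_t kInvertedMask = ~kPointerMask;

// Models `andn r14, r13, [r15+32]`: load the tagged value and mask it
// without overwriting the register that holds the (inverted) mask.
inline std::uint64_t unmaskPointer(std::uint64_t taggedFromMemory) {
    std::uint64_t result = andn(kInvertedMask, taggedFromMemory);
    assert(result == (taggedFromMemory & kPointerMask));
    return result;
}
```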

Or if you can't reuse a constant register over a loop, maybe mov reg,imm64, then push reg or mov mem,reg and use that in future AND instructions. Or emit some constant data somewhere near enough to reference with a RIP-relative addressing mode, although that takes a bit more code-size at every and instruction. (ModRM + 4 byte rel32, vs. ModRM + SIB + 0 or 1 bytes for data on the stack close to RSP).


BTW, if you're just truncating instead of sign-extending, you're also assuming this address is in the low half of virtual address space (i.e. user-space). That's fine, though. Fun fact: future x86 CPUs (first Sapphire Rapids) will have an optional feature that OSes can enable to transparently ignore the high bits, except for the MSB: LAM = Linear Address Masking. See Intel's future-extensions manual.

So if this feature is enabled with 48-bit masking for user-space, you can skip the AND masking entirely. (If your code makes sure bit 47 matches bit 63; you might want to keep the top bit unmodified or 0 so your code can take advantage of LAM when available to save instructions).
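
The difference between truncating and sign-extending, and the bit-47/bit-63 condition mentioned above, can be sketched in C++ as follows. The helper names are hypothetical, and `usableWithLam48` only encodes the condition as described in this answer, not a full statement of the LAM rules.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLow48 = 0xFFFFFFFFFFFFull;

// What the AND with 0xFFFFFFFFFFFF does: zero the top 16 bits.
constexpr std::uint64_t truncateTo48(std::uint64_t tagged) {
    return tagged & kLow48;
}

// Sign-extend from bit 47, giving a canonical 48-bit address in either half
// of the address space.
constexpr std::uint64_t signExtendFrom48(std::uint64_t tagged) {
    constexpr std::uint64_t signBit = 1ull << 47;
    return ((tagged & kLow48) ^ signBit) - signBit;
}

// Condition from the text: bit 63 must match bit 47 for the tagged pointer
// to be dereferenced directly when 48-bit LAM is enabled.
constexpr bool usableWithLam48(std::uint64_t tagged) {
    return ((tagged >> 63) & 1) == ((tagged >> 47) & 1);
}

inline void compareTruncationAndSignExtension() {
    // Low-half (user-space) address: both approaches agree.
    assert(truncateTo48(0xABCD7FFFFFFFF000ull) == signExtendFrom48(0xABCD7FFFFFFFF000ull));
    // Bit 47 set: truncation yields a non-canonical address.
    assert(truncateTo48(0x0001800000000000ull) != signExtendFrom48(0x0001800000000000ull));
}
```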


If you were masking to keep the low 32, you could just mov r14d, [r15+32] to zero-extend the low dword of the value into 64-bit R14. But for keeping the low 48 or 57 bits, you need a mask or BMI2 bzhi with 48 in a register.
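
For reference, both options can be modeled in portable C++. These are hand-written models with illustrative names, not the compiler intrinsics, so they do not require BMI2 support to run.

```cpp
#include <cassert>
#include <cstdint>

// Models `mov r14d, [mem]`: writing a 32-bit register zero-extends into the
// full 64-bit register.
constexpr std::uint64_t zeroExtendLow32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}

// Models BMI2 `bzhi dst, src, index`: clear all bits at position `index` and
// above. Only the low 8 bits of the index are used; an index of 64 or more
// leaves the source unchanged.
constexpr std::uint64_t bzhi(std::uint64_t src, unsigned index) {
    index &= 0xFFu;
    return index >= 64 ? src : src & ((1ull << index) - 1);
}

inline void maskWithBzhi() {
    // Same result as AND with 0xFFFFFFFFFFFF, but the "mask" is just 48.
    assert(bzhi(421723605418560ull, 48) == 140248628707904ull);
}
```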

Peter Cordes
  • Thanks a lot for your detailed answer! I am searching for a general approach dealing with 64bit constants in my code (I am writing a small IR that uses asmjit to compile to machine code). I will consider many of your hints for optimizations. In general, I will use temporary registers, thanks! If possible, I will also update the title. – jagemue Jan 16 '22 at 19:27
  • If you are targeting X86 and X86_64 then you are lucky about constants. Almost every instruction sign-extends a 32-bit immediate to a 64-bit immediate, and most SIMD instructions have an 8-bit immediate. There are some special cases like the 64-bit immediate in mov as described in the answer. X86_64 is much easier for writing JITs than ARM, for example, which has many more restrictions. BTW you can check all instructions here: https://asmjit.com/asmgrid/ - there is also the encoding, which makes it easier to imagine what the architecture looks like at the machine code level. – Petr Jan 17 '22 at 17:01
  • @Petr: x86 SIMD instructions with immediates are only ever a control operand, like for `pshufd`. There is no x86 SIMD mov-immediate to register like ARM has to easily materialize some simple vector constants, unfortunately. But yeah, if there is an immediate at all, it'll be 8-bit. And yeah ARM has complex immediate encodings so if you want to get the most out of it, agreed you'd have to search if / how a number is encodeable as an immediate, or inverted for `mvn`, before giving up and using a slower way. Yeah x86-64 is simpler but easier that way. – Peter Cordes Jan 17 '22 at 17:05
  • Indeed, that's what I meant. I think the only SIMD instructions that break the 8-bit immediate rule are insertq and extrq (these use two 8-bit immediates) - but these are exceptions, instructions not really worth exploring as they are pre-AVX and most likely deprecated by now. – Petr Jan 17 '22 at 20:58
  • @Petr: Interesting, I'd never looked at `insertq` / `extrq` from AMD's SSE4a. It's apparently still supported on AMD Zen 1/2/3, unlike XOP which was dropped for Zen, and FMA4 which was dropped for Zen2 (after being only unofficially supported on Zen1). (https://en.wikipedia.org/wiki/SSE4#Supporting_CPUs includes stuff about SSE4a, strangely combining that AMD extension in the same Wiki article as Intel's SSE4.1/4.2) – Peter Cordes Jan 17 '22 at 23:23