
My understanding is that immediate parameters in ARMv8 A64 assembly can be 12 bits long. If that is the case, why does this line of assembly code:

AND X12, X10, 0xFEF 

produce this error (when compiled with gcc):

Error:  immediate out of range at operand 3 -- `AND X12, X10, 0xFEF'

Interestingly enough, this line of assembly code compiles fine:

ADD X12, X10, 0xFEF

I'm using aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 (prerelease)

Peter Cordes
Zack

3 Answers


Unlike A32's "flexible second operand", there is no common immediate format in A64. For immediate-operand data-processing instructions (ignoring the boring and straightforward ones like shifts),

  • Arithmetic instructions (add{s}, sub{s}, cmp, cmn) take a 12-bit unsigned immediate with an optional 12-bit left shift.
  • Move instructions (movz, movn, movk) take a 16-bit immediate optionally shifted to any 16-bit-aligned position within the register.
  • Address calculations (adr, adrp) take a 21-bit signed immediate, although there's no actual syntax to specify it directly; to do so you'd have to resort to assembler expression trickery to generate an appropriate "label".
  • Logical instructions (and{s}, orr, eor, tst) take a "bitmask immediate", which I'm not sure I can even explain, so I'll just quote the mind-bogglingly complicated definition:

Such an immediate is a 32-bit or 64-bit pattern viewed as a vector of identical elements of size e = 2, 4, 8, 16, 32, or 64 bits. Each element contains the same sub-pattern: a single run of 1 to e-1 non-zero bits, rotated by 0 to e-1 bits. This mechanism can generate 5,334 unique 64-bit patterns (as 2,667 pairs of pattern and their bitwise inverse).
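To make the quoted rule concrete, here is a brute-force C sketch, assuming 64-bit (X register) operands; `is_bitmask_imm64` is a hypothetical helper name, not anything from GCC or binutils. It simply tries every element size, run length and rotation the definition allows. It also shows why the question's constant fails: 0xFEF has two separate runs of ones (bits 0-3 and 5-11) and no shorter repeating element, so no rotation of a single run can produce it.

```c
#include <stdbool.h>
#include <stdint.h>

// Brute-force check of the quoted rule for a 64-bit (X register) operand:
// x must be one e-bit element (e = 2, 4, ..., 64) replicated across 64 bits,
// where the element is a single run of 1..e-1 ones rotated by 0..e-1 bits.
static bool is_bitmask_imm64(uint64_t x) {
    for (unsigned e = 2; e <= 64; e *= 2) {
        uint64_t emask = (e == 64) ? ~0ULL : (1ULL << e) - 1;
        for (unsigned ones = 1; ones < e; ++ones) {
            uint64_t run = (1ULL << ones) - 1;  // 'ones' low bits set
            for (unsigned rot = 0; rot < e; ++rot) {
                // Rotate the run right by 'rot' within the e-bit element.
                uint64_t elem = (rot == 0)
                    ? run
                    : ((run >> rot) | (run << (e - rot))) & emask;
                // Replicate the element until it fills 64 bits.
                uint64_t v = elem;
                for (unsigned w = e; w < 64; w *= 2)
                    v |= v << w;
                if (v == x)
                    return true;
            }
        }
    }
    return false;
}
```

For example, `is_bitmask_imm64(0xFF0)` returns true (one run of eight ones, rotated), while `is_bitmask_imm64(0xFEF)` returns false. It loops over all 5,334 candidates, so it favors clarity over speed.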

Notlikethat
  • This explanation makes a little more sense: "The logical immediate instructions accept a bitmask immediate bimm32 or bimm64. Such an immediate consists EITHER of a single consecutive sequence with at least one non-zero bit, and at least one zero bit, within an element of 2, 4, 8, 16, 32 or 64 bits; the element then being replicated across the register width, or the bitwise inverse of such a value." – Zack Jun 18 '15 at 19:18
  • The arm bitmask immediate field is 13 bits (from what I can tell). Does anybody know precisely how those bits are interpreted (i.e., the algorithm for converting these 13 bits into a 32 or 64-bit value)? Why isn't that algorithm easy to find? – Zack Jun 18 '15 at 19:26
  • I found some code here that may be helpful: http://llvm.org/docs/doxygen/html/AArch64AddressingModes_8h_source.html – Zack Jun 18 '15 at 19:28
  • @Zack As with everything, the full, authoritative definition can be found in the instruction pseudocode in [the ARM ARM](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.f/index.html) (free to download, but you have to sign up to accept the license). In this case it's the `DecodeBitMasks()` function in the pseudocode appendix (page J8-5588 in issue A.f). – Notlikethat Jun 18 '15 at 19:42
  • Immediates for bitwise instructions aren't that hard to at least summarize: a repeating pattern, where within one element the set bits have to be contiguous. – Peter Cordes Feb 25 '21 at 11:21
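For the question about the 13-bit field, here is a minimal C sketch of how N:immr:imms expands to a 64-bit value. It follows a reading of the `DecodeBitMasks()` pseudocode mentioned above, restricted to the 64-bit (sf=1) form and to the mask used by the logical instructions; `decode_bitmask64` is a hypothetical name. The element size is 2^len, where len is the highest set bit of N:NOT(imms); the low bits of `imms` give the run length minus one, and the low bits of `immr` give the rotate-right amount.

```c
#include <stdbool.h>
#include <stdint.h>

// Sketch: expand the 13-bit logical-immediate field N:immr:imms into a
// 64-bit value (64-bit form only). Returns false for reserved encodings.
static bool decode_bitmask64(unsigned n, unsigned immr, unsigned imms,
                             uint64_t *out) {
    // Element size is 2^len, len = highest set bit of the 7-bit N:NOT(imms).
    unsigned combined = ((n & 1u) << 6) | (~imms & 0x3fu);
    int len = -1;
    for (int b = 6; b >= 0; --b) {
        if (combined & (1u << b)) { len = b; break; }
    }
    if (len < 1)
        return false;                       // no valid element size

    unsigned esize = 1u << len;
    unsigned levels = esize - 1;
    unsigned s = imms & levels;             // run length minus one
    unsigned r = immr & levels;             // rotate-right amount
    if (s == levels)
        return false;                       // all-ones element is reserved

    uint64_t emask = (esize == 64) ? ~0ULL : (1ULL << esize) - 1;
    uint64_t elem = (1ULL << (s + 1)) - 1;  // s+1 low bits set
    if (r != 0)                             // rotate right within the element
        elem = ((elem >> r) | (elem << (esize - r))) & emask;

    for (unsigned w = esize; w < 64; w *= 2)  // replicate to 64 bits
        elem |= elem << w;
    *out = elem;
    return true;
}
```

For example, N=1, immr=60, imms=7 gives eight ones rotated right by 60, i.e. 0xFF0. Rejecting an all-ones element keeps -1 out of the encodable set, and since every run has at least one set bit, 0 is never produced either. Decoding all 7,680 valid encodings gives the 5,334 distinct values from the quote, because `immr` bits above the element size are ignored.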

Here is a piece of code to dump all legal bitmask immediates following the mechanism quoted in Notlikethat's answer. I hope it helps to understand how the rule for generating bitmask immediates works.

#include <stdio.h>
#include <stdint.h>

// Dumps all legal bitmask immediates for ARM64
// Total number of unique 64-bit patterns: 
//   1*2 + 3*4 + 7*8 + 15*16 + 31*32 + 63*64 = 5334

const char *uint64_to_binary(uint64_t x) {
  static char b[65];
  unsigned i;
  for (i = 0; i < 64; i++, x <<= 1)
    b[i] = (0x8000000000000000ULL & x)? '1' : '0';
  b[64] = '\0';
  return b;
}

int main() {
  uint64_t result;
  unsigned size, length, rotation, e;
  for (size = 2; size <= 64; size *= 2)
    for (length = 1; length < size; ++length) {
      result = 0xffffffffffffffffULL >> (64 - length);
      for (e = size; e < 64; e *= 2)
        result |= result << e;
      for (rotation = 0; rotation < size; ++rotation) {
        printf("0x%016llx %s (size=%u, length=%u, rotation=%u)\n",
            (unsigned long long)result, uint64_to_binary(result),
            size, length, rotation);
        result = (result >> 63) | (result << 1);
      }
    }
  return 0;
}
cigien
Yan

An alternative explanation of bitmask immediates, now that it is morning and I have finally understood the "mind-bogglingly complicated" definition (see Notlikethat's answer). Maybe it will be easier for some to understand.

It is X>0 consecutive zeros followed by Y>0 consecutive ones, where X+Y is a power of 2, repeated to fill the whole argument and then rotated arbitrarily.
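That description can be turned into an encoder that goes the other way, from a value to N:immr:imms. This is a sketch under stated assumptions, not how GCC or LLVM actually do it: it handles 64-bit values only, `encode_bitmask64` is a hypothetical name, and it assumes the field layout of the `DecodeBitMasks()` pseudocode (a run of ones rotated right by immr; imms holding the run length minus one, with the element size marked by its leading bits). It first finds the smallest power-of-two period, then looks for the rotation that turns the element into X zeros followed by Y ones.

```c
#include <stdbool.h>
#include <stdint.h>

// Sketch: find N, immr, imms for a 64-bit logical immediate, or return
// false if x is not "X zeros then Y ones, repeated, then rotated".
static bool encode_bitmask64(uint64_t x, unsigned *n, unsigned *immr,
                             unsigned *imms) {
    if (x == 0 || x == ~0ULL)
        return false;  // all-zeros / all-ones are never encodable

    // 1. Smallest period e: halve while both halves of the element match.
    unsigned e = 64;
    while (e > 2) {
        unsigned half = e / 2;
        uint64_t hmask = (1ULL << half) - 1;
        if ((x & hmask) != ((x >> half) & hmask))
            break;
        e = half;
    }
    uint64_t emask = (e == 64) ? ~0ULL : (1ULL << e) - 1;
    uint64_t elem = x & emask;

    // 2. Find the left-rotation r that makes the element a low run of ones
    //    (0...01...1); then elem is that run rotated right by r.
    for (unsigned r = 0; r < e; ++r) {
        uint64_t rot = (r == 0)
            ? elem
            : ((elem << r) | (elem >> (e - r))) & emask;
        if ((rot & (rot + 1)) == 0) {  // rot == 2^k - 1
            unsigned ones = 0;
            while ((rot >> ones) & 1u)  // k < e, so this stops before 64
                ++ones;
            *n = (e == 64) ? 1u : 0u;
            *immr = r;
            // High imms bits mark the element size; low bits hold ones-1.
            *imms = (~(2u * e - 1) & 0x3fu) | (ones - 1);
            return true;
        }
    }
    return false;  // element is not a single (rotated) run of ones
}
```

For 0xFF0 this yields N=1, immr=60, imms=7, and for 0xFEF it returns false, matching the assembler's complaint in the question.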

Also note that optional shifts in other immediate formats are by exact amounts of bits, not "up to". That is, the 16-bit immediates can be shifted by 0, 16, 32 or 48 bits exactly, while 12-bit immediates only by 0 or 12 bits.
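A small C sketch of those exact-shift rules, assuming the 64-bit forms (the 32-bit W-register MOVZ only allows shifts of 0 or 16); `fits_add_sub_imm` and `movz_shift` are hypothetical helper names. Note that the assembler's `mov` alias can also fall back to MOVN or a bitmask-immediate ORR, so it accepts more constants than `movz_shift` alone does.

```c
#include <stdbool.h>
#include <stdint.h>

// ADD/SUB (immediate): a 12-bit unsigned value shifted left by exactly
// 0 or 12 bits; nothing in between.
static bool fits_add_sub_imm(uint64_t v) {
    if (v <= 0xfff)
        return true;                                // LSL #0
    return (v & 0xfff) == 0 && (v >> 12) <= 0xfff;  // LSL #12
}

// MOVZ (64-bit form): a 16-bit value shifted left by exactly 0, 16, 32 or
// 48 bits. Returns the shift, or -1 if one MOVZ cannot produce v.
static int movz_shift(uint64_t v) {
    for (int shift = 0; shift < 64; shift += 16)
        if ((v & ~(0xffffULL << shift)) == 0)
            return shift;
    return -1;
}
```

With these, 0xFEF passes `fits_add_sub_imm` with LSL #0 (which is why the question's ADD assembles), 0x1000 passes via LSL #12, but 0x1001 fits neither shift.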

EvgEnZh
  • Interestingly, `and x13, x13, #0` and `#-1` aren't encodeable. That's good, they didn't waste coding space on useless immediates: `0` as an operand for bitwise-booleans is either a NOP or a zeroing operation, and `eor x13, x13, x13` or `sub` can still produce a zero with a data-dependency on the input (`std::memory_order_consume`) in one instruction. And that's so rarely needed it would have been fine to need two instructions for that. `eor` with `-1` is just a NOT, which there's an instruction for. `or` with `-1` would produce `-1` with a data dependency; can be done in 2 insns. – Peter Cordes Aug 16 '23 at 15:43
  • That makes sense: a 6-bit field can encode numbers from 0..63 or 1..64, but not 0..64. Even with one of the encodings made special, you still have enough room to encode 1..63, which handles bit-ranges from one 1 (rest 0s) to one 0 (rest 1s). I didn't check if that's how the machine code actually works. – Peter Cordes Aug 16 '23 at 15:46