What do the `uxtx` and `sxtx` extensions mean for the 32-bit AArch64 `adds` instruction?

Question

I'm looking at the following disassembled AArch64 instruction:

65 6E 20 2B    adds w5, w19, w0, uxtx #3

According to the ARM manual, uxtx zero-extends w0 to an unsigned 64-bit value before adding it to the value in w19. But w19 is a 32-bit "slice" of x19, and the result is stored in a 32-bit slice of x5. That is, the sizes of the operation's values differ.

The question is not restricted to adds; other AArch64 instructions like add or sub exhibit the same encoding. The question also applies to the 64-bit sxtx signed extension, which due to sign extension issues might very well be expected to not behave the same as the 32-bit sxtw.

Are uxtx and sxtx acting exactly like uxtw and sxtw respectively when used with 32-bit register slices? If so, what value is ARM providing by supporting both [us]xtw and [us]xtx extension encodings for these apparently identical operations? If not, is there a difference that would be visible to the user program?

I've seen `uxtx`/`sxtx` as part of addressing modes (https://godbolt.org/z/4G5c6ProM) to allow compilers to avoid doing sign-extension when code uses an `int` as an array index with a 64-bit pointer, but wasn't aware of this usage. I assume it's the same as addressing-modes where `sxtw #2` is sign-extend and left-shift by 2 (e.g. to index an `int` array, vs. just `sxtx` to not shift when indexing a char array). So perhaps for a 32-bit add, there are redundant ways to encode a left-shift, as sign- or zero-extending? Not posting an answer since I didn't check the manuals. — Peter Cordes, Apr 28 '22 at 10:28
I've looked in the ARM manual and the **Operation** section for the `add` instruction specifies a call to `ExtendReg` but no mention of what happens when `ExtendReg` returns a 64-bit value to the following 32-bit addition. So, should my decompiler truncate the 64-bit result blindly or not? — John Källén, Apr 28 '22 at 10:35
BTW, the addressing modes I mentioned in my first comment were `sxtw`, not `sxtx`. (IDK what the difference is either, and would be interested to read an answer explaining the design of AArch64's sign/zero extension stuff.) — Peter Cordes, Apr 28 '22 at 12:00
I can try to add an answer later, but in short, as I understand it: As a side effect of keeping the encoding scheme simple, some of the encodings are redundant and have the same effect. But the ARM64 assembly language requires that each encoding should have a distinct way to express it in assembly. So although for a 32-bit instruction `uxtx / sxtx / uxtw / sxtw` all have the same effect, they allow you to select which of the four possible encodings you want, for the rare situations where it matters. — Nate Eldredge, Apr 28 '22 at 15:31
In the pseudocode of `ExtendReg`, for the case of `adds w5, w19, w0, uxtx #3` it actually does return `bits(32)`, so there's no type mismatch. The parameter `N` here is `datasize`, which is 32, and `N` is what is passed to `Extend`. And in any case, because of the `len = Min(len, N - shift)` on the second-to-last line, you can see that whether `len` is initially set to 32 or 64 by `UXTW/UXTX`, the overall effect doesn't change. — Nate Eldredge, Apr 28 '22 at 16:46

Nate Eldredge · Accepted Answer · 2022-05-15T02:21:39.020

They all do the same thing, i.e. nothing.

As you say, logically, sign- or zero-extending a value to a width larger than the operand size should not actually affect the value used, and that's correct. You can confirm it with a careful reading of the pseudocode in the Architecture Reference Manual. In the code for ExtendReg, note the line len = Min(len, N - shift). Here N is 32, so it makes no difference whether len is 32 or 64.
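A rough Python model of ExtendReg makes the effect of that line concrete. It is a sketch paraphrased from the manual's pseudocode, not a copy of it; the name `extend_reg` and the string-based extension names are illustrative assumptions.

```python
def extend_reg(val, ext, shift, n):
    """Rough model of ExtendReg(reg, exttype, shift, N).

    val   -- register contents as an unsigned integer
    ext   -- "uxtb", "uxth", "uxtw", "uxtx", "sxtb", "sxth", "sxtw" or "sxtx"
    shift -- left shift amount, 0 to 4
    n     -- operand size in bits (32 or 64)
    """
    assert 0 <= shift <= 4
    val &= (1 << n) - 1                    # only the low N bits of the register are read
    unsigned = ext[0] == "u"
    length = {"b": 8, "h": 16, "w": 32, "x": 64}[ext[3]]
    length = min(length, n - shift)        # len = Min(len, N - shift)
    # val<len-1:0> : Zeros(shift)
    field = (val & ((1 << length) - 1)) << shift
    width = length + shift
    if not unsigned and (field >> (width - 1)) & 1:
        field -= 1 << width                # sign-extend from bit len+shift-1
    return field & ((1 << n) - 1)          # result is N bits wide


w2 = 0xDEADBEEF
results = {ext: extend_reg(w2, ext, 3, 32) for ext in ("uxtw", "sxtw", "uxtx", "sxtx")}
# For N = 32 all four operators give (w2 << 3) truncated to 32 bits.
print({ext: hex(v) for ext, v in results.items()})
```

With N = 32 and a shift of 3, len is clamped to 29 for all four operators, so the sign bit that would be replicated already sits at bit 31 and nothing above it survives in a 32-bit result.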

Similarly, uxtx and sxtx are both no-ops for either 32-bit or 64-bit instructions.

So the following instructions all have exactly the same architectural effect, performing the operation w0 = w1 + (w2 << 3). I actually tested them with a selection of chosen and random inputs, verifying that the results and flags are identical for all five.

   0:   2b224c20    adds    w0, w1, w2, uxtw #3
   4:   2b22cc20    adds    w0, w1, w2, sxtw #3
   8:   2b226c20    adds    w0, w1, w2, uxtx #3
   c:   2b22ec20    adds    w0, w1, w2, sxtx #3
  10:   2b020c20    adds    w0, w1, w2, lsl #3

However, note that their encodings are different.

And that is also why they use different mnemonics for the extension operation: one of the principles of the ARM64 assembly language is that every legal binary encoding should have its own unambiguous assembly. So if for some obscure reason you care whether you get the encoding 0x2b224c20 or 0x2b226c20 -- say you are trying to write shellcode where certain bytes are forbidden -- you can specify uxtw or uxtx to select the one you want. This also means that if you disassemble and reassemble a section of code, you will get back the identical binary that you put in.

(Contrast the situation in x86 assembly language, where redundant encodings do not get distinct mnemonics. So add edx, ecx may assemble to either 01 ca (the "store form") or 03 d1 ("load form"), and assemblers often don't give you any way to pick which one. Likewise both encodings will disassemble to add edx, ecx, so if you disassemble and reassemble you may not end up with the same binary you started with. See How to resolve ambivalence in x64 assembly? and its duplicate links.)

The mnemonics for the extension operators reflect the encoding structure, which also helps to explain why the redundant encodings exist in the first place. The extension type is encoded in a 3-bit "option" field, bits 13-15 of the instruction. Bits 13-14 specify the width of the value to be extended:

00 = 8-bit byte B
01 = 16-bit halfword H
10 = 32-bit word W
11 = 64-bit doubleword X

Note that X is always effectively "no extension". Then bit 15 specifies the signedness: 0 = unsigned U, 1 = signed S. So 010 = uxtw and 011 = uxtx since that is what they logically specify, even though for a 32-bit operation, both have the same actual effect (i.e. none).
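The field layout described above can be sketched as a small Python decoder. The function name `decode_extend` is an assumption for illustration, and the shift amount is read from the 3-bit immediate in bits 10-12 of the same instruction word.

```python
def decode_extend(word):
    """Return (operator, shift) for an add/sub (extended register) encoding."""
    option = (word >> 13) & 0b111            # bits 13-15
    imm3 = (word >> 10) & 0b111              # bits 10-12, the left-shift amount
    sign = "s" if option & 0b100 else "u"    # bit 15: signedness
    width = "bhwx"[option & 0b011]           # bits 13-14: width B, H, W or X
    return sign + "xt" + width, imm3


for word in (0x2B224C20, 0x2B22CC20, 0x2B226C20, 0x2B22EC20):
    print(hex(word), decode_extend(word))
```

Applied to the four extended-register encodings listed earlier, this recovers uxtw, sxtw, uxtx, and sxtx in that order, each with a shift of 3.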

This might seem like a waste of the instruction space, but presumably it allows the decoder hardware to be simpler than if the otherwise redundant encodings were to select some different operation.

The last option listed above, adds w0, w1, w2, lsl #3 has a different encoding altogether because it selects the "Add (shifted register)" opcode, instead of the "Add (extended register)" opcode as the first four do. So this is another redundancy; an add without extension, with a left shift of 0-4 bits, can be done with either opcode. However, this is not entirely useless, because the "extended register" form can use the stack pointer register sp as an operand, while the "shifted register" can use the zero register xzr/wzr. Both registers are encoded as "register 31", so each opcode has to specify whether it interprets "register 31" as the stack pointer or as the zero register. So the fact that the two opcodes have overlapping effect lets the instruction set provide addition using either the stack pointer or the zero register, where otherwise only one or the other could be supported.
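To see how the two opcodes differ at the bit level, here is a minimal Python sketch that assembles the 32-bit adds in both forms. The helper names are hypothetical, and the shifted-register helper only covers the LSL shift type.

```python
def adds_extended(rd, rn, rm, option, imm3):
    """32-bit ADDS (extended register); option is the 3-bit extend field."""
    return 0x2B200000 | rm << 16 | option << 13 | imm3 << 10 | rn << 5 | rd


def adds_shifted_lsl(rd, rn, rm, imm6):
    """32-bit ADDS (shifted register) with an LSL shift of imm6 bits."""
    return 0x2B000000 | rm << 16 | imm6 << 10 | rn << 5 | rd


# adds w0, w1, w2, uxtw #3  versus  adds w0, w1, w2, lsl #3
print(hex(adds_extended(0, 1, 2, 0b010, 3)))
print(hex(adds_shifted_lsl(0, 1, 2, 3)))
```

The only difference in the fixed bits is bit 21, which selects the extended-register form; the register fields sit in the same positions in both, which is why "register 31" has to be given a per-opcode meaning.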

The sxt/uxt syntax shows up in a couple other places in the ARM64 assembly language, with slightly different details in each case.

The sxt*/uxt* instructions, which simply sign- or zero-extend one register into another. They are aliases for special cases of the sbfm/ubfm bitfield move instructions. sxtb, sxth, uxtb, uxth work with either a 32- or 64-bit destination, and sxtw x0, w1 with a 64-bit destination only.

The GNU assembler at least also supports uxtw w0, w1 and uxtw x0, w1, although the official Architecture Reference Manual does not document them. But they are both just aliases for mov w0, w1, since writes to 32-bit registers always zero the high half of the corresponding 64-bit register. (And a fun fact is that mov w0, w1 is itself an alias for orr w0, wzr, w1, a bitwise OR with the zero register.)

There are no mnemonics for the trivial uxtx, sxtx which would just be a 64-bit move. I suppose logically uxtx x0, x1 could be an alias of ubfm x0, x1, #0, #63, encoded as 0xd340fc20, but they didn't bother to support it. The uxtx operator to adds is needed because otherwise there would be no way to assemble 0x2b226c20, but since 0xd340fc20 can already be obtained with ubfm it doesn't need another redundant name. (Actually it seems ubfm x0, x1, #0, #63 disassembles as lsr x0, x1, #0, since the immediate shift instructions are also aliases for bitfield move.) Likewise, the useless sxtw w0, w1 is also rejected by the assembler.

The extended-register addressing modes for the load, store, and prefetch instructions. They normally take 64-bit base and index registers ldr x0, [x1, x2], but the index can also be specified as a 32-bit register with either zero or sign extension: ldr x0, [x1, w2, uxtw] or ldr x0, [x1, w2, sxtw].

Here again a redundant encoding appears. These instructions contain a 3-bit "option" field with the same position and format as for add and friends, but here the byte and halfword versions are unsupported, so the encodings with bit 14 = 0 are undefined. Of the remaining four combinations, uxtw (010) and sxtw (110) make perfect sense. The other two use a 64-bit index with no extension, and so have the same effect as each other, but they need to be assigned distinct assembly syntax. The 011 encoding, which might logically be uxtx, is designated the "preferred" encoding and is written with no operator as ldr x0, [x1, x2], or as ldr x0, [x1, x2, lsl #3] for the shifted-index version. The redundant 111 encoding is then selected with ldr x0, [x1, x2, sxtx] or ldr x0, [x1, x2, sxtx #3].

The uxtl/sxtl Extend Long SIMD instructions, which zero- or sign-extend the elements of a vector to double their original width. These are actually aliases for the ushll/sshll long shift instructions, with a shift count of 0. But otherwise there is nothing unusual about their encodings.
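For the load/store addressing modes above, the mapping from the option field to assembly syntax can be sketched in Python as follows. The names `INDEX_OPERATORS` and `index_operand` are hypothetical, and the `shift` parameter simply stands in for the scaled form of the index (#3 for a 64-bit ldr).

```python
# option value -> (index register prefix, operator)
INDEX_OPERATORS = {
    0b010: ("w", "uxtw"),
    0b110: ("w", "sxtw"),
    0b011: ("x", "lsl"),    # preferred 64-bit form; operator omitted when unshifted
    0b111: ("x", "sxtx"),   # redundant 64-bit form
}


def index_operand(option, reg=2, shift=0):
    """Return the index operand text for a register-offset load or store."""
    if option not in INDEX_OPERATORS:
        raise ValueError("encodings with bit 14 clear are not allocated")
    prefix, op = INDEX_OPERATORS[option]
    text = f"{prefix}{reg}"
    if op == "lsl" and shift == 0:
        return text                      # plain [x1, x2]
    text += f", {op}"
    if shift:
        text += f" #{shift}"
    return text


for option in (0b010, 0b110, 0b011, 0b111):
    print(f"ldr x0, [x1, {index_operand(option)}]")
```

Only the 011 case drops its operator entirely, which is what lets the unshifted 64-bit index be written as a bare register while 111 keeps a distinct spelling.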
