add 1 byte immediate value to a 2 bytes memory location

Question

The add instruction documentation from this page says the following:

Notice the two instructions that I highlighted.

I tried the following code in NASM (which conforms with the first highlighted instruction):

add WORD [myvar], BYTE 0xA5

But I got the following error:

warning: signed byte value exceeds bounds

What am I doing wrong?

@old_timer `add WORD [myvar], BYTE -91` works, but `add WORD [myvar], BYTE 165` doesn't work. – user247763, Jul 14 '17 at 00:21
Which are you trying to add, -91 or +165? If it is +165, then fuz answered your question and already said that `BYTE 165` isn't going to work. But if you were not trying to add +165 but -91, then use -91 instead of 0xA5; in that case fuz only partially answered your question. – old_timer, Jul 14 '17 at 00:24

score 8 · Answer 1 · edited Jul 14 '17 at 03:21


The 8-bit immediate operand (denoted here by imm8) is sign-extended into 16 (or 32) bits to match the size of the other operand (r/m16 or r/m32, respectively).

Thus, only values between -128 and 127 can be represented, which is why you receive this warning from the assembler.

For the value 0xA5, you need to use a WORD immediate (imm16):

add WORD [myvar], WORD 0xA5

(although the WORD is optional on the source operand, since it is implied by the constant's size).

edited Jul 14 '17 at 03:21

Cody Gray - on strike


answered Jul 13 '17 at 21:22

fuz



Almost always better to let the assembler choose the width of the immediate for you, and put the operand-size specifier on the memory operand when there's ambiguity. – Peter Cordes Jul 15 '17 at 02:40

score 4 · Answer 2 · answered Jul 15 '17 at 02:39

I won't repeat @fuz's answer, but I want to add:

If you had just let the assembler do its job by writing add word [myvar], 0xA5, it would have picked the smallest encoding that worked. If your immediate had fit in a sign-extended imm8, it would have used the add r/m16, imm8 encoding. There is usually no need to use size-overrides on non-memory operands. All the major x86 assemblers optimize the size of immediate operands. Some (e.g. NASM) will even optimize mov rax, 1 into the equivalent but shorter mov eax, 1, and stuff like that, but others (YASM) won't.

You can force the assembler to use wider immediates than necessary, for padding/alignment, though. For example, add word [myvar], strict word 1 would use the imm16 version. (Without strict, a size keyword doesn't stop the assembler from optimizing to a smaller encoding.) You can also write add word [rcx + strict dword 0], strict word 1 to force a [base + disp32] encoding for the addressing mode.

When possible, avoid 16-bit immediate operands to instructions other than mov. On many Intel CPUs, such an instruction will be slow to decode because of an LCP (length-changing prefix) stall: the operand-size prefix changes the length of the rest of the instruction. This might not be a problem on newer CPUs that have a decoded-uop cache. But on older Intel CPUs, this will probably run faster, at the cost of a scratch register:

movzx  eax, word [myvar]
add    eax, 0xA5          ; add ax, 0xa5 is 1B smaller, but has the same LCP stall.
mov    [myvar], ax

add/sub propagate carry from low bits to high bits, so the low part of a wider add is always the same as what you'd get from a narrow add. Avoiding LCP stalls for register operands is usually cheap (just 1 extra byte for add eax, imm32, since it doesn't need an operand-size prefix), but here the separate load and store are extra instructions.

This is a lot more code-size, so it's probably slower on CPUs that don't have LCP stalls. It's only 1 more uop for the front-end on Intel Sandybridge-family (which can micro-fuse the load+add in the one-instruction version), and the same number of uops for the execution units / scheduler. (memory-destination instructions decode to load, ALU, and store uops.)

Another option would be `mov eax, 0xa5` / `add [myvar], ax`, but that's more uops for the back-end and the same total for the front-end on modern Intel (1 + 2 = 3). The memory-destination add's load+add micro-fuses, and the store-address+store-data uops micro-fuse, so `add [mem],reg` is 2 front-end uops total, 4 back-end uops. (That's the same total back-end uops as the movzx load; add reg,imm; mov-store; and that doesn't include a mov-immediate.) – Peter Cordes, Nov 06 '20 at 12:51
