Why does the following occur?

*Abbreviations: r = register, c = immediate (byte) value

  1. lea r, [r+c] ;correct as expected
  2. mov r, r+c ;error
  3. mov r, r+r ;error
  4. mov r, [r+c] ;correct
  5. mov r, [r+r] ;correct
  6. mov r, [c+c] ;correct
  7. mov r, c+c ;correct
  8. lea r, [c] equals mov r, c

More info:

Line 2: it is just a simple calculation, right? But it can't be assembled. Line 3 has the same problem. Yet at lines 4, 5, 6 and 7 the MOV instruction does the calculation nicely and the assembler reports no error. So why does MOV fail at lines 2 and 3, considering that those instructions are simpler than lines 4 and 5 because no dereferencing is done?

Line 8: LEA (Load Effective Address) is supposed to calculate a memory address, so it should look for the address where the constant "c" is stored, as defined (reserved) in "section .data". If we assume, for the scope of this question, that it isn't defined or reserved there, then LEA should give an assembly error. But what actually happens is that it acts identically to MOV in this context. So I am asking for an explanation of the reason behind this strange behaviour of the LEA instruction.

*I have read similar questions on "StackOverflow" regarding "difference" between "MOV" and "LEA" but they don't quite answer the situations described in this question.

Duke William
  • The only calculations allowed are address calculations. 2 and 3 are not address calculations. LEA does address calculations but stores the result in a register giving you the illusion of being able to do arbitrary math. – siride Jul 17 '22 at 16:58
  • Constants may also be part of calculations because they can be resolved at compile time and so don't actually exist in the output. – siride Jul 17 '22 at 17:00
  • @siride, thanks for the answer, but can you elaborate on why the instructions at lines 2 and 3 cannot be resolved at compile time like constants? – Duke William Jul 17 '22 at 17:16
  • @DukeWilliam Because `r+c` and `r+r` are not constant expressions. – fuz Jul 17 '22 at 17:22

1 Answer

x86 instructions support operands that use an addressing mode, because the designers considered addressing modes useful enough to merit an encoding.

For most instructions, the addressing mode computes an effective address that is used to access memory.  If the effective address involves an addition, the processor does that addition, uses the resulting address for the access, then discards the sum.

For lea, however, the effective address is computed the same way (i.e. according to the addressing mode), but rather than being used for a memory access, the effective address itself is the value put into the target register.  This gives us a way to capture an address computed by an addressing mode, which would have been discarded if it were used in another instruction like mov or add.
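As a minimal sketch of that difference, here is the same addressing mode used once with mov and once with lea. It is written in NASM syntax for 64-bit mode; the label `demo`, the registers and the displacement 8 are arbitrary choices for illustration, and rdi is assumed to hold a valid pointer.

```nasm
bits 64
section .text
global demo

demo:
    ; Both instructions use the same addressing mode, [rdi + 8].
    mov rax, [rdi + 8]   ; computes rdi+8, loads 8 bytes from that address,
                         ; and the computed address itself is discarded
    lea rdx, [rdi + 8]   ; computes rdi+8 the same way, but stores the
                         ; address in rdx and performs no memory access
    ret
```

After these two instructions rax holds whatever was in memory at rdi+8, while rdx holds the number rdi+8 itself.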

  1. lea r, [r+c]: there is an addressing mode for [r+c].  The processor computes r+c and makes that the result; no memory access is performed.  Syntactically, the addressing mode is written the same way as a memory reference in any other instruction, so the []s are kept even though this is not an actual memory reference.

  2. mov r, r+c: this is a dynamic (run-time) calculation because of r, and without brackets r+c is not a memory operand, so no addressing mode applies.  You have to use add or lea to do this addition.

  3. mov r, r+r: this is also a dynamic calculation, and again no addressing mode applies to a bare r+r.  You have to use add or lea for that.

  4. mov r, [r+c]: a dynamic computation that loads from memory at r+c.  There is an addressing mode for [r+c]; r+c is computed and then discarded after being used as the memory address.

  5. mov r, [r+r]: also a dynamic computation, for which there is an addressing mode.  It loads from memory; r+r is computed and then discarded after being used for the memory access.

  6. mov r, [c+c]: there is no encoding for this, but there is an encoding for [c], which the assembler uses once it has computed c+c at build time.

  7. mov r, c+c: there is no encoding for this, but there is an encoding for c, which the assembler uses after computing c+c during the build.

  8. lea r, [c] equals mov r, c: yes, these are two different ways to encode the same operation.
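The eight cases can be written out concretely. The following is a sketch in NASM syntax for 64-bit mode, where rax, rbx and rcx stand in for "r" and the constants 3 and 4 stand in for "c" (all of these are arbitrary choices, as is the label `cases`). It is meant to be assembled and inspected, not executed: the absolute load in case 6 would typically fault at run time.

```nasm
bits 64
section .text
global cases

cases:
    lea rax, [rbx + 4]       ; 1. rax = rbx + 4, no memory access
    ; mov rax, rbx + 4       ; 2. rejected: not a memory operand, not a constant
    ; mov rax, rbx + rcx     ; 3. rejected for the same reason
    lea rax, [rbx + rcx]     ;    one replacement for case 3: rax = rbx + rcx
    mov rax, rbx             ;    another replacement: copy first ...
    add rax, rcx             ;    ... then add, giving rax = rbx + rcx
    mov rax, [rbx + 4]       ; 4. load 8 bytes from address rbx + 4
    mov rax, [rbx + rcx]     ; 5. load 8 bytes from address rbx + rcx
    mov rax, [3 + 4]         ; 6. assembler folds this to [7]: load from address 7
    mov rax, 3 + 4           ; 7. assembler folds this to: mov rax, 7
    lea rax, [7]             ; 8. rax = 7, same result as case 7
    ret
```

Uncommenting the lines for cases 2 and 3 should make the assembler report an error, since neither operand is a register, a constant expression, or a bracketed memory operand.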

mov and add instructions require an operand size to be specified in the encoding (1, 2, 4 or 8 bytes).  This is often done by naming a register operand (e.g. al, ax, eax, rax), which tells the assembler the size.  When the addressing mode specifies a memory access, that size is also used for the memory access.
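A short sketch of how the operand size is conveyed, again in NASM syntax for 64-bit mode. The label `sizes` and the use of rdi as a pointer are assumptions made for illustration.

```nasm
bits 64
section .text
global sizes

sizes:
    ; The destination register names the size of the memory access.
    mov al,  [rdi]        ; 1-byte load
    mov ax,  [rdi]        ; 2-byte load
    mov eax, [rdi]        ; 4-byte load (writing eax zeroes the upper half of rax)
    mov rax, [rdi]        ; 8-byte load

    ; With no register operand, the size must be stated explicitly.
    mov dword [rdi], 1    ; 4-byte store of the constant 1
    add byte  [rdi], 1    ; 1-byte read-modify-write
    ret
```

Without the `dword` or `byte` keyword in the last two instructions, NASM has no way to infer the size and reports an error.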

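lea normally targets a pointer-sized register, as the result is the effective address itself (a narrower destination register can be encoded, in which case the result is truncated to that size).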

Erik Eidt
  • Fun fact: there are multiple opcodes even for the `mov` mnemonic, for `mov reg32, constant`, e.g. 5-byte `mov reg, imm32` (with no ModRM byte), or 6-byte `mov r/m32, imm32` (with a ModRM byte, but you can encode a register destination). That's two encodings for the actual same *instruction*, not just the same *operation*. (Using LEA to put an immediate constant into a register means it can only run on execution ports that support LEA, instead of any ALU port for one of the MOV opcodes.) And for 64-bit registers, there are even more choices, especially for constants that fit in a 32-bit zero-extended immediate. – Peter Cordes Jul 17 '22 at 17:36
  • @PeterCordes, several fun facts! – Erik Eidt Jul 17 '22 at 17:37
  • Re: putting a label address into a register: [How to load address of function or label into register](https://stackoverflow.com/q/57212012) (including RIP-relative LEA, which this question omitted despite being tagged x86-64; that's one thing LEA can do but MOV can't.) And re: forms of MOV: [Difference between movq and movabsq in x86-64](https://stackoverflow.com/q/40315803) – Peter Cordes Jul 17 '22 at 17:39