Can the LEA instruction mimic every MOV instruction?

Question

I recently came across the M/o/Vfuscator:

A complete single-instruction C compiler which compiles programs into "mov" instructions, and only "mov" instructions. Arithmetic, comparisons, jumps, function calls, and everything else a program needs are all performed through mov operations; there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating.

Question

I'm curious if every MOV instruction can be replaced with an equivalent LEA?

For example

 mov     rax, 10
 mov     rbx, rax
 ...

can be replaced with

 lea     rax, [10]
 lea     rbx, [rax]
 ...

Update

Oops, totally forgot that LEA cannot do a pointer dereference. For example

.data
     ten dq 10

.code

main proc

    mov     rax, offset ten
    mov     rbx, [rax]  ; <--- dereference, move 10 into rbx
     
    lea     rax, [ten]
                        ; <--- no dereference equivalent using lea
    ret
main endp

end

Peter Cordes · Answer 1 · 2023-01-16T22:53:07.370

No. Unlike mov eax, [rdi] or mov [rdi+rcx*8], rax, LEA can't load from or store to memory.
Movfuscator relies on using mov for table lookups to do logic, so this is a big problem. lea can only do addition (of registers and sign-extended constants) and a left-shift of the index register (scale factor 1, 2, 4, or 8), so it's less powerful. It also can't access anything but registers.
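To make the load/no-load distinction concrete, here is a minimal Python sketch, not taken from Movfuscator itself. The names `lea`, `mov_load`, `TABLE`, and the tiny AND table are all hypothetical and exist only for illustration. It models LEA as returning the computed address and a MOV load as reading the memory at that address.

```python
MASK64 = (1 << 64) - 1

def lea(base=0, index=0, scale=1, disp=0):
    """Model of LEA: return the effective address itself, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    return (base + index * scale + disp) & MASK64

def mov_load(memory, base=0, index=0, scale=1, disp=0):
    """Model of a MOV load: compute the same address, then read memory there."""
    return memory[lea(base, index, scale, disp)]

# Hypothetical 2x2 lookup table for logical AND at address 0x1000,
# one byte per entry, row-major: entry (a, b) lives at 0x1000 + a*2 + b.
TABLE = 0x1000
memory = {TABLE + a * 2 + b: a & b for a in (0, 1) for b in (0, 1)}

def and_via_lookup(a, b):
    # The address arithmetic is something LEA can do...
    row = lea(base=TABLE, index=a, scale=2)
    # ...but fetching the table entry needs a real load, i.e. MOV.
    return mov_load(memory, base=row, index=b)

print(and_via_lookup(1, 1))                      # value stored in the table
print(hex(lea(base=TABLE, index=1, scale=2, disp=1)))  # only the address
```

In this model, `lea` can tell you where a table entry lives but never what it contains, which is why a table-lookup scheme cannot be built from LEA alone.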

LEA can't mimic every immediate-source mov, since mov r64, imm64 is the only x86-64 instruction that can use a 64-bit immediate (e.g. mov rcx, 0xdeadbeefdeadbeef). lea r64, [disp32] can only use a sign-extended 32-bit absolute displacement. (Conversely, RIP-relative LEA can do things MOV can't, being more efficient than call $+5 / pop rdi / add rdi, imm32)
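The immediate-range limit can be sketched as follows. This is an illustrative Python model, assuming the default 64-bit address size (no address-size prefix) and an absolute [disp32] with no base or index register; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sign_extend_disp32(disp32):
    """Sign-extend a 32-bit displacement to a 64-bit value."""
    disp32 &= 0xFFFFFFFF
    if disp32 & 0x80000000:
        disp32 -= 1 << 32
    return disp32 & MASK64

def reachable_by_absolute_lea64(value):
    """True if lea r64, [disp32] could produce this 64-bit value
    (assumption: 64-bit address size, no base/index register)."""
    value &= MASK64
    return value == sign_extend_disp32(value & 0xFFFFFFFF)

for v in (10, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF80000000, 0xDEADBEEFDEADBEEF):
    print(hex(v), reachable_by_absolute_lea64(v))
```

Under these assumptions only values whose upper 33 bits are all equal are reachable, so a constant like 0xdeadbeefdeadbeef still needs mov r64, imm64.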

LEA can't read or write 8-bit registers, so it can't emulate mov cl, ah or similar.

LEA can copy register to register for 16, 32, and 64-bit registers, but less efficiently (mov-elimination at register-rename time is specific to mov). Some instructions also need longer machine code, e.g. lea eax, [rbp] or lea rax, [rsp] take extra machine-code bytes because of addressing-mode special cases, while mov eax, ebp and mov rax, rsp are 2 and 3 bytes, respectively. See "rbp not allowed as SIB base?". This also affects [r12] and [r13], which use the same ModRM encodings as [rsp] and [rbp] and are distinguished only by a bit in the REX prefix.

  401000:       8d 03                   lea    eax,[rbx]       # normal length, no REX
  401002:       89 d8                   mov    eax,ebx         # equivalent MOV
  401004:       41 8d 03                lea    eax,[r11]       # normal length, with a REX
  401007:       44 89 d8                mov    eax,r11d

  40100a:       8d 04 24                lea    eax,[rsp]    # [rsp] can only be encoded with a SIB byte
  40100d:       89 e0                   mov    eax,esp      # but mov reg,reg uses register-direct
  40100f:       41 8d 04 24             lea    eax,[r12]
  401013:       44 89 e0                mov    eax,r12d

  401016:       8d 45 00                lea    eax,[rbp+0x0]  # [rbp] can only be encoded with disp8=0 (or disp32 but that's longer)
  401019:       89 e8                   mov    eax,ebp
  40101b:       41 8d 45 00             lea    eax,[r13+0x0]
  40101f:       44 89 e8                mov    eax,r13d

  401022:       88 e1                   mov    cl,ah    # LEA can't do this at all.

Of course, the reason you'd use LEA is to do math in the addressing mode, not to copy a register or set it to an immediate constant: e.g. to copy-and-increment like mov + add, and/or to shift-and-add registers. See "Using LEA on values that aren't addresses / pointers?"
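As a rough illustration of that kind of arithmetic, the Python sketch below models lea r64, [base + index*scale + disp] as plain integer math. The helper name `lea64` and the sample value in `rbx` are assumptions made for this example; the assembly in the comments shows the instruction each call stands for.

```python
MASK64 = (1 << 64) - 1

def lea64(base=0, index=0, scale=1, disp=0):
    """Model lea r64, [base + index*scale + disp]: arithmetic only, no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    return (base + index * scale + disp) & MASK64

rbx = 41  # hypothetical register contents

# lea rax, [rbx + 1]       ; same result as: mov rax, rbx / add rax, 1
copy_and_increment = lea64(base=rbx, disp=1)

# lea rax, [rbx + rbx*4]   ; rbx * 5 via shift-and-add
times_five = lea64(base=rbx, index=rbx, scale=4)

# lea rax, [rbx*8 + 16]    ; shift left by 3, then add a constant
scaled_plus_const = lea64(index=rbx, scale=8, disp=16)

print(copy_and_increment, times_five, scaled_plus_const)
```

Each call corresponds to a single LEA that writes a different destination register and leaves the source unchanged, which is what makes it useful as a copy-and-compute instruction.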

Is it a typo for `add` ? " e.g. to copy-and-increment like `mov` + `and`, ..." — Sep Roland, Jan 16 '23 at 18:34

score 1 · Accepted Answer · answered Jan 15 '23 at 18:05

The two snippets you have written are equivalent, but lea is most often used to do address calculations inside the square brackets []: it can scale an index register by 1, 2, 4, or 8, add a base register, and add a constant displacement. In some places I've come across lea doing just plain math, and it is fine for that. For example, when you have the address of a struct in rbx and want the address of a member located 8 bytes from its start, you do lea rax, [rbx+8].

mov can be used to load addresses (just like lea), but also values. mov rax, [rbx] is equivalent to dereferencing the pointer held in rbx. Often you'll see something like mov eax, [rbx+8*4]; this is how you load an element of an int array (assuming an int takes 4 bytes on your system), and the equivalent in C would be array[8].

To wrap up: lea and mov can be used interchangeably when loading addresses into registers, but lea can't dereference a pointer or store a value to a memory address. You need mov for those operations.

A note about doing math inside the square brackets []: with lea you get the result of that arithmetic directly as a value, whereas with mov the same arithmetic only selects which memory location is accessed. Here is how to do math with mov when accessing memory without breaking the addressing-mode rules. link.
