Transformation of based indexed mode into indirect addressing mode (x86 assembly) part 2

Question

I'm currently working on rewriting examples that use complex indirect addressing modes into pieces that use only simple indirect addressing. However, I've come across an example from the based mode that I'm unable to "transform":

movzbl  string+8, %eax

I have tried this:

addl    $8, string
movzbl  string, %eax

After compiling this code, an error message pops up.

“error message pop's up” is not an error description. Please always tell us what error you get. Lastly, I don't understand what you are trying to achieve, `movzbl string+8, %eax` and `movzbl string, %eax` use the exact same addressing mode. — fuz, Apr 30 '18 at 12:34
Sorry for the bad wording, my mistake. It isn't an error message; my output just isn't the same. I don't know whether `string+8` adds the value 8 to `string` or refers to the byte 8 positions further into the string. — Biggy Poopa, Apr 30 '18 at 13:05
What is `string`? If it's a label, then it has a fixed address at runtime, and `addl $8, string` actually adds 8 to the first doubleword stored at `string`. What you seem to want is `movl $string, %ebx` / `addl $8, %ebx` / `movzbl (%ebx), %eax`. However, I don't see the point in this. `movzbl string+8, %eax` is fine (there's no runtime addition happening there). — Michael, Apr 30 '18 at 13:09
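To make Michael's point concrete, here is a small C model (not x86 code) of what the attempted `addl $8, string` does. It assumes, hypothetically, that `string` sits in a writable section and holds "Hello, world!"; the helper names `addl_8_to_mem` and `movzbl_at` are made up for this sketch. The add changes the *data* stored at `string`, not the address, so a following `movzbl string, %eax` reads a modified first byte instead of the byte 8 positions further on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of `addl $8, string`: treat the first 4 bytes at
   `mem` as a little-endian 32-bit integer, add 8, and store it back.
   The address `string` itself is unchanged; the data at it changes. */
void addl_8_to_mem(uint8_t *mem, size_t len) {
    assert(len >= 4);
    uint32_t v = (uint32_t)mem[0] | (uint32_t)mem[1] << 8 |
                 (uint32_t)mem[2] << 16 | (uint32_t)mem[3] << 24;
    v += 8;
    mem[0] = (uint8_t)v;
    mem[1] = (uint8_t)(v >> 8);
    mem[2] = (uint8_t)(v >> 16);
    mem[3] = (uint8_t)(v >> 24);
}

/* Hypothetical model of `movzbl string+k, %eax`: load the byte at
   address string+k and zero-extend it to 32 bits. */
uint32_t movzbl_at(const uint8_t *mem, size_t len, size_t k) {
    assert(k < len);
    return mem[k];
}
```

With "Hello, world!", `movzbl_at(s, sizeof s, 8)` yields 'o', but after `addl_8_to_mem(s, sizeof s)` the first byte has gone from 'H' (0x48) to 'P' (0x50), so the "transformed" code loads 'P' and the output differs.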

Peter Cordes · Answer 1 · 2018-04-30T14:20:17.087

string+8 isn't a based-index addressing mode. It assembles to a disp32 absolute address with no base register. The +8 is resolved at assemble/link time. (See Referencing the contents of a memory location. (x86 addressing modes))

movzbl string+8, %eax assembles to machine code with the same addressing mode (ModR/M byte) as movzbl string, %eax, just a different disp32 displacement. See How does C++ linking work in practice? for some details about how assembling + linking take care of the +8 so there's no extra work at run time.

You can do this, because string+8 isn't an addressing mode, it's a link-time constant that you can use as an immediate operand.

mov     $string+8, %edx
movzbl  (%edx), %eax

Using mov instead of lea makes this point clear, IMO. The only reason to use lea for putting a static address into a register is in x86-64 when you can use it for RIP-relative addressing for position-independent code (or for code outside the low 2 GiB, like on OS X). e.g. lea string+8(%rip), %rdx.
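As a rough analogy in C (an illustration, not part of the answer): for an array in static storage, `string + 8` is an address constant, which is why C allows it in a static initializer at all. The compiler emits it as "symbol plus 8" for the linker to resolve, much like `$string+8` in the assembly. The array contents and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the asm label `string`, in static storage. */
static const char string[] = "abcdefghij";
static_assert(sizeof string > 8, "need at least 9 bytes to read string[8]");

/* `string + 8` is resolved before run time (symbol + addend), so it is a
   valid static initializer, analogous to `mov $string+8, %edx`. */
static const char *const string_plus_8 = string + 8;

/* Analogue of: mov $string+8, %edx ; movzbl (%edx), %eax */
unsigned int load_via_register(void) {
    const char *edx = string_plus_8;  /* constant copied into a "register" */
    return (unsigned char)*edx;       /* zero-extending byte load */
}

/* Analogue of: movzbl string+8, %eax (address folded into the load) */
unsigned int load_direct(void) {
    return (unsigned char)string[8];
}
```

Both functions read the same byte ('i' here); neither performs an addition at run time to find it.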

The most over-complicated way to do the most useless stuff at run-time instead of assemble time would be

mov     $string, %edx
add     $8, %edx
movzbl  (%edx), %eax

I guess using lea would be even more over-complicated, or you could inc 8 times, or write a loop to inc 8 times, but that's over-complicated in a different way.

For example, given this source:

.globl _start
_start:
   mov  $string, %eax
   mov  $string+8, %eax
   movzbl string+8, %eax

.section .rodata
string:

I assembled with gcc -m32 foo.S -c and disassembled with objdump -drwC foo.o (the option -r shows relocations):

foo.o:     file format elf32-i386
Disassembly of section .text:
00000000 <_start>:
   0:   b8 00 00 00 00          mov    $0x0,%eax        1: R_386_32     .rodata
   5:   b8 08 00 00 00          mov    $0x8,%eax        6: R_386_32     .rodata
   a:   0f b6 05 08 00 00 00    movzbl 0x8,%eax         d: R_386_32     .rodata

Instead of real addresses, the 0 and 0x8 placeholders are the offsets from the symbol value for that relocation. They're against the .rodata section of the object file rather than against string because I didn't use .globl string to make that symbol global.
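The patching the linker does for these R_386_32 relocations can be sketched in C. i386 ELF uses REL relocations, so the addend A (the 0 or 0x8 placeholder) is stored in the instruction bytes themselves, and the linker overwrites that field with S + A, where S is the final address of the symbol. This is a simplified model, not the linker's actual code; `apply_r_386_32` is a hypothetical name, and it assumes .rodata (and therefore string) ends up at 0x080480a9, the address that shows up in the statically linked disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value (x86 byte order). */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Write a 32-bit value in little-endian byte order. */
static void write_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Simplified R_386_32: the 4 bytes at `offset` hold the addend A;
   replace them with sym_addr + A (S + A). */
void apply_r_386_32(uint8_t *text, size_t len, size_t offset, uint32_t sym_addr) {
    assert(offset + 4 <= len);
    uint32_t addend = read_le32(text + offset);
    write_le32(text + offset, sym_addr + addend);
}
```

Applied at offsets 1, 6 and 0xd (the relocation offsets objdump printed), this turns b8 08 00 00 00 into b8 b1 80 04 08 and the movzbl's 08 00 00 00 into b1 80 04 08, matching the linked output.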

If I assemble+link with gcc -static -m32 -nostdlib foo.S and disassemble, I get:

 8048098:       b8 a9 80 04 08          mov    $0x80480a9,%eax
 804809d:       b8 b1 80 04 08          mov    $0x80480b1,%eax
 80480a2:       0f b6 05 b1 80 04 08    movzbl 0x80480b1,%eax

Notice how the absolute address to load from is right there in the last 4 bytes of the movzbl (in little-endian), the same 4-byte value that's an immediate for the b8 opcode (mov-imm32-to-eax).

Also notice how string and string+8 just result in different address bytes but the same opcode.
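To check those two observations byte by byte, here is a small C decoder for exactly these two encodings and nothing else (the function names are made up). In the movzbl, ModR/M byte 0x05 splits into mod=00, reg=000 (EAX), r/m=101, which in 32-bit mode means an absolute disp32 with no base or index register; in 64-bit mode the same bits would mean RIP-relative instead.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value (x86 byte order). */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* mov $imm32, %eax: opcode B8+rd with rd = 0 (EAX), then imm32. */
uint32_t decode_mov_eax_imm32(const uint8_t *insn) {
    assert(insn[0] == 0xb8);
    return read_le32(insn + 1);
}

/* movzbl disp32, %r32: opcode 0F B6 /r. Only the mod=00, r/m=101
   form (absolute [disp32] in 32-bit mode) is handled here. */
uint32_t decode_movzbl_abs(const uint8_t *insn, unsigned *reg) {
    assert(insn[0] == 0x0f && insn[1] == 0xb6);
    uint8_t modrm = insn[2];
    unsigned mod = modrm >> 6;
    unsigned rm = modrm & 7;
    assert(mod == 0 && rm == 5);
    *reg = (modrm >> 3) & 7;   /* destination register number, 0 = EAX */
    return read_le32(insn + 3);
}
```

Fed the three linked instructions, it returns 0x080480a9 for mov $string, and 0x080480b1 for both mov $string+8 and movzbl string+8: the same 4-byte value, differing from string only by 8.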

score 1 · Accepted Answer · answered Apr 30 '18 at 13:12


lea string, %eax
add $8, %eax
movzbl (%eax), %eax

But movzbl string+8, %eax is not a “complex addressing mode”: the +8 is resolved by the assembler/linker.

answered by mevets

@BiggyPoopa `lea string+8, %eax` is as good as `lea string, %eax`; it will result in the same machine code, just with the constant value adjusted by the assembler+linker to point 8 bytes further. It is like changing `mov $5, %eax` to `mov $2, %eax` `add $3, %eax` ... you *can* do that, but it's sort of pointless. – Ped7g Apr 30 '18 at 13:42

@Ped7g: It's subtly different: `$2+3` is calculated at assemble time, but `string+8` puts a relocation into the object file (with an offset) because the absolute (or relative) address of `string` isn't known until link time. But `string` does end up as `string+0` with 4 bytes of zeros as the offset from the symbol. So it's assemble-time vs. link-time, either way it's all sorted out before run-time. – Peter Cordes Apr 30 '18 at 14:09
