
I'm new to assembly and I'm learning it from Programming from the Ground Up. On pages 41 and 42, the book talks about indexed addressing mode.

The general form of memory address references is this: ADDRESS_OR_OFFSET(%BASE_OR_OFFSET,%INDEX,MULTIPLIER)
All of the fields are optional. To calculate the address, simply perform the following calculation:
FINAL ADDRESS = ADDRESS_OR_OFFSET + %BASE_OR_OFFSET + MULTIPLIER * %INDEX
ADDRESS_OR_OFFSET and MULTIPLIER must both be constants, while the other two must be registers. If any of the pieces is left out, it is just substituted with zero in the equation.

So I decided to play around with this a little bit. I wrote the following piece of code:

.code32
.section .data
str:
    .ascii "Hello world\0"

.section .text
.global _start
_start:
    movl $2, %ecx       # The index register.
    mov str(, %ecx, ), %bl
    movl $1, %eax
    int $0x80

I expected to get 72 (ASCII code for H) as the exit result of the program since there isn't any multiplier (which based on the book, should be substituted with zero). But surprisingly I get 108 instead (ASCII code for l). I thought this might be an .ascii thing and tried to see if I can get different results with different data types. I got the same results with the .byte as well.

I tried to look up indexed addressing mode in x86 assembly with AT&T syntax, but I couldn't find anything useful (likely because I don't know what to search for).

Is there anything I'm missing, or is it a mistake in the book? I'd really appreciate it if you could elaborate, given that I'm new to this field.

Peter Cordes
Amirreza A.
  • Try to differentiate between the assembly language and its syntax, vs. machine code and its encodings. The machine code encodings dictate what can be encoded and what cannot. The assembly language is limited by the encodings, as it can only encode the encodable. But the assembly language syntax can vary widely for the same encodings. – Erik Eidt May 27 '21 at 02:36

1 Answer


If any of the pieces is left out, it is just substituted with zero in the equation.

The book's general rule isn't quite accurate for the scale factor. The default scale factor is 1 if omitted.

x86 index scaling works in machine code as a 2-bit shift count.
The assembler's default is a shift count of 0, but x86 asm source-code syntaxes (including AT&T) express this as a multiplier rather than a shift count, and i << 0 is i * 1.

If you want no index, you need to omit mention of an index register in your addressing mode.
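
To make the arithmetic concrete, here is a minimal Python sketch of the address calculation. It is only a model, not how an assembler is implemented: `effective_address`, `DATA` and `STR` are hypothetical names, and the label `str` is assumed to sit at offset 0 of the data.

```python
# Model of the AT&T effective-address calculation (illustrative only).
# The scale is encoded as a 2-bit shift count, so the only valid
# multipliers are 1, 2, 4 and 8.

DATA = b"Hello world\0"   # stands in for the bytes at label `str`
STR = 0                   # assumed offset of `str` inside DATA


def effective_address(disp=0, base=0, index=None, scale=None):
    """Return disp + base + (index << shift).

    index=None means no index register is named at all.
    scale=None means the scale was omitted, which defaults to 1 (shift 0).
    """
    if index is None:
        return disp + base
    if scale is None:
        scale = 1                                # omitted scale: i << 0 == i * 1
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[scale]      # KeyError for an unencodable scale
    return disp + base + (index << shift)


ecx = 2
# mov str(,%ecx,), %bl   -> scale omitted, so it is treated as 1
print(DATA[effective_address(disp=STR, index=ecx)])   # 108, ASCII 'l'
# mov str, %bl           -> no index register mentioned at all
print(DATA[effective_address(disp=STR)])              # 72, ASCII 'H'
```

With the index register present, the omitted scale behaves as 1, which is why the program in the question exits with 108 rather than 72.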


Note that having a default is a property of the assembler / the syntax, in this case AT&T, not of x86 itself. There can't be a "default" in machine code - there's no way to leave out bits in a byte, they have to be either 0 or 1. You either have a SIB byte (scale-index-base) with all fields, or you don't (as signalled by the ModRM byte) in which case there's no index at all.

There is a SIB encoding that means no-index, so in machine code you can still have a SIB byte without an index at all, but you wouldn't describe that as a multiplier of zero.

Some disassemblers represent that encoding as index = %eiz, e.g. in nopl 0(%eax, %eiz, 1), to show that there's a SIB byte but no index, but normally you only ever see that for NOPs. When that SIB-with-no-index is necessary to encode (%esp), the disassembly is simplified to just (%esp).
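
The SIB fields described above can be pulled apart with a few shifts and masks. The following Python sketch assumes 32-bit addressing; `decode_sib` and `REGS` are hypothetical names for illustration, and this is not a full disassembler.

```python
# Illustrative decoder for a SIB byte in 32-bit addressing mode.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib, mod=1):
    """Split a SIB byte into (multiplier, index, base).

    `mod` is the 2-bit mod field of the preceding ModRM byte; it only
    matters when the base field is 0b101.
    """
    shift = (sib >> 6) & 0b11          # 2-bit shift count
    index_field = (sib >> 3) & 0b111
    base_field = sib & 0b111

    # Index field 0b100 (ESP's register number) means "no index".
    index = None if index_field == 0b100 else REGS[index_field]
    # Base field 0b101 with mod == 0 means "no base, disp32 follows".
    base = None if (base_field == 0b101 and mod == 0) else REGS[base_field]
    return 1 << shift, index, base


print(decode_sib(0x0D, mod=0))   # (1, 'ecx', None)  as in str(,%ecx,1)
print(decode_sib(0x24, mod=0))   # (1, None, 'esp')  as in (%esp)
print(decode_sib(0x26, mod=1))   # (1, None, 'esi')  as in 0(%esi,%eiz,1)
```

The smallest multiplier the scale field can express is 1 (shift 0); "no index" is a separate encoding of the index field, not a multiplier of zero.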


There are multiple assembly syntaxes for x86 machine code, although all of the ones I'm aware of agree that no scale just means shift=0, e.g. [str + eax + ecx] in Intel syntax. Normally the first register is chosen as the base if there's more than one, and the second as the index.

Of course in Intel syntax, to force ECX to be an index with no base register, you'd need to use [str + ecx*1] to intentionally waste a byte on a SIB encoding.
[str + ecx] in Intel is AT&T str(%ecx).
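
As a rough illustration of how the same addressing mode is spelled in each syntax, here is a small Python sketch. The `att` and `intel` helpers are hypothetical, do pure string formatting, and do not validate anything or model any particular assembler.

```python
# Format one addressing mode in AT&T and Intel syntax (illustrative only).

def att(disp="", base=None, index=None, scale=None):
    """AT&T form: disp(%base,%index,scale), with omitted parts left empty."""
    if base is None and index is None:
        return disp
    inner = f"%{base}" if base else ""
    if index is not None:
        inner += f",%{index}"
        if scale is not None:
            inner += f",{scale}"
    return f"{disp}({inner})"


def intel(disp="", base=None, index=None, scale=None):
    """Intel form: [disp + base + index*scale]."""
    parts = [p for p in (disp, base) if p]
    if index is not None:
        parts.append(index if scale is None else f"{index}*{scale}")
    return "[" + " + ".join(parts) + "]"


# ECX as an index with an explicit scale of 1, no base register
print(att("str", index="ecx", scale=1), intel("str", index="ecx", scale=1))
# -> str(,%ecx,1) [str + ecx*1]

# ECX as the base register, no index
print(att("str", base="ecx"), intel("str", base="ecx"))
# -> str(%ecx) [str + ecx]
```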



Peter Cordes