
Here's a simple hello-world program:


 #include <stdio.h>            
 
 int main() {
     printf("hello, world\n");
     return 0;
 }

Here is the instruction that loads the address of the string from the .rodata section into a register:

lea rax, str.hello__world   ; hit0_0; 0x2004 ; "hello, world"

Because we are moving an address into a 64-bit register, we're using this form of LEA:

REX.W + 8D /r | LEA r64,m | Store effective address for m in register r64

The hex dump of the instruction is:

- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00001151  488d 05ac 0e00 0048 89c7 e8f0 feff ffb8  H......H........

So the instruction breaks down as:

REX.W: 0x48 ; 0x40 with the W bit set
8D   : 0x8D ; 8D is the LEA opcode itself
/r   : 0x05 ; I wouldn't have been able to get this without looking at the disassembly;
            ; my guess about the offset is based on the OSDev wiki
            

Here are the bytes of the "hello, world" string:

- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00002004  6865 6c6c 6f2c 2077 6f72 6c64 0000 0000  hello, world....

How do I calculate 0x2004 as the offset? I'm fairly sure the address begins at 0xac and that the 0x05 is part of the offset.

  • RIP-relative addressing modes don't need a SIB byte following the ModR/M byte. (And in fact can't use one; that would make it `[disp32]` absolute instead of `[rip+rel32]`). Compilers always use RIP-relative LEA for static data addresses, if they use an LEA at all. (If 32-bit absolute works, they'll use `mov`-immediate: [How to load address of function or label into register](https://stackoverflow.com/q/57212012)) – Peter Cordes May 10 '22 at 15:43
  • Yeah, you can see that there are only 4 bytes (the rel32) after the `05` byte, before the next instruction (which starts with another `48` REX prefix). So the rel32 is little-endian `ac 0e 00 00` and no SIB. Near duplicate of [How do RIP-relative variable references like "\[RIP + \_a\]" in x86-64 GAS Intel-syntax work?](https://stackoverflow.com/q/54745872) which has some machine-code examples, but maybe not quite. – Peter Cordes May 10 '22 at 16:29
  • Also [NASM x86\_64 assembly in 32-bit mode: Why does this instruction produce RIP-Relative Addressing code?](https://stackoverflow.com/a/49122235) contains an answer to the question, sort of. But really just see https://wiki.osdev.org/X86-64_Instruction_Encoding#RIP.2FEIP-relative_addressing for ModRM encoding. Or are you not sure about how you reach `0x2004` from RIP-relative to the end of the instruction starting at `0x1151`? – Peter Cordes May 10 '22 at 16:36
  • I believe the `/r` is what confused me. I don't know how `05` is generated. I see that the byte is broken into three parts: `mod`, `reg` and `r/m`. I'm most confused about the `mod` part, but I assume we are using `00` aka `[RIP/EIP + disp32]` for the `mod` part and `000` for the `reg`. That only leaves us with `101` for the `r/m`, which I have no idea how we got, besides looking at the `101` above the EIP/RIP indirect addressing. – Happy Jerry May 12 '22 at 21:34
  • You mention "So the rel32 is little-endian `ac 0e 00 00`". Are there any times where instructions/opcodes/operands are not little-endian? And to clarify, is each part of the instruction a separate piece of encoding? So looking at `488d 05ac 0e00 00`, we read `48`, `8d` and `05` individually because they are separate parts of the instruction, and we read the indirect address as `00 00 0e ac`, correct? – Happy Jerry May 12 '22 at 21:40
  • Any multi-byte integer in x86 machine code is always little-endian. (disp32, imm32, rel32, imm64, etc.) e.g. [Are machine code instructions fetched in little endian 4-byte words on an Intel x86-64 architecture?](https://stackoverflow.com/q/68229585) Also, it makes much more sense that it's a short-ish displacement than a huge negative displacement that happens to be a multiple of 1MiB, looking at which bytes are zero in that value. Re: in general parsing machine code in chunks, see [this answer](https://stackoverflow.com/a/56386888/224132). Most chunks are a single byte, correct. – Peter Cordes May 12 '22 at 22:03
