
I have a C++ program of mine that I've disassembled, and it seems like the assembly is using the instruction pointer to get at string literals. For example:

leaq    0x15468(%rip), %rsi ## literal pool for: "special"

and

leaq    0x15457(%rip), %rsi ## literal pool for: "ordinary"

Why does the compiler use the instruction pointer to get at string literals? This seems like it would result in a substantial headache for any human programmer, although it's probably not as hard for the compiler.

My question, though, is why? Is there some machine based or historical reason or did the compiler writers just decide to use %rip arbitrarily?
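To reproduce this, here is a hypothetical C++ function (not the asker's program) that passes string literals as the second argument of `std::strcmp`. Under the System V x86-64 calling convention the second pointer argument goes in `%rsi`, so a compiler will typically load each literal's address into `%rsi` with a RIP-relative `leaq` like the ones above. The exact instructions depend on the compiler, the flags, and whether the call gets inlined.

```cpp
#include <cstring>

// Hypothetical example: the name `classify` and its return codes are illustrative only.
// Each std::strcmp call takes the literal as its second argument, which the
// System V x86-64 ABI passes in %rsi. The literal's address is commonly formed
// with something like `leaq <disp32>(%rip), %rsi`.
int classify(const char* word) {
    if (std::strcmp(word, "special") == 0) return 1;
    if (std::strcmp(word, "ordinary") == 0) return 2;
    return 0;
}
```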

Dovahkiin
  • It allows you to create position independent code by making the references relative to the instruction pointer and not a fixed memory address. – Michael Petch May 31 '17 at 15:15
  • But this is simply AMD64 RIP-relative addressing, which most x86-64 instructions that take a memory operand can use. The effective address is formed by adding the displacement to the 64-bit RIP of the next instruction; in your example, `0x15468` and `0x15457` are the displacements. Your disassembler shows the instructions in this form, while another disassembler might show the calculated *effective* address instead of the *displacement*. These are only different ways of displaying the same instruction. – RbMm Jun 01 '17 at 06:37
  • This form saves 4 bytes per instruction: an absolute address in x64 long mode would need 8 bytes (64 bits), whereas RIP-relative addressing uses only a 4-byte (32-bit) signed offset from RIP, which can reach the memory range `[rip-0x80000000, rip+0x7fffffff]`. – RbMm Jun 01 '17 at 06:42
  • Because your string literals are located inside your binary, as long as the binary occupies less than 2 GB in memory, the literals will be in the range `[rip-0x80000000, rip+0x7fffffff]`. – RbMm Jun 01 '17 at 06:43
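The address arithmetic described in these comments can be sketched in C++ as follows. The function names are hypothetical, the load address `0x100001000` is a made-up example, and the 7-byte instruction length assumes the usual encoding of `leaq disp32(%rip), %rsi` (REX.W prefix, opcode, ModRM byte, 4-byte displacement).

```cpp
#include <cstdint>
#include <limits>

// Effective address of a RIP-relative operand: the signed 32-bit displacement
// is sign-extended and added to the address of the *next* instruction.
std::uint64_t rip_relative_target(std::uint64_t next_insn_addr, std::int32_t disp32) {
    // Unsigned addition wraps modulo 2^64, matching the CPU's address arithmetic.
    return next_insn_addr + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}

// Can `target` be reached from `next_insn_addr` with a rel32 displacement,
// i.e. does it lie in [rip - 0x80000000, rip + 0x7fffffff]?
bool fits_rel32(std::uint64_t next_insn_addr, std::uint64_t target) {
    // Reinterpret the wrapped difference as signed (modular since C++20,
    // and what mainstream compilers do in earlier standards too).
    const std::int64_t delta = static_cast<std::int64_t>(target - next_insn_addr);
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

Under those assumptions, a `leaq 0x15468(%rip), %rsi` at `0x100001000` ends at `0x100001007`, so `rip_relative_target(0x100001007, 0x15468)` gives `0x10001646F`. The ±2 GB window checked by `fits_rel32` is why the binary's size matters.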

1 Answer


Remember that string literals in C++ are constant and non-modifiable. One way to ensure that is to place them together with the code in the code segment, which is loaded into memory pages marked as read-only.
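The constness the answer relies on is visible in the type system: a literal such as `"special"` has type `const char[8]` (seven letters plus the terminating `'\0'`). The checks below are a small compile-time illustration. The commented-out lines show what the language forbids. They stay commented because the first is ill-formed and the second is undefined behavior, which on platforms that map literals into read-only pages typically crashes.

```cpp
#include <cstddef>
#include <type_traits>

// A narrow string literal is an lvalue of type "array of N const char",
// where N includes the terminating '\0'; decltype of an lvalue yields a reference.
static_assert(std::is_same<decltype("special"), const char (&)[8]>::value, "const char[8]");
static_assert(std::is_same<decltype("ordinary"), const char (&)[9]>::value, "const char[9]");

// Seven letters plus the terminator.
constexpr std::size_t special_size = sizeof("special");

// Writing through a literal is rejected or undefined:
//   const char* p = "special";
//   p[0] = 'S';                       // ill-formed: p points to const char
//   const_cast<char*>(p)[0] = 'S';    // compiles, but undefined behavior
```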

Some programmer dude
  • I know that, but my question is why the code uses the `%rip` register. Why can't it use another register (`%rax`, `%rbx`, etc.) or just use a constant address? – Dovahkiin May 31 '17 at 15:15
  • @Dovahkiin: "Position Independent Code". If it used a constant address, then the dynamic loader, on loading a library at a non-preferred base address, would have to change ("fixup") all the "constant" addresses, which also makes the code segment non-shared. Also, relative addresses tend to be smaller than complete pointers. – Ben Voigt May 31 '17 at 15:17
  • @Dovahkiin Because the compiler might not know exactly at which address the code segment will actually be loaded. This could be part of the operating system's attempts to foil attacks (address space layout randomization), or because the code is in a library which is dynamically loaded. – Some programmer dude May 31 '17 at 15:18
  • @Dovahkiin because during compilation the string literals are placed somewhere in the code section and the compiler knows only how far away those strings are from the current instruction or the start of this section, not their absolute address. – riodoro1 May 31 '17 at 15:20
  • For more about why RIP-relative addressing is used for static storage (.rodata and .data) in general, see [Why does this MOVSS instruction use RIP-relative addressing?](https://stackoverflow.com/q/44967075) / [Why is the address of static variables relative to the Instruction Pointer?](https://stackoverflow.com/q/40329260) – Peter Cordes Oct 05 '22 at 11:41
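The relocation argument in these comments can be illustrated with a simplified model. `ImageLayout` and both functions are hypothetical names, and real loaders apply fixups through relocation tables rather than anything this simple. The point is only that an absolute operand changes with the load base, while a RIP-relative displacement does not.

```cpp
#include <cstdint>

// Hypothetical, simplified layout: both offsets are measured from the image base.
struct ImageLayout {
    std::uint64_t next_insn_offset;  // offset of the instruction after the leaq
    std::uint64_t literal_offset;    // offset of the string literal
};

// Absolute operand: the instruction would hold the literal's final address,
// so the loader must patch ("fix up") it for every different base.
std::uint64_t absolute_operand(const ImageLayout& img, std::uint64_t base) {
    return base + img.literal_offset;
}

// RIP-relative operand: the instruction holds the distance from the next
// instruction to the literal. The base cancels out, so no fixup is needed.
std::int64_t rip_relative_operand(const ImageLayout& img, std::uint64_t base) {
    const std::uint64_t next_ip = base + img.next_insn_offset;
    const std::uint64_t literal = base + img.literal_offset;
    return static_cast<std::int64_t>(literal - next_ip);
}
```

Because the RIP-relative code bytes are identical at every base, the pages holding them can stay unmodified and shared between processes, which is the benefit Ben Voigt's comment points to.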