
I talked with a friend about how addressing in PE files works. He told me that all of the addresses inside a PE file are changed at load time through a relocation table (.reloc). I told him that I've never seen these tables inside a PE file, and that all addressing is relative to the executing instruction. I'm sure I'm right, because I have had to patch programs by changing the second operand of lea to the offset from the address of the lea to my variable.

But after some research, I found out that such a table does exist. So I have a few questions.

  1. Is it just legacy? I have opened several x64 .exe files in IDA and don't see a .reloc segment there.
  2. If this technology really exists, why does lea compute the real address at run time every time (by adding IMAGE BASE), if we could just put the right addresses everywhere when loading?
  3. If it's not legacy, why is it needed in real tasks if relative addressing exists in x86/x64?
pawn1337
  • Position-independent code was a lot less convenient before x86-64 introduced the `[rip+rel32]` RIP-relative addressing mode. It doesn't add `IMAGE BASE`, it adds RIP, the address of the end of the instruction itself, which the CPU already has, the same way relative branches like `call` and `jge` work. It allows working with 32-bit relative addresses for code + libraries loaded anywhere in 64-bit address-space. Load-time fixups for 64-bit absolute addresses could be useful for static arrays of pointers, but compilers avoid that when they can, for efficiency. – Peter Cordes Jun 17 '23 at 01:19
  • See also [Why does this MOVSS instruction use RIP-relative addressing?](https://stackoverflow.com/q/44967075) (data addressing) and [GCC Jump Table initialization code generating movsxd and add?](https://stackoverflow.com/q/52190313) (Modern GCC uses relative offsets when it makes jump tables itself, to keep the code+data position-independent, at least when compiling with `-fPIE` or `-fPIC`) – Peter Cordes Jun 17 '23 at 01:22
  • @PeterCordes yes, I misspoke about adding an `IMAGE BASE`. What I meant is that once the code is loaded, `lea` already sits at an address that includes `IMAGE BASE`, so it ends up putting the correct address in the register. But I'm interested in something else. Why did they decide to do addressing this way in x64? Don't you think that performing the RIP + offset computation at run time each time is slower than putting the correct address in from the relocation table when loading the program? – pawn1337 Jun 17 '23 at 11:16
  • No, a 64-bit absolute address in the machine code would hurt front-end fetch/decode bandwidth and not help anything. The pipeline has many cycles between decode and exec where it could do the addition to internally make a 64-bit absolute address, so it can do that at any point, or leave it for the AGUs, which also have to handle addressing modes like `[rdi + rdx*2]` with a 64-bit adder with minimal latency. See also [Why do x86 jump/call instructions use relative displacements instead of absolute?](https://stackoverflow.com/q/46184755) - it's cheap, and position-independent is usually good. – Peter Cordes Jun 17 '23 at 19:08
  • Also, 64-bit absolute addresses would make instruction lengths different from 32-bit mode in more cases, increasing complexity of the pre-decode and decode stages. Plus, by 2000 when AMD64 was being designed, it was well known that a lot of code went into libraries that got mapped at different addresses in different processes, so sharing those pages of memory was only possible with position-independent code. Pages that have had runtime fixups (text relocations) applied are no longer copies of the disk file, so they have to get paged out to the page file instead of just dropped on memory pressure. – Peter Cordes Jun 17 '23 at 19:12
  • @PeterCordes okay, I got your point, thanks. But you'd be better off posting this as an answer so that I can mark it as the solution to the question. – pawn1337 Jun 18 '23 at 13:51

1 Answer


Relocation tables still exist. They do not need to be in a .reloc section; the important part is that the Relocation Directory RVA entry of the data directories array in the "optional header" (not 'optional' for executables) points to the table. However, it often is in a .reloc section anyway.
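
To make the table less abstract, here is a minimal Python sketch of how the raw bytes of a base relocation table are laid out: a sequence of blocks, each with an 8-byte header (page RVA, block size) followed by 16-bit entries whose top 4 bits are the relocation type and low 12 bits the offset within the page. The function name `parse_reloc_blocks` and the sample block are made up for illustration; extracting the bytes from a real file is left out.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, no fixup is applied
IMAGE_REL_BASED_DIR64 = 10     # 64-bit value that gets the load delta added


def parse_reloc_blocks(data):
    """Yield (rva, type) for each real entry in raw base relocation data.

    `data` is assumed to be exactly the bytes that the relocation
    data directory entry points to (hypothetical helper, not a library API).
    """
    pos = 0
    while pos + 8 <= len(data):
        page_rva, block_size = struct.unpack_from("<II", data, pos)
        if block_size < 8 or pos + block_size > len(data):
            break  # malformed or truncated block, stop parsing
        entry_count = (block_size - 8) // 2
        for i in range(entry_count):
            (entry,) = struct.unpack_from("<H", data, pos + 8 + 2 * i)
            rel_type = entry >> 12      # top 4 bits
            offset = entry & 0x0FFF     # low 12 bits, offset within the page
            if rel_type != IMAGE_REL_BASED_ABSOLUTE:
                yield page_rva + offset, rel_type
        pos += block_size


# Hypothetical block: page 0x3000, one DIR64 entry at offset 0x10, one padding entry.
sample = struct.pack("<IIHH", 0x3000, 12, (IMAGE_REL_BASED_DIR64 << 12) | 0x010, 0)
relocs = list(parse_reloc_blocks(sample))
```

Each yielded RVA is a location in the image that holds an absolute address and therefore needs adjusting if the image is not loaded at its preferred base.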

Is it just legacy? I have opened several x64 .exe files in IDA and don't see a .reloc segment there.

I found such a section in, for example, notepad.exe, calc.exe (very small relocation section), 7z.exe, subl.exe, and various other common tools. It's not rare. I did not use IDA though, I used CFF Explorer.

Normally on x64, to refer to an address that is fixed relative to IMAGE BASE, RIP-relative lea (or RIP-relative addressing in general) is used, which adds a constant offset to the address of the byte directly after it (not to IMAGE BASE). x64 does not have many places where a 64-bit absolute address can be used directly in instructions (a 64-bit address can be loaded into a register the same as any 64-bit integer, or used as an address but only in mov to and from rax/eax/ax/al), making it inconvenient (though possible) to rely on relocations completely.
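
The arithmetic can be sketched in Python. This assumes the common 7-byte encoding of `lea r64, [rip+disp32]` (REX.W prefix, opcode `8D`, ModRM with mod=00 and rm=101, then a signed 32-bit displacement); the function names and the addresses are made up for illustration.

```python
import struct

LEA_RIP_LENGTH = 7  # REX.W + opcode + ModRM + disp32


def lea_rip_target(insn_addr, insn_bytes):
    """Return the address a 7-byte `lea r64, [rip+disp32]` computes."""
    if (len(insn_bytes) != LEA_RIP_LENGTH
            or (insn_bytes[0] & 0xF8) != 0x48    # REX prefix with W=1
            or insn_bytes[1] != 0x8D             # LEA opcode
            or (insn_bytes[2] & 0xC7) != 0x05):  # ModRM: mod=00, rm=101
        raise ValueError("not a RIP-relative lea in the assumed encoding")
    (disp,) = struct.unpack_from("<i", insn_bytes, 3)
    # The displacement is added to the address of the NEXT instruction.
    return insn_addr + LEA_RIP_LENGTH + disp


def lea_rip_disp(insn_addr, target):
    """Return the 4 displacement bytes that make the lea point at `target`."""
    disp = target - (insn_addr + LEA_RIP_LENGTH)
    return struct.pack("<i", disp)  # raises struct.error if out of +/-2 GiB range


# `lea rax, [rip+0x1000]` placed at a hypothetical address.
code = bytes.fromhex("488d0500100000")
at_preferred = lea_rip_target(0x140001000, code)
at_relocated = lea_rip_target(0x7FF600001000, code)
```

The same bytes give the same offset from the image base wherever the image is loaded, which is why such an instruction needs no relocation entry. `lea_rip_disp` is the calculation behind the kind of patch described in the question.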

On the other hand, relocations are useful for updating addresses that are stored in data segments, which are usually absolute addresses. The alternatives would be inconvenient to use: making them RIP-relative doesn't really work at all (relative to which RIP?), and storing a raw RVA would work but requires adding IMAGE BASE manually at every use. For other architectures such as ARM, MIPS, and RISC-V, relocations may be more important (and maybe more complicated).
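
As a rough illustration of what such a fixup does to a pointer stored in data, the following Python sketch adds the load delta (actual base minus preferred base) to each 64-bit value at a given RVA. The helper name, the buffer, and the base addresses are all hypothetical; a real loader works on the mapped image and takes the RVAs from the relocation table.

```python
import struct


def apply_dir64_fixups(image, fixup_rvas, preferred_base, actual_base):
    """Patch 64-bit absolute addresses in `image` (a bytearray) in place."""
    delta = actual_base - preferred_base
    for rva in fixup_rvas:
        (value,) = struct.unpack_from("<Q", image, rva)
        # Wrap to 64 bits, as the addition would in hardware.
        struct.pack_into("<Q", image, rva, (value + delta) & 0xFFFFFFFFFFFFFFFF)


# A tiny fake image with one stored pointer at RVA 0x10,
# linked for a preferred base of 0x140000000.
image = bytearray(0x20)
struct.pack_into("<Q", image, 0x10, 0x140003000)

apply_dir64_fixups(image, [0x10], preferred_base=0x140000000,
                   actual_base=0x7FF650000000)
(patched,) = struct.unpack_from("<Q", image, 0x10)
```

After the fixup the stored pointer still refers to RVA 0x3000, now expressed relative to the base the image actually got.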

By the way, contrary to what Wikipedia says:

When running native 64-bit binaries on Windows Vista and above, ASLR is mandatory[citation needed], and thus relocation sections cannot be omitted by the compiler.

The relocation section can certainly be omitted if no relocations are used, and you can create non-trivial executables like that. Just don't use absolute addresses. Or patch them yourself.
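
If you want to check a file without a GUI tool, a minimal Python sketch along these lines reads the relocation data directory entry (index 5) and the `IMAGE_FILE_RELOCS_STRIPPED` flag straight from the headers. The function name `reloc_info` is made up, and the sketch assumes a well-formed PE32 or PE32+ file with no further validation.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001
IMAGE_DIRECTORY_ENTRY_BASERELOC = 5
PE32_PLUS_MAGIC = 0x20B


def reloc_info(pe):
    """Return (relocs_stripped, reloc_rva, reloc_size) from raw PE file bytes."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = e_lfanew + 4
    (characteristics,) = struct.unpack_from("<H", pe, file_header + 18)
    optional = file_header + 20
    (magic,) = struct.unpack_from("<H", pe, optional)
    # The data directories start at offset 112 (PE32+) or 96 (PE32) of the
    # optional header; the directory count is the 4 bytes just before them.
    dirs = optional + (112 if magic == PE32_PLUS_MAGIC else 96)
    (dir_count,) = struct.unpack_from("<I", pe, dirs - 4)
    rva = size = 0
    if dir_count > IMAGE_DIRECTORY_ENTRY_BASERELOC:
        rva, size = struct.unpack_from(
            "<II", pe, dirs + 8 * IMAGE_DIRECTORY_ENTRY_BASERELOC)
    return bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED), rva, size
```

For example, `reloc_info(open(path, "rb").read())` with `path` naming some executable; a size of 0 means the file carries no relocation table.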

harold
  • Oh sorry, I misspoke about adding an `IMAGE BASE`. Yes, once the code is loaded the `lea` instruction already sits at an address that includes `IMAGE BASE`, so it just has to add the offset in its second operand to its own address. But my question was a bit different, from a performance point of view: don't you think that adding an offset to `RIP` every time at **run time** is slower than using an absolute address, which could be filled in from the relocation table? Why did x64 decide to make addresses relative? – pawn1337 Jun 17 '23 at 11:02
  • @pawn1337 On non-Windows platforms, RIP-relative addressing is a big win, because it allows easy creation of position-independent executables. Those other platforms don't use relocation tables that modify code. Also, using RIP-relative addressing is a time-space tradeoff. The smaller code size could very well be worthwhile by fitting more code in the cache. – Myria Jun 21 '23 at 00:10