
Here is my question. Suppose you want to compile the following C code:

void some_function() {
  write_string("Hello, World!\n");
}

For this example, I want to focus specifically on the string "Hello, World!\n". My understanding is that the compiler will put the string into the .rodata section of an ELF file. A symbol referring to its location in the .rodata section is added to the symbol table, and that symbol is kept in the .text section as a placeholder for the location of the string.

Here is the problem. How can you leave a value like that unresolved in machine code? On x86, it should be easy enough for the linker to do a find and replace on the symbol once the location is known. However, there are many CPU architectures where an address cannot be encoded in its entirety in a single machine instruction. The value would therefore have to be loaded in two stages, using separate machine instructions, and the linker would have to figure that out. It would have to be smart enough to manipulate the machine code with half the address in one place and the other half in another. Furthermore, the ELF file somehow has to represent this complex encoding scheme for the linker to use later on. How does this all work?

In most cases, this will be a user space application, so the kernel may load the .rodata section wherever it wants in memory. It would seem, then, that when the program is loaded, the kernel loader would have to resolve all these symbols at runtime, prior to beginning execution. It would have to inject into the machine code the location of each section so that they can be referenced appropriately. How does this work?

I have a feeling that my understanding and the descriptions above are wrong, or that I am missing something very important, because this does not seem right to me. Either that, or modern kernels and linkers do in fact contain the logic to perform these complex functions. I am looking for some further explanation and understanding.

Echelon X-Ray
  • Yes, if the address has to be encoded in a trickier way, then the linker does in fact need to be smart enough to do that, and the object file format has to be rich enough to express that it needs to be done. And so they're designed to be able to do it. You can see an example [here](https://stackoverflow.com/a/64841097/634919) for ARM64, where it needs the low 12 bits (absolute) and the next 21 bits (relative) in two separate instructions. – Nate Eldredge Dec 21 '20 at 02:18
  • But normally, the kernel doesn't load `.rodata` "wherever it wants". Traditionally, the binary would specify an absolute virtual address for each section to be loaded at. With ASLR, the base address where the binary is loaded may be random, but all the sections still get loaded together as a unit, so the offset of `.rodata` relative to the code is still fixed and known at link time. – Nate Eldredge Dec 21 '20 at 02:41
  • @NateEldredge Wouldn't that create a problem in No-MMU systems? – Echelon X-Ray Dec 21 '20 at 03:05
  • Yeah sure, I had in mind modern desktop/server machines. For a machine without an MMU then yes, everything has to be relocated when loaded. The binary has a relocation table indicating every place that it refers to an absolute address, and the loader adds the appropriate base address to all of them. – Nate Eldredge Dec 21 '20 at 03:12
  • I might have to read up on the ELF format specifications one of these days. – Echelon X-Ray Dec 21 '20 at 03:18
  • @NateEldredge: You know you can build PIC on a noMMU machine right? – Joshua Dec 21 '20 at 20:07
  • Sure, in which case there simply won't be any places where the binary refers to an absolute address. – Nate Eldredge Dec 21 '20 at 20:36
  • @Joshua: I tried compiling with PIC, but it seems to require some special sections in the ELF file to be included. – Echelon X-Ray Dec 31 '20 at 10:07

1 Answer


Compilation takes place, emitting something like this:

lea rdi, [rip+some_function.hello_world]
mov rax, [rip+some_function.write_string]
call rax

After the assembler pass, we end up with something that disassembles to:

lea rdi, [rip+00000000]
mov rax, [rip+00000000]
call rax

where the two 00000000 slots are filled as load-time fixups. The loader performs symbol resolution and fills in the 00000000 values with the correct values.

This is a simplification. In reality there's an extra layer of indirection called the global offset table, which is used (among other things) to put all the fixups right next to each other.
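
Ignoring the global offset table for a moment, filling in one of the 00000000 slots could be sketched in C roughly as follows. This is only an illustration under stated assumptions: `patch_rel32` and all of its parameters are hypothetical names, the slot is assumed to be the last four bytes of the instruction (as it is for the `lea` above), and the displacement is assumed to be measured from the end of that instruction, which is how RIP-relative addressing works on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a real linker or loader API): fill one 4-byte
 * RIP-relative slot.
 *   image       - the code bytes being patched
 *   image_size  - number of bytes in image
 *   image_addr  - address that image[0] will have at run time
 *   slot_off    - offset of the 00000000 slot inside image
 *   target_addr - run-time address of the symbol being referenced
 * Assumption: the slot is the last 4 bytes of its instruction, so the
 * displacement is measured from slot address + 4. */
static void patch_rel32(uint8_t *image, size_t image_size, uint64_t image_addr,
                        size_t slot_off, uint64_t target_addr)
{
    assert(slot_off + 4 <= image_size);

    uint64_t next_insn = image_addr + slot_off + 4;
    int64_t disp = (int64_t)(target_addr - next_insn);

    /* The target must be reachable with a signed 32-bit displacement. */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);

    /* Store the displacement byte by byte in little-endian order,
     * independent of the byte order of the machine doing the patching. */
    uint32_t bits = (uint32_t)(int32_t)disp;
    for (int i = 0; i < 4; i++)
        image[slot_off + i] = (uint8_t)(bits >> (8 * i));
}
```

For the `lea rdi, [rip+00000000]` above, which encodes as `48 8D 3D` followed by the four slot bytes, the slot offset would be 3 bytes into the instruction. A real linker or loader gets the slot offset and the symbol from a relocation entry rather than from hard-coded arguments.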

The innards of how this works are CPU- and OS-specific, but in general you don't really have to care exactly how it works, and it could change in the next release of the compiler (and has changed at least twice already). The loader understands fixups at a very generic level using a fixup table, and can deal with new ideas so long as they boil down to putting the (absolute or relative) address of a symbol at a given offset with a given size.
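
A generic fixup table of that kind might look like the following C sketch. Everything here is made up for illustration: the `fixup_kind` values, `struct fixup`, and `apply_fixups` are hypothetical and do not correspond to any particular ELF relocation type, and the `FIX_HI16`/`FIX_LO16` kinds assume an imaginary 32-bit instruction word whose low 16 bits hold an immediate. The point is only that "half the address here, half the address there" is just two more table entries with different kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixup kinds; real relocation types are per-architecture. */
enum fixup_kind {
    FIX_ABS64, /* store the full 64-bit address */
    FIX_REL32, /* store (symbol + addend - place) in 32 bits */
    FIX_HI16,  /* put bits 16..31 of the address into an instruction word */
    FIX_LO16   /* put bits 0..15 of the address into an instruction word */
};

struct fixup {
    size_t offset;        /* where in the image to patch */
    enum fixup_kind kind; /* how to encode the value */
    uint64_t symbol;      /* resolved run-time address of the symbol */
    int64_t addend;       /* constant added to the symbol address */
};

/* Write the low nbytes of v in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 32-bit little-endian word. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Apply every entry of the table to an image that will live at address base. */
static void apply_fixups(uint8_t *image, size_t size, uint64_t base,
                         const struct fixup *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const struct fixup *f = &table[i];
        uint8_t *place = image + f->offset;
        uint64_t value = f->symbol + (uint64_t)f->addend;
        size_t width = (f->kind == FIX_ABS64) ? 8 : 4;

        assert(f->offset + width <= size);

        switch (f->kind) {
        case FIX_ABS64:
            put_le(place, value, 8);
            break;
        case FIX_REL32:
            put_le(place, value - (base + f->offset), 4);
            break;
        case FIX_HI16: /* keep the opcode bits, replace the immediate */
            put_le(place, (get_le32(place) & 0xFFFF0000u) |
                          ((value >> 16) & 0xFFFFu), 4);
            break;
        case FIX_LO16:
            put_le(place, (get_le32(place) & 0xFFFF0000u) |
                          (value & 0xFFFFu), 4);
            break;
        }
    }
}
```

With a table like this, a two-instruction address load is described by one `FIX_HI16` entry and one `FIX_LO16` entry that name the same symbol. Real formats need more care than this sketch shows, for example when the instruction holding the low half sign-extends its immediate.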

The Alpha processor had it kind of bad back in the day. Fixups had to be in between functions, and relative addressing could only be done with signed 16-bit offsets, so the fixups for functions were located immediately before or after each function, and presumably you got an error in the assembler pass if the pointer didn't fit because the function was too big. I did come up with a clever sequence that would have fixed the problem on Alpha, but that was long after the platform was retired, and nobody cares anymore, so it never got implemented.

I remember the bad old days from before the loader could do good patchups. There once was a global (and I really do mean global) table of shared library load addresses; the compiler emitted absolute addresses, and you had to rebuild your application if you changed a library, even though you used shared libraries. That just wasn't the brightest of ideas, and it's no wonder people kept statically linked emergency binaries lying around. Breaking libc wasn't fun.

Joshua
  • Thank you. I appreciate your insights into this topic. So it does seem that the ELF format is in fact rich enough to support placeholders and the encoding of addresses in multiple parts, as either relative or absolute addresses, with some offset. And that the kernel loader and linker are smart enough to be able to patch these. – Echelon X-Ray Dec 21 '20 at 03:09