
I was reading this amazing article about dynamic and static linking: https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

After finishing the article, two questions are still unanswered or not clear enough for me to understand.

1)

This is not fine for a shared library (.so). The whole point of a shared library is that applications pick-and-choose random permutations of libraries to achieve what they want. If your shared library is built to only work when loaded at one particular address everything may be fine — until another library comes along that was built also using that address. The problem is actually somewhat tractable — you can just enumerate every single shared library on the system and assign them all unique address ranges, ensuring that whatever combinations of library are loaded they never overlap. This is essentially what prelinking does (although that is a hint, rather than a fixed, required address base). Apart from being a maintenance nightmare, with 32-bit systems you rapidly start to run out of address-space if you try to give every possible library a unique location. Thus when you examine a shared library, they do not specify a particular base address to be loaded at

Then how does dynamic linking solve this issue? On the one hand the writer says we can't use the same address, and on the other hand he says giving every library its own address will exhaust the address space. I'm seeing a contradiction here. (Note: I know what a virtual address is.)

2)

This handles data, but what about function calls? The indirection used here is called a procedure linkage table or PLT. Code does not call an external function directly, but only via a PLT stub. Let's examine this:

I didn't get it: why is the handling of data different than the handling of functions? What's the problem with saving function addresses inside the GOT, as we do with normal variables?

  • Each and every function call would be an indirect function call through the GOT. This is substantially slower than doing direct calls through the PLT. – fuz Jul 31 '21 at 13:55
  • @fuz PLT updates GOT... –  Jul 31 '21 at 13:57
  • Not really. On first call, the PLT entry jumps to the runtime linker which patches the PLT entry to go directly to the function on subsequent calls. So on all subsequent calls, there is an insubstantial overhead to the PLT. – fuz Jul 31 '21 at 14:00
  • @fuz: `gcc -fno-plt` is actually *faster* than the default. Modern GNU/Linux systems no longer patch direct jumps in the PLT itself (and haven't for many years); the PLT entry just uses a memory-indirect jmp through the GOT entry (even on 32-bit x86 where a `jmp rel32` *could* reach any address, unfortunately). Anyway, one indirect call can be faster than two direct call/jumps, especially in a non-huge loop where the same library function is called multiple times. ([See the `-fno-plt` section in this Q&A](https://stackoverflow.com/q/43367427) for performance links) – Peter Cordes Aug 01 '21 at 04:05
  • @fuz: BTW, this allows the PLT to be in exec-only memory, not write+exec and without messing around with mprotect. And coolmo is actually correct, lazy dynamic linking works by having the `.got.plt` entry pointing to the fall-through path of the PLT, and the first-call handler updating that entry to point to the real function. – Peter Cordes Aug 01 '21 at 07:20
  • @coolmo: You say "that handles data", but no, prelinking doesn't "handle data". Access to data in a different shared object has to load a pointer from the GOT. Look at `gcc -fPIC` output. (But in an executable, `gcc -fPIE`, usually the executable ends up with a weak definition of data, so the address of `std::cout` for example is a link-time constant and can be directly accessed by the main executable with RIP-relative addressing. But for that reason, code in a `.so` must go through the GOT even for its own globals, unless they use ELF visibility = hidden.) – Peter Cordes Aug 01 '21 at 07:24
  • See also https://web.archive.org/web/20171111043629/http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ – Peter Cordes Aug 01 '21 at 07:25

1 Answer

On the one hand the writer says we can't use the same address, and on the other hand he says giving every library its own address will exhaust the address space.

On Linux before the switch to ELF some 15-20 years ago, all shared libraries had to be globally coordinated. This was a maintenance nightmare, because a system can have many hundreds of shared libraries. You run out of address space assigning a unique range to each library, even though some of these libraries are never loaded together (but the assigner of address-space ranges doesn't know a priori which libraries will never be loaded together, and therefore could safely be loaded into the same range).

The dynamic loader solves this by placing libraries into an arbitrary address range as they are loaded, and relocating them so they execute correctly at the address they have just been loaded at.

The advantage here is that you don't need to partition your address space ahead of time.
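
To make this concrete, here is a minimal sketch (my illustration, not part of the original answer; the library and symbol names are just examples) that asks the dynamic loader where it actually placed a library. With ASLR enabled, the base address changes from run to run, yet the code keeps working because the library is relocated to wherever it happened to be mapped:

```c
#define _GNU_SOURCE          /* for dladdr() on glibc */
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Ask the dynamic loader to map libm (it may already be mapped). */
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look up a symbol, then ask which object it came from and
     * where that object was loaded. */
    void *sym = dlsym(handle, "cos");
    Dl_info info;
    if (sym && dladdr(sym, &info)) {
        printf("%s loaded at base %p (cos at %p)\n",
               info.dli_fname, info.dli_fbase, sym);
    }

    dlclose(handle);
    return 0;
}
```

Compile with `gcc demo.c` (add `-ldl` on older glibc) and run it a few times: the base address moves around, and nobody had to assign it ahead of time.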

why is the handling of data different than the handling of functions?

It's different because when you access data, the dynamic linker is not involved. The very first access must work, so the data must be relocated before the library is available. There's no function call you can hook for lazy dynamic linking.
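
For example, here is a tiny sketch of what "data goes through the GOT" looks like in compiler output (the file and variable names are made up for illustration):

```c
/* got_data.c -- compile to assembly with:  gcc -O2 -fPIC -S got_data.c
 * On x86-64 you should see the variable's address being loaded from the
 * GOT (shared_counter@GOTPCREL(%rip)) before the actual data load.
 * That GOT entry has to be filled in at load time, before the very
 * first access. */
extern long shared_counter;      /* defined in some other shared object */

long read_counter(void) {
    return shared_counter;       /* data access: address comes from the GOT */
}
```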

But for a function call, the dynamic linker can be involved. The program calls a PLT "stub" function, foo@plt. On the first call to that stub, it performs work to resolve a pointer to the actual foo() definition and saves that pointer. On subsequent calls, foo@plt just uses the already-saved pointer to jump directly to the definition of foo().

This is called lazy relocation, and it saves a lot of work if the program never reaches many of the library functions that it has call-sites for. (e.g. a program that evaluates a math expression and could call any libm.so.6 function, but for normal simple inputs, or with --help, only calls a couple.)
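
As a rough C analogy of that first-call-then-patch scheme (only an illustration of the idea, not how the real stubs are implemented), think of the GOT slot as a function pointer that starts out pointing at a resolver, which overwrites it with the real address on the first call:

```c
#include <stdio.h>

/* The "library" function that foo() should eventually reach. */
static double real_foo(double x) { return x * 2.0; }

static double resolve_foo(double x);                 /* first-call resolver */
static double (*foo_got_slot)(double) = resolve_foo; /* the "GOT entry"     */

static double resolve_foo(double x) {
    puts("resolver: binding foo on first call");
    foo_got_slot = real_foo;     /* patch the slot for all later calls */
    return real_foo(x);          /* and complete the original call     */
}

/* The "PLT stub": every call site goes through the GOT slot. */
static double foo(double x) { return foo_got_slot(x); }

int main(void) {
    printf("%g\n", foo(1.0));    /* first call: goes via the resolver   */
    printf("%g\n", foo(2.0));    /* later calls: straight to real_foo() */
    return 0;
}
```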

You can observe the effect of lazy relocation by running a large program with lots of shared libraries with and without the LD_BIND_NOW environment variable (which disables lazy relocation).
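
For instance, with a hypothetical program like this (names made up, and assuming your toolchain still defaults to lazy binding rather than -Wl,-z,now), glibc's LD_DEBUG=bindings output shows that a rarely-used libm function is only bound if it is actually called, unless LD_BIND_NOW forces everything to be bound at startup:

```c
/* bind_demo.c -- build and run, e.g.:
 *   gcc -O2 bind_demo.c -lm -o bind_demo
 *   LD_DEBUG=bindings ./bind_demo 2>&1 | grep tgamma               # lazy: no output
 *   LD_BIND_NOW=1 LD_DEBUG=bindings ./bind_demo 2>&1 | grep tgamma # bound at startup
 */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    if (argc > 1)                                /* rarely-taken path */
        printf("%f\n", tgamma(atof(argv[1])));
    else
        puts("no argument, tgamma never called");
    return 0;
}
```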

Or with gcc -fno-plt (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00225.html), GCC will inline the call through the GOT, meaning the library function is reached in one call instead of two. (Some x86-64 Linux distros enable this for their binary packages.) This requires early binding, but slightly reduces the cost of each call, so is good for long-running programs. (PLT + early binding is the worst of both, except for having cache locality while resolving everything.)
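
And to see what -fno-plt changes at a call site, a sketch like this (file and function names made up) can be built both ways and compared with `objdump -dr`:

```c
/* call_demo.c -- build twice and compare the call to puts:
 *   gcc -O2 -fPIE -c call_demo.c          && objdump -dr call_demo.o
 *   gcc -O2 -fPIE -fno-plt -c call_demo.c && objdump -dr call_demo.o
 * The default build emits a call to puts@PLT; with -fno-plt the call site
 * becomes a memory-indirect call through the GOT entry (on x86-64,
 * call *puts@GOTPCREL(%rip)), skipping the PLT stub entirely. */
#include <stdio.h>

void greet(int n) {
    for (int i = 0; i < n; i++)
        puts("hello");           /* external call into libc */
}
```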

Employed Russian
  • Early binding doesn't have to link every function in libc, only ones where some code-path actually calls it. If many of those calls are not reached (e.g. you run a complex program with `--help`), then early binding did a bunch of useless work. But if most of the functions do eventually get called, you do all the work with the dynamic linking code and data hot in cache. – Peter Cordes Aug 01 '21 at 04:58
  • Moreover, `gcc -fno-plt` will inline a GOT-indirect call (and force early binding), reducing the overhead for each library call from `call rel` + `jmp mem-indirect` to just a `call mem-indirect` (on x86-64). For a program like clang (which makes a *lot* of calls into LLVM), this is a significant overall win, except in the `--help` case. See https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00225.html for benchmarks, e.g. x86-64 `clang -O2 -g` compiling tramp3d goes from 41.6s to 36.8s when clang was compiled by `gcc -fno-plt`. (Most of this is from speeding up calls, not startup.) – Peter Cordes Aug 01 '21 at 05:01
  • Sorry to break this to you, but Linux switched to ELF some 25 years ago. _Tempus fugit_. – ninjalj Nov 11 '21 at 16:25