
I am searching for some statistics regarding the runtime overhead that occurs when a program is loaded by the runtime linker (e.g. ld.so). I am not an expert on how the runtime linker works, but as I understand it, it usually performs the following actions:

  • Searching for shared libraries in the well-known paths or in LD_LIBRARY_PATH
  • Loading the shared libraries (the resolved list can be inspected with ldd, as shown below)
  • Resolving symbols for the functions that are used
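
For example, which shared libraries a binary needs, and where the loader finds them, can be listed with ldd:

    # print the shared libraries the loader would resolve for this binary,
    # together with the paths where they were found
    ldd /bin/date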

So when I start a program through a GUI or through the command line, at some point a system call to exec happens and the requested program is started. Let's take a quick look at what happens then (a way to observe these steps is sketched after the list):

  1. Exec(myprogram)
  2. Operating system loads myprogram into memory
  3. Operating system turns over execution to _start
  4. Some initialization happens and the runtime linker is run
  5. main() is called
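
One way to observe these steps is to trace the relevant system calls with strace (a rough sketch; ./myprogram stands in for any dynamically linked binary):

    # -tt adds timestamps; execve is step 1, and the openat/mmap calls
    # show the executable and its shared libraries being mapped in
    strace -tt -e trace=execve,openat,mmap ./myprogram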

Assuming that the above list is correct and I did not leave out any major steps I would be interested in two things:

  1. What is the overhead of step 4. according to theory?
  2. How can I determine in practice the overhead of step 4. (e.g. for real programs such as Firefox or Chrome)?
lanoxx
  • Not sure, but to get a rough idea of the timings involved, you could try "strace -tt"? – flu Dec 08 '16 at 17:21
  • Strangely you are not considering disk access time. – user3528438 Dec 08 '16 at 19:25
  • @user3528438 I am mainly interested in the overhead that is created by using shared libraries. So if I compare it to the case where all libraries are statically linked, then I still have roughly the same amount of IO (assuming no other program is already using the shared library). – lanoxx Dec 08 '16 at 19:53
  • Is this about `Linux`, or do you expect a more general answer? With a general answer you can lose a lot of details specific to a concrete OS. – fghj Dec 08 '16 at 21:03

3 Answers


Assuming that the above list is correct

It is not precisely correct, as this answer explains.

What is the overhead of step 4. according to theory?

It depends and varies greatly.

Some of the factors that play a role:

  1. How many dynamic libraries does the program link against?

    It is not uncommon to have a few dozen libraries that are loaded.
    Things dramatically slow down when there are 5000 or more of them.

  2. How many data and function symbols does the program reference?

    The data references have to be resolved at load time, but function symbols can be resolved lazily. (More on lazy symbol resolution here; a comparison of lazy and eager binding is sketched below.)
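
To see how lazy binding affects startup cost, you can compare the loader's own statistics with and without LD_BIND_NOW, which forces eager resolution of all function symbols (a sketch; the absolute numbers vary between runs and machines):

    # default: lazy binding, function symbols are resolved on first call
    LD_DEBUG=statistics /bin/date
    # eager binding: every relocation is performed up front at load time
    LD_BIND_NOW=1 LD_DEBUG=statistics /bin/date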

How can I determine in practice the overhead of step 4. (e.g. for real programs such as Firefox or Chrome)?

With GLIBC, simply set LD_DEBUG=statistics in the environment. For example:

LD_DEBUG=statistics /bin/date
    104984:
    104984: runtime linker statistics:
    104984:   total startup time in dynamic loader: 1348294 clock cycles
    104984:         time needed for relocation: 501668 clock cycles (37.2%)
    104984:                  number of relocations: 90
    104984:       number of relocations from cache: 3
    104984:         number of relative relocations: 1201
    104984:        time needed to load objects: 413792 clock cycles (30.6%)
Sun Dec 11 17:51:35 PST 2016
Employed Russian

For a rough estimate, write two test programs: one statically linked and one that uses a dynamic library (variant: only the C runtime dynamic library). Then measure the difference in start-up time. If both programs are of comparable size, the difference can be attributed to dynamic loading.
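
A minimal sketch of that experiment (hello.c is a hypothetical empty test program; running each binary in a loop amortizes timer noise):

    # hello.c contains just: int main(void) { return 0; }
    gcc -static -O2 -o hello_static hello.c
    gcc -O2 -o hello_dynamic hello.c

    # run each binary many times; the per-run difference approximates
    # the cost of dynamic loading
    time sh -c 'for i in $(seq 1000); do ./hello_static; done'
    time sh -c 'for i in $(seq 1000); do ./hello_dynamic; done'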

ddbug
  1. Exec(myprogram)
  2. Operating system loads myprogram into memory
  3. Operating system turns over execution to _start
  4. Some initialization happens and the runtime linker is run
  5. main() is called

Assuming that the above list is correct and I did not leave out any major steps

Actually this list is not quite right, at least for Linux. Major note: the dynamic linker (the program that maps the required libraries into the process address space) runs before the program gets control. An ELF executable contains the path to its dynamic linker, usually something like /lib/ld-linux.so.2; this program gets control before the real program does, and it "loads" the shared libraries.
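
That path can be read out of any dynamically linked executable with readelf (shown here for /bin/date; the exact interpreter path depends on the architecture and distribution):

    # the INTERP program header names the dynamic linker that the
    # kernel will invoke for this binary
    readelf -l /bin/date | grep -A1 INTERP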

Minor note: "load into memory" is actually not quite true. In fact, the executable and the shared library files are mapped into the address space of the process, and each 4K page of code/data is loaded on demand (4K is the common memory page size). "Turns over execution to _start" is also not exactly true: the address where execution starts is taken from the ELF header. It is a convention that the _start symbol points to this address, but I guess you could create an ELF file that works without a _start symbol.
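
The entry address the kernel jumps to is visible in the ELF header (a quick check; the nm lookup only works if the binary is not stripped):

    # "Entry point address" is where execution begins
    readelf -h ./myprogram | grep Entry
    # by convention it matches the _start symbol, when one is present
    nm ./myprogram | grep ' _start$'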

I would be interested in two things:

  1. What is the overhead of step 4. according to theory?
  2. How can I determine in practice the overhead of step 4. (e.g. for real programs such as Firefox or Chrome)?

As I wrote, the dynamic linker does not run at step 4. In fact, if you instrument the firefox or chrome programs, you cannot measure the time ld-linux.so.2 spends working, because it runs before any instruction of the firefox/chrome executable takes control.

You could edit the firefox executable and replace /lib/ld-linux.so.2 with /lib/ld-linux.so.3, then hack glibc (ld-linux.so is part of it) and instrument it to measure its time.
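
A less invasive alternative (my own suggestion, not part of the patching approach above): the glibc loader is itself an executable, so it can be run and timed directly. The loader path below is typical for x86-64 Linux and may differ on your system:

    # run the dynamic loader explicitly and time the whole start-up
    time /lib64/ld-linux-x86-64.so.2 /bin/date
    # or only resolve and print the libraries, without running the program
    /lib64/ld-linux-x86-64.so.2 --list /bin/date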

All the other code that runs before main can, I think, be profiled in the normal way, for example as described in: What is a good easy to use profiler for C++ on Linux?
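
For instance, with perf (assuming it is installed and the binary has symbols), the code that runs between _start and main shows up in an ordinary profile:

    # sample the whole process, including the early start-up code,
    # then inspect which functions dominated
    perf record -g ./myprogram
    perf report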

fghj