I'm encountering an issue that is explained well in the article Shared Library Symbol Conflicts (on Linux). The problem is that when the executable and a .so both define a function with the same name, a call to that function from inside the .so ends up in the executable's version rather than in the .so's own.

Let's talk about the case in this article. I understand that when layer.o is compiled, its DoLayer() function has an external dependency on DoThing().

But when libconflict.so is linked, shouldn't that external dependency be resolved in place and simply replaced, statically, with the address of conflict.o/DoThing()?

Why does the layer.o/DoLayer() still use dynamic linking to find DoThing()? Is this a designed behavior?

Joundill
xnervwang

2 Answers

Is this a designed behavior?

Yes.

At the time of introduction of shared libraries on UNIX, the goal was to pretend that they work just as if the code was in a regular (archive) library.

Suppose you have foo() defined in both libfoo and libbar, and bar() in libbar calls foo().

The design goal was that cc main.c -lfoo -lbar works the same regardless of whether libfoo and libbar are archive or shared libraries. The only way to achieve this is to have libbar.so use dynamic linking to resolve the call from bar() to foo(), despite having a local version of foo().

This design makes it impossible to create a self-contained libbar.so -- its behavior (which functions it ends up calling) depends on what other functions are linked into the process. This is also the opposite of how Windows DLLs work.

Creating self-contained DSOs was not a consideration at the time, since UNIX was effectively open-source.

You can change the rules with special linker flags, such as -Bsymbolic. But the rules get complicated very quickly, and (since that isn't the default) you may encounter bugs in the linker or the runtime loader.
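As a source-level alternative to -Bsymbolic, the symbol visibility approach mentioned in the comments can be sketched as below. This is only an illustration, not the article's code: the int return types, the value 42, and the file name conflict_layer.c are assumptions made so that the call is observable, and the visibility attribute is a GCC/Clang extension.

```c
/* Sketch only: one translation unit standing in for conflict.c and layer.c.
 * Signatures and the return value are assumptions, not the article's code. */

/* Hidden visibility: DoThing is not exported from the shared object, so the
 * call in DoLayer is bound inside the library at link time and cannot be
 * interposed by a DoThing defined in the executable. */
__attribute__((visibility("hidden")))
int DoThing(void)
{
    return 42; /* hypothetical value, used only to identify this definition */
}

/* Default visibility: DoLayer stays exported so the executable can call it. */
__attribute__((visibility("default")))
int DoLayer(void)
{
    return DoThing();
}

/*
 * Possible build lines (GNU toolchain assumed, file name hypothetical):
 *   cc -fPIC -shared conflict_layer.c -o libconflict.so
 *   cc -fPIC -shared -Wl,-Bsymbolic conflict_layer.c -o libconflict.so
 * The second line is the linker-flag route; with hidden visibility it is
 * not needed for DoThing.
 */
```

Hiding a symbol affects only that symbol, whereas -Bsymbolic changes how every global reference inside the library is bound, which is part of why the rules get complicated.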

Employed Russian
  • Thanks. I have several questions, but first: I don't fully understand your case. Per [Why does the order in which libraries are linked sometimes cause errors in GCC?](https://stackoverflow.com/questions/45135/why-does-the-order-in-which-libraries-are-linked-sometimes-cause-errors-in-gcc), for the link order `-lfoo -lbar`, if `libfoo` and `libbar` are static libraries, wouldn't `libbar` call the `foo()` in `libbar` itself, since, as that link says, the *linker searches from left to right, and notes unresolved symbols as it go*? – xnervwang Jul 15 '20 at 00:11
  • @xnervwang You are correct: which `foo()` ends up being called depends (in the archive library case) on whether `main.cc` references `foo()` or not, and also on the linker being used (LLD does not implement traditional UNIX linking rules). – Employed Russian Jul 15 '20 at 07:03
  • OK, I understand your assumption now. In your case, assume `main.cc` also calls `foo()`. With `cc main.c -lfoo -lbar` (assuming `libfoo.a` and `libbar.a`), `-lfoo` appears first, so the `foo()` in `libfoo.a` is used by `main.cc`, and `libbar.a` also uses that `foo()`. Now consider the other case: with `libfoo.so` and `libbar.so`, to keep the same behavior, `libbar.so` must also call the `foo()` in `libfoo.so` rather than its own. So that's why it's designed like this. – xnervwang Jul 16 '20 at 18:37
  • Hi @EmployedRussian, as you mentioned `This is also the opposite of how Windows DLLs work`, do you happen to know how to make Windows act the same? I'm porting a Linux project to Windows and this behavior drives me crazy. – psionic12 Dec 08 '20 at 11:48
Yes, this is a designed behavior. When you link a program into a binary, all the references to named external (non-static) functions are resolved to point into the symbol table for the binary. Any shared libraries that are linked against are specified as DT_NEEDED entries.

Then, when you run the binary, the dynamic linker loads each required shared library at a suitable address and resolves each symbol to an address. Sometimes this is done lazily, on the first call, and sometimes it is done all at once at startup. If several loaded objects define a symbol with the same name, the dynamic linker picks one of them (the first it finds in its lookup order, which starts with the executable), and your program may misbehave or crash if that is not the definition the caller expected.
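The "first definition wins" rule can be pictured with a toy model. This is not how the GNU dynamic linker is implemented (it uses hash tables and handles versions, weak symbols and more); `struct module`, `resolve` and `global_scope` are hypothetical names, and the export lists assume the executable's DoThing() is in its dynamic symbol table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a loaded object: a name plus the symbols it exports. */
struct module {
    const char *name;
    const char *const *exports; /* NULL-terminated list of symbol names */
};

/* Return the name of the first module, in lookup order, that exports
 * `symbol`, or NULL if no module in the scope defines it. */
const char *resolve(const struct module *scope, size_t count, const char *symbol)
{
    for (size_t i = 0; i < count; i++) {
        for (const char *const *s = scope[i].exports; *s != NULL; s++) {
            if (strcmp(*s, symbol) == 0)
                return scope[i].name;
        }
    }
    return NULL;
}

static const char *const exe_exports[] = { "main", "DoThing", NULL };
static const char *const lib_exports[] = { "DoLayer", "DoThing", NULL };

/* Flat namespace: the executable is searched first, then its libraries. */
const struct module global_scope[] = {
    { "main executable", exe_exports },
    { "libconflict.so",  lib_exports },
};
```

In this model, looking up "DoThing" on behalf of DoLayer() searches the same list as any other lookup, so it lands on the executable's definition even though libconflict.so has its own.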

Note that this is the behavior on Linux, which has all symbols as a flat namespace. Windows resolves symbols differently, using a tree topology, which has both advantages (fewer conflicts) and disadvantages (the inability to allocate memory in one library and free it in another).

The Linux behavior is very important if you want things like LD_PRELOAD to work. This allows you to use debugging tools like Electric Fence and CPU profiling tools like the Google performance tools, or replace a memory allocator at runtime. None of these things would work if symbols were preferentially resolved to their binary or shared library.

The GNU dynamic linker does support symbol versions, however, so that it's possible to load multiple versions of a shared library into the same program. Oftentimes distros like Debian will do this with libraries they expect to change frequently, like OpenSSL. If the program uses liba which uses OpenSSL 1.0 and libb which uses OpenSSL 1.1, then the program should still function in such a case since OpenSSL has versioned symbols, and each library will use the appropriate version of the relevant symbol.

bk2204
  • Thanks for the answer. But I'd like to know more details here. In the case of the article I linked, libconflict.so implements DoLayer() in layer.o and DoThing() in conflict.o, and DoLayer() calls DoThing(). After libconflict.so is built, I would expect DoLayer() to call the DoThing() defined in libconflict.so itself. Why does it use the other DoThing(), implemented in the executable, instead? It looks like in a .so built from multiple .o files, every call from one .o to another relies on dynamic linking. – xnervwang Jul 14 '20 at 02:45
  • That's to allow symbol interposing, which is what bk2204 is referring to with `LD_PRELOAD` (preloading is one of multiple uses of interposing); it allows functions to be replaced depending on which module is loaded. You *can* force the link editor to resolve all of the definitions at build time with `-Wl,-Bsymbolic`. See https://flameeyes.blog/2012/10/07/symbolism-and-elf-files-or-what-does-bsymbolic-do/ for more details of what that one does. – Diego Elio Pettenò Jul 14 '20 at 09:25
  • You are very much mistaken: loading Open SSL 1.0 and 1.1 into the same process doesn't work, and isn't expected to work. – Employed Russian Jul 14 '20 at 14:23
  • It does indeed work if you have symbol versioning set up, which Debian does. It will be completely broken if both versions don't have symbol versioning. Just because your distro doesn't doesn't mean it doesn't work. – bk2204 Jul 14 '20 at 22:59
  • @DiegoElioPettenò Thanks! This is a great blog post. It has answered nearly all my questions. Now I understand that my issue arises because the .so is PIC, so calls to its exported functions go through the PLT and are resolved by the dynamic linker, even when the function is defined in the same .so. I also know that GCC visibility or other GCC/linker options can solve this: for example, visibility=hidden makes the .so bind the call to its own definition at link time rather than asking the dynamic linker for the function's address. – xnervwang Jul 15 '20 at 03:32