
(This is presumably a fairly advanced problem, sorry about this :-))

I have the problem that I need to load a plugin (a shared library) into an application, but the plugin may use a library that is binary-incompatible with the version used by the application. My idea was to use dlmopen() and load the plugin into its own namespace. I expect to get two separate copies of the binary-incompatible library (and of any other common dependency, even a binary-compatible one).
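The approach described above comes down to a single dlmopen() call with LM_ID_NEWLM. Below is a minimal C sketch, assuming glibc (dlmopen() and dlinfo() are GNU extensions, hence _GNU_SOURCE; on glibc older than 2.34 you also need to link with -ldl). The helper name load_isolated is hypothetical. Unlike the original example, it checks the result of the open.

```c
#define _GNU_SOURCE /* dlmopen(), dlinfo(), Lmid_t and LM_ID_* are GNU extensions */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `path` into a brand-new link-map namespace.
 * The namespace starts empty, so the plugin's dependencies (its own
 * libxml++, libstdc++, even libc) are loaded again as separate copies.
 * Returns NULL on failure; on success stores the namespace id in *ns_out. */
static void *load_isolated(const char *path, Lmid_t *ns_out)
{
    /* RTLD_NOW: resolve all symbols immediately, so unresolved
     * dependencies are reported here rather than at the first call. */
    void *handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlmopen(%s) failed: %s\n", path, dlerror());
        return NULL;
    }
    /* Ask the loader which namespace id it allocated, so that further
     * libraries can be added to the same namespace later. */
    if (ns_out != NULL && dlinfo(handle, RTLD_DI_LMID, ns_out) != 0) {
        fprintf(stderr, "dlinfo(%s) failed: %s\n", path, dlerror());
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

A second library can be loaded into the same namespace by passing the returned Lmid_t to dlmopen() instead of LM_ID_NEWLM.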

This seems to work to a certain extent, but under certain circumstances I get a segfault deep inside glibc, at the point where the constructors of static objects are called (this is what I found out with the debugger).

I have made a minimal example to reproduce the issue, which can be found on github: https://github.com/mhier/segregatedLinkingExample

The example uses libxml++ as an external, common C++ library, so you will need its development package installed. Run "mk.sh" to compile and then run "main". It will crash (at least it does on Ubuntu 16.04 and 18.04). If you remove "-DWITH_CRASH", it no longer crashes.

The WITH_CRASH compile switch enables the use of libxml++ inside the main executable. It is always used in the plugin library libC. I only see the crash if libxml++ is used in both the main executable and the plugin. "Using" in this context means as little as deriving a class from one of its virtual classes and making sure code for the derived class really gets generated by implementing the constructor/destructor. No code in the plugin is even executed (other than via dl_init, i.e. constructors of static objects etc.).

I cannot find much on the Internet about dlmopen. I have not found any bug reports pointing in the right direction. Has anyone ever used dlmopen with a new namespace for C++ libraries? Any form of input how to continue from this point is very welcome!

melpomene
  • Can you show us the code you're currently using to load the plugin? Also, does the open succeed? If you're trying to call a function in the plugin when the open failed, that's probably why it would crash. – Alexis Wilke Mar 05 '19 at 17:30
  • @AlexisWilke The open is not checked, but the resulting handle is not used either. – melpomene Mar 05 '19 at 17:31
  • Alternative solution: load the incompatible library in a separate executable that is compatible with that library. Dealing with ODR violations is not fun. – user7860670 Mar 05 '19 at 17:34
  • @AlexisWilke: The code is included in the example, see the link. The open does not complete, as it segfaults inside it (dl_init is called while loading the library). – Martin Hierholzer Mar 06 '19 at 08:27
  • @VTT: Yes I thought about that as well and this will most likely be the solution, but it will have a performance impact and requires a complicated shared memory protocol to communicate between the main application and the plugin. I am not sure if this is about ODR violations, since stuff gets loaded into a separate namespace. I know it's not a C++ namespace, but shouldn't this be similar? If not, what is the purpose of dlmopen in the first place? – Martin Hierholzer Mar 06 '19 at 08:30
  • I think `dlmopen` is a rather hackish function. There was a comment from a GDB developer somewhere stating that they didn't implement proper support for debugging binaries loaded through `dlmopen` because they couldn't find anyone using it. – user7860670 Mar 06 '19 at 09:05
  • @VTT: the whole point of using dlmopen() is to avoid the ODR constraint (i.e. for code on either side of that boundary). Of course, the ODR rule still applies to anything used in the interface, itself. – Droid Coder Mar 10 '19 at 04:14
  • @VTT: that GDB doesn't support something doesn't make it "hackish". And perhaps you were thinking of this comment: https://stackoverflow.com/questions/51592455/debugging-strategies-for-libraries-open-with-dlmopen#comment90160785_51592455 – Droid Coder Mar 10 '19 at 04:18
  • @MartinHierholzer: which versions of libc and ld.so are you using? Just for the record. – Droid Coder Mar 10 '19 at 04:20
  • @DroidCoder I tried both on Ubuntu 16.04 (gcc 5.4.0-6ubuntu1~16.04.11 + glibc 2.23-0ubuntu11) and Ubuntu 18.04 (gcc 7.3.0-27ubuntu1~18.04 + glibc 2.27-3ubuntu1), to be precise. (My main target platform is Ubuntu 16.04) – Martin Hierholzer Mar 11 '19 at 08:28
  • Thanks for the info. – Droid Coder Mar 11 '19 at 18:59
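The alternative raised in the comments (running the incompatible library in a separate process) can be sketched with fork() and two pipes. This is a minimal C sketch under stated assumptions: plugin_compute is a hypothetical stand-in for the plugin call, and plain pipes replace the shared-memory protocol mentioned above for brevity. Note that fork() alone does not provide isolation, because the child inherits the libraries already loaded in the parent; a real helper would execv() a separate executable linked against the plugin's own libxml++, at the point marked in the code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the plugin call that would run in the helper. */
static int plugin_compute(int x) { return 2 * x; }

/* Helper side: answer int requests with int replies until EOF. */
static void serve(int in_fd, int out_fd)
{
    int req;
    while (read(in_fd, &req, sizeof req) == (ssize_t)sizeof req) {
        int rep = plugin_compute(req);
        if (write(out_fd, &rep, sizeof rep) != (ssize_t)sizeof rep)
            break;
    }
}

struct helper { pid_t pid; int to_child; int from_child; };

static int helper_start(struct helper *h)
{
    int down[2], up[2]; /* down: parent -> child, up: child -> parent */
    if (pipe(down) != 0)
        return -1;
    if (pipe(up) != 0) {
        close(down[0]); close(down[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: close the parent's ends, otherwise EOF never arrives. */
        close(down[1]); close(up[0]);
        /* Real setup: execv() the separate helper executable here. */
        serve(down[0], up[1]);
        _exit(0);
    }
    close(down[0]); close(up[1]);
    h->pid = pid; h->to_child = down[1]; h->from_child = up[0];
    return 0;
}

/* Send one request and block for the reply. */
static int helper_call(struct helper *h, int x, int *result)
{
    if (write(h->to_child, &x, sizeof x) != (ssize_t)sizeof x)
        return -1;
    if (read(h->from_child, result, sizeof *result) != (ssize_t)sizeof *result)
        return -1;
    return 0;
}

/* Closing the request pipe makes the helper see EOF and exit. */
static int helper_stop(struct helper *h)
{
    int status;
    close(h->to_child);
    close(h->from_child);
    if (waitpid(h->pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every call now costs two context switches, which is the performance impact mentioned in the comments; batching requests or moving to shared memory reduces it.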

2 Answers


The problem is not related to C++.

It is a bug in glibc's libpthread. Loading a library with dlmopen into a new namespace also loads a second copy of libpthread, and pthread_key_create in that copy can return keys that are already in use. As a result, thread-specific storage gets clobbered (the same key means the same memory location; it's like malloc returning the same memory area multiple times).

It crashes immediately because libglib already makes heavy use of thread-specific storage in its on-load functions.

In detail, the problem is the direct use of the __pthread_keys global variable, of which each libpthread copy has its own instance. It should instead be reached via the thread descriptor (THREAD_SELF), which ensures that thread-local keys are allocated in a structure shared by all instances of libpthread.

See the source for details: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_key_create.c;h=a584db412b7b550fa7f59e445155dbfddaeb1d23;hb=HEAD

Reported to glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=26955
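To make the key collision concrete, here is a small C model of the mechanism described above. It is a simplified simulation, not glibc code: struct libpthread_copy and the sim_* functions are hypothetical names. Each libpthread copy has its own key-in-use table (standing in for __pthread_keys), while the per-thread value slots are shared, as they are when they live in the thread descriptor.

```c
#define SIM_KEYS_MAX 4

/* Per-thread value slots. In glibc these live in the thread descriptor,
 * which every libpthread copy in the process sees. One thread modelled. */
static void *thread_slots[SIM_KEYS_MAX];

/* Hypothetical model of one libpthread copy: its private key table,
 * standing in for the __pthread_keys global. */
struct libpthread_copy {
    int key_in_use[SIM_KEYS_MAX];
};

/* Mimics pthread_key_create: take the first key that is free in THIS
 * copy's table. Returns 0 on success, -1 if no key is left. */
static int sim_key_create(struct libpthread_copy *lib, unsigned *key)
{
    for (unsigned k = 0; k < SIM_KEYS_MAX; k++) {
        if (!lib->key_in_use[k]) {
            lib->key_in_use[k] = 1;
            *key = k;
            return 0;
        }
    }
    return -1;
}

/* Mimic pthread_setspecific/getspecific: index the shared slots by key. */
static void sim_setspecific(unsigned key, void *value) { thread_slots[key] = value; }
static void *sim_getspecific(unsigned key) { return thread_slots[key]; }
```

With two separate tables, the application's copy and the plugin's copy both hand out key 0, so a value stored through one key overwrites the other. The fix the answer describes corresponds to both copies consulting a single shared table, in which case the second call returns key 1.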

Also, a tip for getting debug symbols when debugging this kind of thing in gdb:

  • check /proc/$pid/maps to find out where dlmopen loaded the library
  • find the entry point of the library (e.g. readelf -h /usr/lib/x86_64-linux-gnu/libglib-2.0.so )
  • in gdb use add-symbol-file to load the symbols.
    • as the file name, just specify the library file; if you have debug symbols installed, gdb will find them the normal way. Don't try to specify the symbol file directly
    • the address is the load address from /proc/$pid/maps + entry point address
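The steps above can be scripted. Below is a minimal C sketch, assuming a 64-bit ELF library and Linux /proc; the helper names are hypothetical. It applies the recipe literally (load address + entry point). For a typical shared library this matches the .text address that add-symbol-file expects, because GNU ld sets the entry point to the start of .text when no entry symbol is defined. After dlmopen() the same file can be mapped once per namespace, so every copy is reported.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read e_entry from a 64-bit ELF header. Returns 0 on success, -1 on error. */
static int read_entry_point(const char *path, uint64_t *entry)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr eh;
    size_t n = fread(&eh, 1, sizeof eh, f);
    fclose(f);
    if (n != sizeof eh || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    *entry = eh.e_entry;
    return 0;
}

/* Collect the start address of every mapping of `lib` at file offset 0
 * (one per loaded copy). Stores up to `max` bases, returns the count. */
static size_t find_load_bases(const char *maps_path, const char *lib,
                              uint64_t *bases, size_t max)
{
    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return 0;
    char line[4096];
    size_t count = 0;
    while (count < max && fgets(line, sizeof line, f) != NULL) {
        unsigned long long start, end, offset;
        char perms[5];
        int name_pos = -1;
        line[strcspn(line, "\n")] = '\0';
        /* Format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%llx-%llx %4s %llx %*s %*s %n",
                   &start, &end, perms, &offset, &name_pos) < 4 || name_pos < 0)
            continue;
        if (offset == 0 && strcmp(line + name_pos, lib) == 0)
            bases[count++] = start;
    }
    fclose(f);
    return count;
}

/* Print one gdb command per loaded copy: load base + entry point. */
static void print_add_symbol_file(const char *maps_path, const char *lib)
{
    uint64_t entry, bases[16];
    if (read_entry_point(lib, &entry) != 0) {
        fprintf(stderr, "cannot read ELF header of %s\n", lib);
        return;
    }
    size_t n = find_load_bases(maps_path, lib, bases, 16);
    for (size_t i = 0; i < n; i++)
        printf("add-symbol-file %s 0x%llx\n", lib,
               (unsigned long long)(bases[i] + entry));
}
```

For the crashing process, pass "/proc/<pid>/maps" and the full library path exactly as it appears in that file.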
Reimar
  • A question: Will making all dlmopen calls from the main thread and then passing those pointers to the other threads mitigate the problem? – Itay Marom Jul 25 '23 at 16:33

So it seems the answer is not to do it. dlmopen seems to have issues with C++ which can result in undefined behaviour. Presumably the ODR violations are not perfectly fixed by the namespaces.

I admit, this answer is my subjective view. I have not found many good resources about using dlmopen for C++ libraries. Hence my conclusion is not to use it, as I need it to work reliably. I have seen very strange effects, e.g. my example in the question works again if I link the shared library against a particular third-party library (even without using it). Unless I can understand these effects, I would not trust a solution (as it could just work accidentally).

dlmopen() might work in other contexts, e.g. if one controls both the application and the shared library and can test if it loads properly.

  • Can you please cite any sources you have found, for this? I'm here because I'm also facing issues with dlmopen(). And yes, I do need dlmopen(). – Droid Coder Mar 10 '19 at 04:21
  • @DroidCoder No sources directly. I have just not found any sources about anyone successfully using dlmopen() with C++. There is only one promising project, but it seems to have been abandoned for 1.5 years: https://git.collabora.com/cgit/user/vivek/libcapsule.git/ I didn't try it out. – Martin Hierholzer Mar 11 '19 at 08:33
  • I have clarified my answer and how I have come to my conclusion. – Martin Hierholzer Mar 11 '19 at 08:44
  • Thanks for the update & response. FWIW, I tried it with valgrind 3.12.0. It reported several errors that did not occur with dlopen(), and this test was in a context where I could safely substitute dlopen(). So, either that version of valgrind also doesn't support dlmopen() or there are open issues with it, in my glibc version (2.22). – Droid Coder Mar 11 '19 at 18:56
  • Please note that the answer by @Reimar is the correct answer.. :) – Or Birenzwige May 18 '21 at 12:16