
This question is very similar to "Using C++11 multithreading in shared library loaded by program without thread support".

I have a shared library that uses OpenMP and a main program that calls a function from it.

testlib.cpp

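#include <cstddef>
#include <memory>

// Copies the shared_ptr concurrently from several OpenMP threads,
// so its reference count must be updated atomically.
void foo(std::shared_ptr<int> f)
{
#pragma omp parallel for
    for (std::size_t g = 0; g < 100; g++) {
        auto other = f;  // copy (increment) and destroy (decrement) per iteration
    }
}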

tester.cpp

#include <memory>
void foo(std::shared_ptr<int> f);  // defined in libtestlibd.so
int main() {
    foo(std::make_shared<int>(4));
}

I compile and link with:

g++ -fPIC -g -shared -o libtestlibd.so testlib.cpp -fopenmp -pthread
g++ -g tester.cpp -o tester -Wl,-rpath,`pwd` libtestlibd.so

However, when running this on a machine with a custom, automatically built compiler, it crashes with "pure virtual method called", SIGSEGV and similar errors. I traced this to __gthread_active_p returning false even though it should not: after all, the program is (transitively) linked to libpthread.
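As far as I understand it, a false result crashes the program (rather than just making it slower) because libstdc++ uses __gthread_active_p to choose between an atomic and a non-atomic update of the shared_ptr reference count. A simplified sketch under that assumption, loosely modeled on the dispatch in <ext/atomicity.h>; the names are hypothetical and this is not the real libstdc++ code:

```cpp
#include <atomic>

// HYPOTHETICAL, simplified model of libstdc++'s refcount dispatch
// (__exchange_and_add_dispatch in <ext/atomicity.h>); not the real code.
struct HypotheticalRefCount {
    // Stored as std::atomic so both paths below are well-defined in this
    // sketch; libstdc++'s single-threaded path works on a plain int.
    std::atomic<int> count{1};

    // threads_active plays the role of __gthread_active_p().
    // Returns the value before the update, like __exchange_and_add.
    int exchange_and_add_dispatch(int delta, bool threads_active) {
        if (threads_active) {
            // One indivisible read-modify-write: safe across threads.
            return count.fetch_add(delta, std::memory_order_acq_rel);
        }
        // "Single-threaded" path: separate load and store. Two threads doing
        // this at the same time can lose an update, so a shared_ptr count may
        // reach zero twice (double delete) or never.
        int old = count.load(std::memory_order_relaxed);
        count.store(old + delta, std::memory_order_relaxed);
        return old;
    }
};
```

If the destructor running in the OpenMP threads takes the non-atomic path, lost updates and a resulting double delete would fit the "pure virtual method called" errors, though I have not confirmed that this is exactly what happens here.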

Using gdb, I set conditional breakpoints inside __gthread_active_p that trigger when the contained pointer is NULL (the function returns false) or non-NULL (it returns true). I then compared the backtraces and the address of the variable:

  • In both cases the call to __gthread_active_p originates from foo, i.e. from the shared library.
  • The address of __gthread_active_ptr differs between the two cases: at one address it is set, at the other it is NULL.
  • Judging by the function addresses, the case where __gthread_active_p returns false is the call made from the shared_ptr destructor of the main program. I guess the dynamic linker found two definitions of the destructor and chose the one from the main program.

When doing the same on my Linux Mint 20 system, the program works, and debugging it shows that the destructor from the shared library is used.

It also works if I explicitly link the main program with -lpthread, even though it is already linked against the shared library, which links pthread.

The __gthread_active_p function and the variables it references are all static, i.e. local to the translation unit. Hence only my two .cpp files matter.
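For reference, a simplified sketch of the kind of check involved, assuming glibc and GCC or Clang. It is modeled on (not copied from) what I understand libstdc++'s gthr-posix.h does for glibc in GCC 8/9: a static weak reference to __pthread_key_create whose address is stored in a function-local static. All names in it are hypothetical. Because the initializer is an address constant, the local static is constant-initialized rather than set by code on the first call, so its value comes from the linker's or dynamic linker's relocation processing. That may be relevant to the question about the static void *const.

```cpp
#include <pthread.h>

// HYPOTHETICAL names throughout; mirrors the idea behind libstdc++'s
// gthr-posix.h for glibc, not its exact source. Needs GCC or Clang.
using key_create_fn = int(pthread_key_t*, void (*)(void*));

// Static weak reference to glibc's exported __pthread_key_create. If no
// definition is found, the reference resolves to a null address instead of
// causing a link error.
static key_create_fn hypothetical_weak_key_create
    __attribute__((weakref("__pthread_key_create")));

// Stand-in for __gthread_active_p(): 1 if the weak reference was resolved
// to a real definition, 0 if it stayed null.
static inline int hypothetical_threads_active() {
    // The initializer is an address constant, so no first-call
    // initialization code is emitted; the stored value is whatever the
    // static linker or the dynamic linker's relocation put there.
    static key_create_fn* const active_ptr = &hypothetical_weak_key_create;
    return active_ptr != nullptr;
}
```

Whether this returns 0 or 1 depends on how the binary is linked and on the glibc version, so I am not claiming a particular result for it.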

So my questions are:

  • Why is libpthread not loaded before the static variables of the main program are initialized?
  • The function even contains a static void *const, which to my understanding should only be initialized on the first call to the function. That doesn't seem to be the case here: the debugger shows the first call coming from inside the shared library, i.e. after pthread is loaded. Why?
  • Is this a bug in libstdc++? It sounds like a massive footgun: you would need to know whether any of your shared libraries uses threading and, if so, link with pthread as well.
  • What could I check to find out why this happens in one environment but not the other, given that the GCC and libstdc++ versions are the same?
  • With GCC / G++, you should specify `-pthread` at both the compile stage and the link stage, for all translation units contributing to the program. It is not necessary to also link with `-lpthread` -- just `-pthread` is sufficient. – John Bollinger Apr 10 '21 at 04:08
  • My problem with that: if the shared library was not compiled by me (or is hidden in some layers of makefiles from someone else), then I would need to know that it uses pthreads when linking against it. Knowing that for every library used seems infeasible, for example for system libraries (apt install and friends). So why that footgun? – Flamefire Apr 10 '21 at 09:10
  • With the caveat that I am intentionally writing comments, not an answer, it is because on at least some systems, multithreaded programs require differences at program loading. In particular, you tagged [weak-symbol], which very well might be a key part of your issue. Glibc contains a stub implementation of pthreads that it uses for programs not specified at link time to be multi-threaded. We quickly get deep into dynamic linking issues when we consider what happens when one DSO in a program is linked to that, and another to the full implementation. – John Bollinger Apr 10 '21 at 11:54
  • You should also consider, however, whether the foot-shooting may have actually happened when you built your own GCC (and, I suppose, Glibc). That you do not see the same behavior on your Mint system suggests that the GCC and Glibc on that system were built in a way that sidesteps this issue. – John Bollinger Apr 10 '21 at 11:56
  • It would be useful if you told us where it is *not* working. (You told us it works on your Linux Mint 20 system, but don't give us any other system configurations. Is this MacOS, Windows, MVS, VM/370, ... ?) – Jim Cownie Apr 12 '21 at 07:47
  • In this case it is RHEL 7 with GCC 8 or 9 (yes, both) built with EasyBuild, and the same issue has been reported on other systems too. So in summary: it doesn't "work" on Linux with a GCC built with EasyBuild. But I think the issue is more general. – Flamefire Apr 12 '21 at 13:05
