
Background: I'm trying to implement a system like that described in this previous answer. In short, I have an application that links against a shared library (on Linux at present). I would like that shared library to switch between multiple implementations at runtime (for instance, based on whether the host CPU supports a certain instruction set).

In its simplest case, I have three distinct shared library files:

  • libtest.so: This is the "vanilla" version of the library that will be used as a fallback case.
  • libtest_variant.so: This is the "optimized" variant of the library that I would like to select at runtime if the CPU supports it. It is ABI-compatible with libtest.so.
  • libtest_dispatch.so: This is the library that is responsible for choosing which variant of the library to use at runtime.

In keeping with the approach suggested in the linked answer above, I'm doing the following:

  • The final application is linked against libtest.so.
  • I have the DT_SONAME field of libtest.so set to libtest_dispatch.so. Therefore, when I run the application, it will load libtest_dispatch.so instead of the actual dependency libtest.so.
  • libtest_dispatch.so is configured to have a constructor function that looks like this (pseudocode):

    __attribute__((constructor)) void init()
    {
        if (can_use_variant) dlopen("libtest_variant" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
        else dlopen("libtest" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
    }
    

    The call to dlopen() will load the shared library that provides the appropriate implementation, and the application moves on.

Result: This works! If I place an identically-named function in each shared library, I can verify at runtime that the appropriate version is executed based upon the conditions used by the dispatch library.

The problem: The above works for the toy example that I demonstrated it with in the linked question. Specifically, it seems to work fine if the libraries only export functions. However, once there are variables in play (whether they be global variables with C linkage or C++ constructs like typeinfo), I get unresolved-symbol errors at runtime.

The below code demonstrates the problem:

libtest.h:

extern int bar;

int foo();

libtest.cc:

#include <iostream>

int bar = 2;

int foo()
{
    std::cout << "function call came from libtest" << std::endl;
    return 0;
}

libtest_variant.cc:

#include <iostream>

int bar = 1;

int foo()
{
    std::cout << "function call came from libtest_variant" << std::endl;
    return 0;
}

libtest_dispatch.cc:

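#include <dlfcn.h>
#include <iostream>
#include <stdlib.h>

// SHLIB_EXT is not passed on the command line, so default to ".so" on Linux.
#ifndef SHLIB_EXT
#define SHLIB_EXT ".so"
#endif

__attribute__((constructor)) void init()
{
    const char* lib = getenv("USE_VARIANT") ? "libtest_variant" SHLIB_EXT : "libtest" SHLIB_EXT;
    // RTLD_GLOBAL adds the loaded library's symbols to the global lookup scope.
    if (!dlopen(lib, RTLD_NOW | RTLD_GLOBAL))
        std::cerr << "dlopen failed: " << dlerror() << std::endl;
}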

test.cc:

#include "libtest.h"
#include <iostream>

int main()
{
    std::cout << "bar: " << bar << std::endl;
    foo();
}

I build the libraries and test application using the following:

g++ -fPIC -shared -o libtest.so libtest.cc -Wl,-soname,libtest_dispatch.so
g++ -fPIC -shared -o libtest_variant.so libtest_variant.cc
g++ -fPIC -shared -o libtest_dispatch.so libtest_dispatch.cc -ldl
g++ test.cc -o test -L. -ltest -Wl,-rpath,.

Then, I try to run the test using the following command lines:

> ./test
./test: symbol lookup error: ./test: undefined symbol: bar
> USE_VARIANT=1 ./test
./test: symbol lookup error: ./test: undefined symbol: bar

Failure. If I remove all instances of the global variable bar and try to dispatch the foo() function only, then it all works. I'm trying to figure out exactly why and whether I can get the effect that I want in the presence of global variables.

Debugging: In attempting to diagnose the problem, I've done some playing with the LD_DEBUG environment variable while running the test program. It seems like the problem comes down to this:

The dynamic linker performs relocations of global variables from shared libraries very early in the loading process, before constructors from shared libraries are called. Therefore, it tries to locate some global variable symbols before my dispatch library has had a chance to run its constructor and load the library that will actually provide those symbols.

This seems to be a big roadblock. Is there some way that I can alter this process so that my dispatcher can run first?

I know that I could preload the library using LD_PRELOAD. However, this is a cumbersome requirement for the environment that my software will eventually run in. I'd like to find a different solution if possible.

Upon further review, it appears that even if I LD_PRELOAD the library, I have the same problem. The constructor still doesn't get executed before the global variable symbol resolution occurs. Usage of the preload feature just pushes the desired library to the top of the library list.

Jason R
  • Code compiled with fPIC is not subject to any relocations at all. Instead, it uses global offset table and procedure linkage table to access the symbols. Your analysis is not correct. – SergeyA Dec 01 '15 at 18:32
  • @SergeyA: I'm not surprised that I would be wrong. My guess came from the fact that the `LD_DEBUG` output prints lines like `relocation processing: /lib/x86_64-linux-gnu/libc.so.6 (lazy)`, after which the symbol binding error occurs (it runs into a problem binding the global variable symbol). This relocation processing occurs *before* any `calling init` lines show up; it's never getting to the point of even calling the constructors. – Jason R Dec 01 '15 at 18:34
  • You may want to search for the STT_GNU_IFUNC extension for selecting between symbols (not whole libraries) at load time. – ninjalj Dec 02 '15 at 00:23
  • Can you put the globals into `libtest_dispatch.so`? Any externally-visible globals have to have the same ABI between both versions of the library, so they can be factored out into the dispatch library that's linked normally, rather than with dlopen. I think this means `libtest.so` and `libtest_variant.so` should link against `libtest_dispatch.so` to see defs for the globals (and only declare them as `extern` themselves). – Peter Cordes Dec 02 '15 at 00:24

2 Answers


"Failure. If I remove all instances of the global variable bar and try to dispatch the foo() function only, then it all works."

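The reason this works without global variables is that functions (by default) use lazy binding, but variables cannot: there is no PLT stub that could defer the lookup of a data symbol, so the dynamic linker must resolve bar while it relocates the executable, before any library constructor has run.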

You would get the exact same failure without any global variables if your test program is linked with -Wl,-z,now (which would disable lazy binding of functions).

You could fix this by introducing an instance of every global variable referenced by your main program into the dispatch library.

Contrary to what your other answer suggests, this is not the standard way to do CPU-specific dispatch.

There are two standard ways.

The older one: use $PLATFORM as part of DT_RPATH or DT_RUNPATH. The kernel will pass in a string, such as x86_64, or i386, or i686 as part of the aux vector, and ld.so will replace $PLATFORM with that string.
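To see which string the kernel passes, a small sketch using glibc's `getauxval` (available since glibc 2.16; `kernel_platform` is a hypothetical helper name). This is only an illustration of the auxiliary-vector value; glibc may adjust it on some systems before expanding `$PLATFORM`:

```cpp
#include <sys/auxv.h>  // getauxval, AT_PLATFORM

// Returns the platform string the kernel placed in the auxiliary vector
// (for example "x86_64"), or nullptr if the kernel did not supply one.
// This is the value ld.so normally substitutes for $PLATFORM in a run path.
const char* kernel_platform()
{
    return reinterpret_cast<const char*>(getauxval(AT_PLATFORM));
}
```

A library directory selected this way would be referenced from a run path such as `$ORIGIN/lib/$PLATFORM`, quoted so the shell does not expand it at link time.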

This allowed distributions to ship both i386 and i686-optimized libraries, and have a program select appropriate version depending on which CPU it was running on.

Needless to say, this isn't very flexible, and (as far as I understand) doesn't allow you to distinguish between various x86_64 variants.

The new hotness is IFUNC dispatch, documented here. This is what GLIBC currently uses to provide different versions of e.g. memcpy depending on which CPU it is running on. There are also the target and target_clones attributes (documented on the same page), which let you compile several variants of a routine, each optimized for a different processor (in case you don't want to code them in assembly).
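A minimal IFUNC sketch, assuming GCC on x86-64 Linux with glibc. `foo_generic`, `foo_avx2` and `resolve_foo` are hypothetical names, and the return values only identify which implementation was selected:

```cpp
// Two hand-written implementations of the same function (hypothetical names).
static int foo_generic() { return 0; }

__attribute__((target("avx2")))
static int foo_avx2() { return 1; }  // AVX2 intrinsics would go here

// The resolver runs while the dynamic linker processes relocations, before
// any constructors, so it must call __builtin_cpu_init() itself.
// extern "C" keeps its symbol name unmangled for the ifunc attribute below.
extern "C" int (*resolve_foo())()
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? foo_avx2 : foo_generic;
}

// Callers just call foo(); the choice is made once, at load time.
int foo() __attribute__((ifunc("resolve_foo")));
```

Because the selection happens per function rather than per library, exported variables and typeinfo are unaffected.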

"I'm trying to apply this functionality to an existing, very large library, so just a recompile is the most straightforward way of implementing it."

In that case, you may have to wrap the binary in a shell script, and set LD_LIBRARY_PATH to different directories depending on the CPU. Or have the user source your script before running the program.
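The answer suggests a shell script; as a C++ stand-in, a launcher could make the same choice. This is a sketch under stated assumptions: the `/opt/mylib/...` layout, `pick_lib_dir` and `relaunch` are hypothetical, the CPU checks are x86-only GCC builtins, and it assumes the real program has no `DT_RPATH` that would take precedence over `LD_LIBRARY_PATH`:

```cpp
#include <stdlib.h>  // getenv, setenv
#include <string>
#include <unistd.h>  // execv

// Hypothetical install layout: one directory per build of the library.
const char* pick_lib_dir()
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return "/opt/mylib/avx2";
    if (__builtin_cpu_supports("sse4.2")) return "/opt/mylib/sse42";
    return "/opt/mylib/generic";
}

// Prepend the chosen directory to LD_LIBRARY_PATH, then replace this process
// with the real program. Returns -1 only if execv fails.
int relaunch(const char* real_program, char* const argv[])
{
    std::string path = pick_lib_dir();
    const char* old = getenv("LD_LIBRARY_PATH");
    if (old && *old)
        path += std::string(":") + old;
    setenv("LD_LIBRARY_PATH", path.c_str(), 1);
    execv(real_program, argv);
    return -1;
}
```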

"target_clones does look interesting; is that a recent addition to gcc?"

I believe the IFUNC support is about 4-5 years old, the automatic cloning in GCC is about 2 years old. So yes, quite recent.
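A minimal `target_clones` sketch, assuming an x86-64 GCC recent enough to support the attribute and a glibc with IFUNC support; `sum` is a hypothetical hot function:

```cpp
// The compiler emits one copy per listed target plus an IFUNC resolver that
// picks the best copy for the running CPU at load time.
__attribute__((target_clones("avx2", "sse4.2", "default")))
long sum(const int* v, int n)
{
    long total = 0;
    for (int i = 0; i < n; ++i)  // a loop the compiler may auto-vectorize
        total += v[i];
    return total;
}
```

Only the source of the hot function changes; the rest of the library is compiled once, as usual.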

Employed Russian
  • Thanks again for your insight. I probably wasn't clear enough in stating my goal, so option #1 (`$PLATFORM`) won't work. I'm looking to dispatch based upon, for instance, whether the CPU supports SSE2 only, SSE4.2, AVX, etc. I was hoping that I could just recompile my library for each configuration, then introduce a stub that could just load the correct version at runtime. I'm coming to the realization that this probably just isn't possible. The best solution I've found thus far is registering the dispatcher as an audit library using `LD_AUDIT`, but that's not very transparent to the user. – Jason R Dec 02 '15 at 04:42
  • You don't (generally) need to recompile the *entire* library, only the hot functions in it. And `target_clones` will do this for you with minimal effort. – Employed Russian Dec 02 '15 at 04:44
  • Indeed, that certainly looks like it can accomplish what I want. I'm trying to apply this functionality to an existing, very large library, so just a recompile is the most straightforward way of implementing it. `target_clones` does look interesting; is that a recent addition to `gcc`? – Jason R Dec 02 '15 at 04:45
  • From what I understand, the `target_clones` attribute does not allow you to provide the code, i.e. you're trusting the compiler to auto-vectorize. What I'd like is a way to provide multiple implementations of a function (carefully tuned for SSE/AVX/AVX2/AVX512) that works with multiple compilers. That's why the dynamic dlopen trick is appealing. – fsaintjacques Apr 27 '16 at 18:04
  • @fsaintjacques "target_clones doesn't allow ..." -- correct. But `IFUNC` does allow you to select between hand-coded implementations. – Employed Russian Apr 27 '16 at 18:52
  • I try to be `icc` compatible. I'll have to write a post with a compatibility matrix for each method. – fsaintjacques Apr 27 '16 at 20:25

It might not be relocations per se (-fPIC suppresses relocations), but lazy binding through the GOT (Global Offset Table), with the same effect. This is unavoidable, since the linker has to bind variables before init is called, simply because init might reference those symbols as well.

As for solutions, one option might be to not use (or even expose) global variables to the executable code. Instead, provide a set of functions to access them. Global variables are not welcome anyway :)
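A minimal sketch of that idea for the question's `bar`; `get_bar` and `set_bar` are hypothetical names. The executable then references only functions, which are bound lazily as long as it is not linked with `-z now`:

```cpp
// libtest.h (sketch): no exported variables, only functions.
int get_bar();
void set_bar(int value);

// libtest.cc / libtest_variant.cc: the variable stays private to each library.
static int bar = 2;  // each variant can keep its own initial value

int get_bar() { return bar; }
void set_bar(int value) { bar = value; }
```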

SergeyA
  • Thanks for the insight. I agree that global variables aren't a good thing. However, going beyond my toy example, I see the same symbol resolution errors with things like `typeinfo` for C++ classes, which is unavoidable as far as I can tell. As I suspected, there may be no good solution here. – Jason R Dec 01 '15 at 18:43
  • Same difference. Provide a C-only interface to your library, and hide all C++ stuff inside your code. This is a sane thing to do anyways - otherwise you will have to recompile your .so for every compiler/standard library implementation which might be used by your clients. – SergeyA Dec 01 '15 at 18:48
  • Thanks for the suggestion. The library in question is by nature a C++ one, so wrapping it in C won't really work. – Jason R Dec 01 '15 at 19:31
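For reference, a C-only wrapper of the kind suggested above might look like this sketch; `detail::Widget` and the `widget_*` functions are hypothetical. Callers see only an opaque handle and `extern "C"` functions, so they never reference typeinfo, vtables or other C++ data symbols:

```cpp
#include <new>  // std::nothrow

namespace detail {
// Hypothetical C++ class that stays hidden inside the library.
class Widget {
public:
    explicit Widget(int v) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};
}

extern "C" {
typedef struct widget widget;  // opaque handle, never defined for callers

// nothrow new: exceptions must not propagate into C callers.
widget* widget_create(int v)
{
    return reinterpret_cast<widget*>(new (std::nothrow) detail::Widget(v));
}
int widget_value(const widget* w)
{
    return reinterpret_cast<const detail::Widget*>(w)->value();
}
void widget_destroy(widget* w)
{
    delete reinterpret_cast<detail::Widget*>(w);
}
}
```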