
The availability of some platform-specific features, such as SSE or AVX, can be determined at runtime. This is very useful if you do not want to compile and ship different objects for the different features.

For example, the following code checks for AVX and compiles with GCC, which provides the cpuid.h header:

 #include <stdbool.h>
 #include <cpuid.h>

 bool has_avx(void)
 {
     unsigned int eax, ebx, ecx, edx;
     // __get_cpuid returns 0 if CPUID leaf 1 is not supported
     if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
         return false;
     return (ecx & bit_AVX) != 0;
 }
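One caveat: the CPUID AVX bit only reports that the processor implements AVX. It does not say whether the operating system saves the YMM registers on context switches. Below is a minimal sketch of a stricter check that also tests the OSXSAVE bit and reads XCR0 with `xgetbv`. It assumes GCC or Clang on x86/x86-64, and the name `has_avx_usable` is hypothetical:

```c
#include <stdbool.h>
#include <cpuid.h>

/* Hypothetical stricter variant of has_avx (x86/x86-64, GCC or Clang). */
static bool has_avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* CPUID leaf 1 not available */
    /* The CPU must implement AVX, and the OS must have enabled XSAVE/XGETBV. */
    if (!(ecx & bit_AVX) || !(ecx & bit_OSXSAVE))
        return false;
    /* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    /* Bits 1 (SSE/XMM state) and 2 (AVX/YMM state) must both be enabled. */
    return (xcr0_lo & 0x6) == 0x6;
}
```

The compiler builtin `__builtin_cpu_supports("avx")` performs an equivalent OS-support check internally, so it is a reasonable alternative when you don't need your own helper.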

Runtime checks like the one above are performed repeatedly, are slow, and introduce branching. The results could be cached to reduce the overhead, but the branching would remain. Instead of littering the code with such checks, I figured that I could use the infrastructure provided by the dynamic linker/loader.
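For comparison, here is a minimal sketch of the cached-check approach just described. It assumes GCC or Clang on x86 and single-threaded use. It uses the builtin `__builtin_cpu_supports` in place of `has_avx`, and the names `impl_basic`, `impl_avx`, and `do_something_cached` are hypothetical:

```c
/* Hypothetical implementations with the same signature; a real AVX
   version would use AVX instructions or intrinsics. */
static int impl_basic(int x) { return x * 2; }

__attribute__((target("avx")))
static int impl_avx(int x) { return x * 2; }

/* 0 = not yet checked, 1 = no AVX, 2 = AVX available.
   Not thread-safe as written (plain int, no atomics). */
static int avx_state = 0;

int do_something_cached(int x)
{
    if (avx_state == 0)               /* the CPU check runs only once */
        avx_state = __builtin_cpu_supports("avx") ? 2 : 1;
    /* ...but this comparison and branch still run on every call */
    return avx_state == 2 ? impl_avx(x) : impl_basic(x);
}
```

The check itself runs once, but every call still pays for the comparison and conditional branch, which is the overhead the question is trying to avoid.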

On ELF platforms, calls to functions with external linkage are already indirect and go through the Procedure Linkage Table (PLT) and the Global Offset Table (GOT).

Suppose there are two internal functions: a basic one, _do_something_basic, that always works, and a somehow optimized version, _do_something_avx, which uses AVX. I could export a generic do_something symbol and alias it to the basic implementation:

static void _do_something_basic(…) {
    // Basic implementation
}


static void _do_something_avx(…) {
    // Optimized implementation using AVX
}

void do_something(…) __attribute__((alias("_do_something_basic")));

At load time of my library or program, I would like to check the availability of AVX once using has_avx and, depending on the result, point the do_something symbol at _do_something_avx.

Even better would be to point the initial version of the do_something symbol at a self-modifying function that checks the availability of AVX using has_avx and replaces itself with _do_something_basic or _do_something_avx.

In theory this should be possible, but how can I find the location of the PLT/GOT programmatically? Is there an ABI/API provided by the ELF loader, e.g. ld-linux.so.2, that I could use for this? Do I need a linker script to obtain the PLT/GOT location? And what about security considerations: can I even write to the PLT/GOT if I obtain a pointer to it?

Maybe some project has done this or something very similar already.

I'm fully aware that the solution would be highly platform-specific, but since I already have to deal with low-level platform-specific details, like features of the instruction set, this is fine.

Sebastian Schrader
  • As far as I know, Solaris solves this problem by running a script at boot that swaps the hardlinks of the affected libraries to match what the hardware can do. – fuz Nov 10 '16 at 11:59
  • The [ld.so(8) man page](http://man7.org/linux/man-pages/man8/ld.so.8.html#NOTES) of Linux's dynamic linker/loader also mentions special paths for hardware capabilities, but I'm not aware of any Linux distribution actually using this. Furthermore, this is not available on x86-64 and supports only some features, notably not AVX. A more fundamental issue, though, is that you would have to generate multiple versions of the library instead of only a single one. – Sebastian Schrader Nov 10 '16 at 12:36
  • Create separate versions of your library, then load the appropriate one using `dlopen`. Don't have to mess with PLT yourself. See this [answer for an example](http://stackoverflow.com/a/26037586/547981). – Jester Nov 10 '16 at 12:49
  • @Jester Not a good approach as that makes you incur the cost of calling a function through a function pointer on every call. – fuz Nov 10 '16 at 13:19
  • Another possibility that you might consider is using function pointers. If you take that approach, you can do your check one time, build your table of function pointers and never need to modify the PLT or GOT during runtime. – David Hoelzer Nov 10 '16 at 13:44
  • @FUZxxl have you taken a look at that answer at all? It is not using function pointers except the ones in the PLT which you go through anyway, and that's what OP asked. – Jester Nov 10 '16 at 14:12
  • @Jester A possible approach, but it would require generating multiple library files, which I want to avoid (see my earlier comment) – Sebastian Schrader Nov 10 '16 at 14:21
  • @Jester No, I didn't. Thanks for clearing this up. Still though, this feels like a nasty hack. – fuz Nov 10 '16 at 14:28
  • It's your right to think it's a hack, but to me that is the elegant and correct solution :) Load the appropriate library and the symbols get resolved automatically to the proper version. Certainly less hacky than manually messing with PLT entries, and it's also cross platform and future proof. – Jester Nov 10 '16 at 14:38
  • GCC 6 has [multi-versioning](https://lwn.net/Articles/691666/). You're reinventing the wheel. – MSalters Nov 10 '16 at 15:34
  • Agner Fog provides a detailed discussion of selecting appropriate code for different instruction sets in chapter 13 of [Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms](http://agner.org/optimize/optimizing_cpp.pdf). Most solutions that were proposed here are discussed there in detail. – Sebastian Schrader Nov 17 '16 at 14:56
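Here is a sketch of the GCC function multi-versioning mentioned in the comments. It assumes GCC 6 or later (or a recent Clang) on x86 with glibc, since the clones are dispatched through an IFUNC resolver. `scale` is a hypothetical example function:

```c
/* The compiler builds one clone per listed target plus a "default"
   clone, and emits an IFUNC resolver that picks one when the symbol
   is bound by the loader. */
__attribute__((target_clones("avx2", "avx", "default")))
void scale(float *dst, const float *src, float k, int n)
{
    /* With optimization enabled, each clone may be vectorized
       for its own instruction set. */
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Callers simply call `scale(...)`. No runtime check or function pointer appears in the source.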

3 Answers


As others have suggested, you can go with platform-specific versions of the libraries. Or, if you are OK with sticking to Linux, you can use the (relatively) new IFUNC relocations, which do exactly what you want.

EDIT: As noted by Sebastian, IFUNCs also seem to be supported on other platforms (FreeBSD, Android). Note, however, that the feature is not that widely used, so it may have some rough edges.
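Here is a minimal sketch of an IFUNC-based `do_something`. It assumes GCC or Clang targeting x86/x86-64 ELF with glibc. For illustration it gives `do_something` a concrete `int (int)` signature, and it uses the builtins `__builtin_cpu_init` and `__builtin_cpu_supports` instead of the question's `has_avx`. The names `impl_basic`, `impl_avx`, and `resolve_do_something` are hypothetical:

```c
typedef int (*do_something_fn)(int);

/* Hypothetical implementations with identical signatures. */
static int impl_basic(int x) { return x * 2; }

__attribute__((target("avx")))
static int impl_avx(int x) { return x * 2; }

/* The loader calls the resolver once and binds do_something to the
   function pointer it returns. */
static do_something_fn resolve_do_something(void)
{
    /* Resolvers may run before constructors, so initialize the CPU
       feature data explicitly before querying it. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? impl_avx : impl_basic;
}

/* Exported symbol: calls go through the GOT/PLT entry that the
   loader fills with the resolver's result. */
int do_something(int x) __attribute__((ifunc("resolve_do_something")));
```

The explicit `__builtin_cpu_init` call follows the GCC manual, which says it is needed when the CPU-feature builtins are used in code that runs before constructors, such as an IFUNC resolver.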

yugr
  • `ifunc` is exactly what I'm looking for, but is this really Linux-specific? It looks to me like it's GNU- and ELF-specific, so it should work on at least some other platforms as well. – Sebastian Schrader Nov 10 '16 at 15:19
  • Thanks, I had a different impression. I've updated the answer. – yugr Nov 10 '16 at 15:28

A simple way to do what you're asking for is to use your own function pointers instead of modifying those in the PLT.

For example:

extern void (*do_something)(...);

void
_do_something(...) {
     if (has_avx()) {
         do_something = _do_something_avx;
     } else { 
         do_something = _do_something_basic;
     }
     do_something(...);
}

void (*do_something)(...) = _do_something;

While this is cumbersome if you have a lot of these functions, doing it this way doesn't require any special compiler or linker features. (Though if you need the functions to be thread-safe on a platform where reading and writing pointers isn't atomic, you'll need to make them atomic somehow. This isn't a problem on x86 platforms, however.) If you do have a lot of these functions, macros or C++ templates can help keep the typing down.
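A concrete version of this function-pointer approach might look like the following sketch. It assumes GCC or Clang on x86 and a hypothetical `int (int)` signature, and it uses the builtin `__builtin_cpu_supports` in place of `has_avx`. The answer exposes the pointer itself. This sketch instead hides the pointer behind a small wrapper and uses C11 atomics with relaxed ordering, so that concurrent first calls are not a data race in the C memory model:

```c
#include <stdatomic.h>

typedef int (*do_something_fn)(int);

/* Hypothetical implementations with a concrete signature. */
static int do_something_basic(int x) { return x * 2; }

__attribute__((target("avx")))
static int do_something_avx(int x) { return x * 2; }

static int do_something_first(int x);

/* Starts out pointing at the resolving stub. */
static _Atomic do_something_fn do_something_ptr = do_something_first;

/* First call: pick an implementation, store it, and forward the call. */
static int do_something_first(int x)
{
    do_something_fn impl = __builtin_cpu_supports("avx")
                               ? do_something_avx
                               : do_something_basic;
    atomic_store_explicit(&do_something_ptr, impl, memory_order_relaxed);
    return impl(x);
}

/* Public entry point: one indirect call through the cached pointer. */
int do_something(int x)
{
    do_something_fn f =
        atomic_load_explicit(&do_something_ptr, memory_order_relaxed);
    return f(x);
}
```

The first call to `do_something` runs the check and replaces the stub. Every later call goes straight through the stored pointer without re-checking.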

Ross Ridge

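Why don't you try the GCC option -mprefergot? The GCC documentation describes it as follows: "When generating position-independent code, emit function calls using the Global Offset Table instead of the Procedure Linkage Table." That way you only have one indirect jump, through the GOT. (Note that GCC documents -mprefergot under its SH target options.)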

nausca
  • Using only GOT instead of PLT+GOT doesn't solve my problem it just moves it: How do you obtain the GOT address programmatically? – Sebastian Schrader Jan 10 '17 at 09:39
  • There's also a `-fno-plt` option. IDK if that's the same thing or a more modern name for the same thing, but it inlines `call *foo@GOTPCREL(%rip)` instead of `call foo@plt` – Peter Cordes Jul 23 '21 at 04:08