In the small reproducer below, the runtime symbol lookup of the typeinfo/vtable for an exception class that has a vtable fails. What is going wrong? Is it possible to make RTTI work correctly for classes with a vtable that are loaded with dlopen? The purpose of the indirect load is runtime binding based on the CPU.

lib.h:

#include <exception>
class myexception : public std::exception {
    virtual void info();
};
void f();

lib.cc:

#include "lib.h"
void myexception::info() {}
void f() { throw myexception(); }

main.cc:

#include "lib.h"
int main() {
    try { f(); }
    catch(myexception) {}
}

stub.cc:

#include <dlfcn.h>
#include <stdlib.h>
__attribute__((constructor)) void init() {
    dlopen("libreal.so", RTLD_NOW | RTLD_GLOBAL);
}

build.sh:

g++ lib.cc -Wall -Wextra -shared -o libf.so -fPIC -g
g++ main.cc -Wall -Wextra libf.so -fPIE -g
mv libf.so libreal.so
g++ stub.cc -Wall -Wextra -shared -o libf.so -fPIC -ldl -g

With GCC or clang+libstdc++:

./a.out |& c++filt 
./a.out: symbol lookup error: ./a.out: undefined symbol: typeinfo for myexception

With clang+libc++ (or GCC with -fPIC rather than -fPIE):

./a.out |& c++filt 
./a.out: symbol lookup error: ./a.out: undefined symbol: vtable for myexception

EDIT: Originally the question stated that the binary compiled with GCC segfaults. That is only the case if the binary is compiled without -fPIC/-fpic/-fPIE/-fpie. (Clang doesn't require the flag, and the question hadn't been updated with respect to the Clang behavior.) To simplify, I edited the question to ask only about the runtime linker issue rather than the segfault.

Roland Schulz
  • How can there be a segfault if your program doesn't even load? – Kerrek SB Jan 14 '18 at 02:38
  • Compiled with GCC it does load successfully and then segfault on throwing the exception. Compiled with Clang it doesn't load. – Roland Schulz Jan 14 '18 at 02:57
  • What's the point of instantiating `myexception` (first line in `main()`) ? – Sid S Jan 14 '18 at 03:55
  • It was a remnant of testing workarounds that I forgot to remove. Edited the question to remove it. Thx. – Roland Schulz Jan 14 '18 at 04:17
  • I assume it happens because the `f()` and `myexception` entries are placed in the `.text` section, whereas the vtable and typeinfo symbols are placed in the `.bss` section of the executable and in the `.data.rel.ro` section of the library; such symbols are not resolved in the executable. – 273K Jan 14 '18 at 07:47
  • It is only in B (bss) with clang and no flags. With clang compiled with -fpie/-fpic it is in U (same as f()). With gcc without flags it is in V (which then causes the segfault rather than the undefined symbol), and with gcc with -fpie/-fpic it is in U with an extra DW.ref._ entry in V. Without the virtual function (when it works) it is in V with an extra V entry for "typeinfo name for myexception" (for gcc and clang with no flags, -fpic, or -fpie). Thus it seems to work only in V and with the extra "typeinfo name". But why, given that f() works in U, and why is the name needed? – Roland Schulz Jan 14 '18 at 08:13

1 Answer

From the man page for dlopen:

RTLD_LAZY

Perform lazy binding. Only resolve symbols as the code that references them is executed. If the symbol is never referenced, then it is never resolved. (Lazy binding is only performed for function references; references to variables are always immediately bound when the library is loaded.)

This is crucial. GNU ld implements lazy binding by default, which means your late-binding method may work for functions but not for data (vtables and RTTI info are data). Data references in the executable are bound at startup, before the stub's constructor has had a chance to call dlopen, so the lookup fails. If you link the executable with -z now (analogous to the RTLD_NOW flag for dlopen), the method will stop working for functions too.
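To make the function/data distinction concrete, here is a minimal single-file sketch. It is not part of the original reproducer, and the helper names calls_function, catches_by_type and type_info_address are made up for illustration. It only marks which source constructs produce which kind of reference; because everything lives in one translation unit, it does not reproduce the loader failure itself.

```cpp
#include <exception>
#include <typeinfo>

// Same class as in the question: the out-of-line virtual function means the
// vtable and type info are emitted only where info() is defined.
class myexception : public std::exception {
    virtual void info();
};
void myexception::info() {}

void f() { throw myexception(); }

// Function reference only: if f() lived in another shared object, this call
// could be bound lazily, on first use. catch (...) names no type.
bool calls_function() {
    try { f(); }
    catch (...) { return true; }
    return false;
}

// Data reference: naming the type in a catch clause makes this code refer to
// the "typeinfo for myexception" object, which is data, not a function.
bool catches_by_type() {
    try { f(); }
    catch (const myexception&) { return true; }
    return false;
}

// Data reference, made explicit: the address of the type info object.
const std::type_info* type_info_address() {
    return &typeid(myexception);
}
```

In the question's layout, only the first helper is the kind of code that could live in main.cc without referring to data defined in libreal.so.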

There are two basic ways to resolve the situation.

  1. Do not use the runtime data (vtable and type info) of myexception outside of the library. This means you are very restricted in what you can do with myexception directly. You can wrap all operations that reference the runtime data in non-virtual functions exported from the library.
  2. Move the runtime data of myexception into the stub library. For most C++ compilers this means defining the first (in the order of declaration) non-inline virtual function there. You can declare a dummy virtual function at the top of the class and implement it in stub.cc. The rest may be implemented in lib.cc.
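The first option could look roughly like the sketch below. It is shown as a single file with comments marking the intended split, and it has not been tested against the stub build from the question. The helper names is_myexception and call_f_and_classify are hypothetical and do not appear in the original code. The idea is that main.cc only catches std::exception (whose type info comes from the C++ runtime library) and asks an ordinary exported function whether the object is a myexception, so the reference to myexception's type info stays inside the library. It assumes that lazy binding of functions is in effect, as described above.

```cpp
#include <exception>

// --- lib.h (sketch) ---
class myexception : public std::exception {
    virtual void info();
};
void f();
// Hypothetical helper: a plain, non-virtual exported function. Its body is
// compiled into the library, so the dynamic_cast below references
// "typeinfo for myexception" from inside the library only.
bool is_myexception(const std::exception& e);

// --- lib.cc (sketch) ---
void myexception::info() {}
void f() { throw myexception(); }
bool is_myexception(const std::exception& e) {
    return dynamic_cast<const myexception*>(&e) != nullptr;
}

// --- main.cc side (sketch) ---
// Only function calls into the library and a catch of the base class appear
// here; no vtable or type info of myexception is named directly.
bool call_f_and_classify() {
    try { f(); }
    catch (const std::exception& e) { return is_myexception(e); }
    return false;
}
```

The cost of this design is that the executable can no longer write catch (myexception) or construct a myexception itself; every such operation needs a wrapper like the one above.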
n. m. could be an AI
  • Thanks! This explains part of the problem in a great way. Do you happen to be able to expand on it a bit? If the type info is treated like data then why does it work without the virtual function? And why does the virtual function of the parent (std::exception) not matter? Is it only sometimes treated like data? Is there some way to force the linker to store the type info in the lazy binding compatible (/non-data) way even with a virtual function? If not, is this just not implemented or is there some reason why this wouldn't work? – Roland Schulz Jan 15 '18 at 04:28
  • I now suspect a vtable anchoring effect on type info storage could explain what's going on. It seems to me that if vtable anchoring is possible (there is an out-of-line virtual function), then the RTTI information isn't stored with vague linkage, and this then breaks the stub loading. Does that sound right? This allows a possible workaround of only having inline virtual members (which does indeed work). But I can't find an option to force vague linkage even if vtable anchoring is possible. – Roland Schulz Jan 15 '18 at 06:02
  • If you only have inline virtual functions, then your class is not a part of the shared library in any meaningful way. Technically this means vtable and RTTI data are static in each TU rather than global. If the system implements vague linkage, then one instance per TU becomes one instance per executable or shared lib as an optimisation. – n. m. could be an AI Jan 15 '18 at 06:36
  • I filed a feature request to make this possible. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83876 – Roland Schulz Jan 16 '18 at 20:53
  • Don't get your hopes up. There's an established ABI. No one will break an ABI to satisfy a fringe use case. – n. m. could be an AI Jan 16 '18 at 21:29