
(Only needs to work for gcc 5.4, if a general solution can't be found)

I have a generic factory that I use to construct objects based on some key (like a string representing a class name). The factory must allow classes to register that may not be known at construction time (so I can't simply register a list of classes explicitly).

As a means of registering these keys and their associated constructors, I have another 'RegisterInFactory' (templated) class. In each class's source file, I construct an object in an anonymous namespace corresponding to that class. This way, each class is automatically registered to the factory once the global objects are constructed. These objects never get used or referenced outside of doing this initial registration task.

However, when the code is compiled into a static library, when that library is linked into an executable, these static objects never get constructed, so the classes don't register to the factory, and the factory can't create anything.

I'm aware of the -Wl,--whole-archive -lfoo flag, which does include these global objects, but it also introduces a lot of 'multiple definition' errors. I'm aware that another flag can turn off the multiple definition errors, but I don't feel comfortable going without them. I'm aware of -u symbolName to exclude specific symbols from these multiple definition errors (at least that's what I think it does). However, there are just too many of these redundant functions for that to be realistic (mostly from protobuf classes).

Is there any way to tell the compiler not to optimize those objects out, but only those objects so I can avoid the multiple definition issue? Is there another pattern I might be able to follow that fits within the constraints? (Particularly that I do not know at compile time what classes may be registered to the factory.)

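Simplified example code:

Factory.h:

template<typename Base>
class Factory{
  ...
  template<typename Derived>
  class RegisterInFactory{
  public:
    // Registers a constructor for Derived under the given key.
    explicit RegisterInFactory(const std::string& key){
      instance().regInFactory(key, derivedConstructorFunctional);
    }
  };
};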

In Derived.cpp:

namespace{ BaseFactory::RegisterInFactory<Derived> registerMe{"Derived"}; }
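
For reference, the pattern described above can be fleshed out into something compilable. This is only a minimal sketch under assumptions: `Base`, `name()`, `create()`, the `Creator` alias, and the `BaseFactory` alias are hypothetical stand-ins for parts the simplified example leaves out, and everything is shown in one file even though the registrar would normally sit in Derived.cpp.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class; the real one is not shown in the question.
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};

template <typename BaseT>
class Factory {
public:
    using Creator = std::function<std::unique_ptr<BaseT>()>;

    // Function-local static: constructed on first use, so registrars in
    // other translation units can call this during static initialization.
    static Factory& instance() {
        static Factory factory;
        return factory;
    }

    void regInFactory(const std::string& key, Creator creator) {
        assert(creator && "creator must be callable");
        creators_[key] = std::move(creator);
    }

    // Returns nullptr when nothing was registered under `key`.
    std::unique_ptr<BaseT> create(const std::string& key) const {
        auto it = creators_.find(key);
        if (it == creators_.end()) {
            return nullptr;
        }
        return it->second();
    }

    template <typename DerivedT>
    class RegisterInFactory {
    public:
        explicit RegisterInFactory(const std::string& key) {
            Factory::instance().regInFactory(key, [] {
                return std::unique_ptr<BaseT>(new DerivedT);
            });
        }
    };

private:
    Factory() = default;
    std::map<std::string, Creator> creators_;
};

using BaseFactory = Factory<Base>;

// The part that would live in Derived.cpp:
struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

namespace {
BaseFactory::RegisterInFactory<Derived> registerMe{"Derived"};
}
```

When everything is in one translation unit like this, `BaseFactory::instance().create("Derived")` finds the registration. The problem described here only appears once the Derived.cpp part is an otherwise unreferenced member of a static library.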

Final note: I've gotten lucky to some degree where without the linker flags, they still get included, but the only way that seems to happen is if the Derived class is 'sufficiently' complicated. Or maybe it's if I use the Derived class directly within the linked executable. I can't really tell why it's worked when it has.

shavera
  • I think the issue is that `static` in namespace scope are not constructed until right before some code in that cpp file is used. Ergo, if no code in a cpp file is ever called, those `static`s may never be constructed. This has nothing to do with Gcc optimizations, this is how C++ works. MSVC is an oddball that ignores this and constructs all namespace `static`s before starting `main`. – Mooing Duck Nov 08 '17 at 17:58
  • If you reference *some* symbol in a library module, it will be linked in (including the static objects). If there is no reference to anything in a module, it is not included. That's the way it is *supposed* to work. This has been asked several times before https://stackoverflow.com/questions/14116420/how-to-force-gcc-to-link-an-unused-static-library?noredirect=1&lq=1 – Bo Persson Nov 08 '17 at 17:59
  • I've browsed through the several other answers as well. My question is if there is a way to force _specific_ objects to be included, as opposed to the whole-archive option, which is not only excessive, but problematic in its excess. @BasileStarynkevitch: That may be what I'm looking for, I'll dig into it to see if it fits. – shavera Nov 08 '17 at 18:08
  • If you simply want to ignore the multiple definition errors, you can use `-Wl,--allow-multiple-definition` – tmm1 Aug 11 '19 at 21:09
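
The point made in the comments, that referencing some symbol in a library module pulls the whole module in, can be sketched in source form. This is an illustration under assumptions, not code from the question: `registry()`, `Registrar`, `link_derived()`, and `derived_linked` are hypothetical names, and the two halves are shown in one file although they would normally be split between the library and the executable.

```cpp
#include <cassert>
#include <set>
#include <string>

// ---- Would live in the library, e.g. Derived.cpp ----

// Hypothetical stand-in for the factory's key table.
std::set<std::string>& registry() {
    static std::set<std::string> keys;
    return keys;
}

namespace {
struct Registrar {
    explicit Registrar(const char* key) {
        assert(key != nullptr);
        registry().insert(key);
    }
};
Registrar registerMe{"Derived"};  // internal linkage, as in the question
}

// Hypothetical hook with external linkage. It does nothing useful itself;
// it only gives other translation units something to refer to.
int link_derived() { return 0; }

// ---- Would live in the executable, e.g. next to main() ----

// Calling the hook leaves an undefined reference to link_derived in the
// executable's object file, so the linker has to extract the archive member
// that defines it, and registerMe comes along with that member.
static const int derived_linked = link_derived();
```

The cost of this approach is that the executable has to name one hook per library module, which partly defeats the goal of not knowing the registered classes in advance.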

1 Answer


The issue is not related to optimizations, but to how linkers pull symbols in from static libraries.

However, when the code is compiled into a static library, when that library is linked into an executable, these static objects never get constructed, so the classes don't register to the factory, and the factory can't create anything.

That happens because nothing else refers to that registration variable. Hence, the linker is not pulling in the definition of the symbol from the archive.

To tell a Unix linker to keep that registration variable even if nothing else refers to it, pass the -Wl,--undefined=<symbol> switch to the compiler driver when linking against that static library. From the ld documentation:

-u symbol

--undefined=symbol

Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries. -u may be repeated with different option arguments to enter additional undefined symbols.

If that registration variable has "C" linkage, then <symbol> is the variable name.

For C++ linkage you will need to look up the mangled name using nm --defined-only <object-file>. You may also need to put that variable into a named namespace so that it has external linkage; a variable in an anonymous namespace has internal linkage and cannot be named with -u.
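
One way to avoid mangled names altogether is to keep the registration variable where it is and add a separate anchor variable with "C" linkage to the same source file. The sketch below assumes a hypothetical anchor named `mylib_link_anchor` and a hypothetical `registration_count()` helper. Because the linker extracts an archive member as a whole, asking for the anchor should also bring in the registration object defined next to it, even when that object stays in an anonymous namespace.

```cpp
#include <cassert>
#include <cstdio>

namespace mylib {

// Hypothetical counter so the effect of the registration is observable.
// The function-local static avoids initialization-order problems.
inline int& registration_count() {
    static int count = 0;
    return count;
}

namespace {
struct Register {
    Register() {
        assert(registration_count() == 0);  // this file registers only once
        ++registration_count();
        std::printf("registered\n");
    }
};

// Internal linkage: this symbol cannot be requested with -u.
Register register_me;
}  // anonymous namespace

}  // namespace mylib

// Hypothetical anchor. "C" linkage means the symbol name is exactly
// mylib_link_anchor, so no nm lookup of a mangled name is needed.
extern "C" {
int mylib_link_anchor = 0;
}
```

With this in the library source, the link line would use `-Wl,--undefined=mylib_link_anchor` in place of the mangled name. The anchor name is an assumption of this sketch; any unique external symbol defined in the same object file should serve the same purpose.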


Example:

[max@supernova:~/src/test] $ cat mylib.cc
#include <cstdio>

namespace mylib {

struct Register
{
    Register() { std::printf("%s\n", __PRETTY_FUNCTION__); }
};

Register register_me;

}

[max@supernova:~/src/test] $ cat test.cc
#include <iostream>

int main() {
    std::cout << "Hello, world!\n";
}

[max@supernova:~/src/test] $ make
mkdir /home/max/src/test/debug
g++ -c -o /home/max/src/test/debug/test.o -MD -MP -std=gnu++14 -march=native -pthread -W{all,extra,error,inline} -ggdb -fmessage-length=0 -Og test.cc
g++ -c -o /home/max/src/test/debug/mylib.o -MD -MP -std=gnu++14 -march=native -pthread -W{all,extra,error,inline} -ggdb -fmessage-length=0 -Og mylib.cc
ar rcsT /home/max/src/test/debug/libmylib.a /home/max/src/test/debug/mylib.o
g++ -o /home/max/src/test/debug/test -ggdb -pthread /home/max/src/test/debug/test.o /home/max/src/test/debug/libmylib.a

[max@supernova:~/src/test] $ ./debug/test 
Hello, world! <-------- Missing output from mylib::register_me.

[max@supernova:~/src/test] $ nm --defined-only -C debug/mylib.o
0000000000000044 t _GLOBAL__sub_I__ZN5mylib11register_meE
0000000000000000 t __static_initialization_and_destruction_0(int, int)
0000000000000000 B mylib::register_me                        <-------- Need a mangled name for this.
0000000000000000 r mylib::Register::Register()::__PRETTY_FUNCTION__

[max@supernova:~/src/test] $ nm --defined-only debug/mylib.o
0000000000000044 t _GLOBAL__sub_I__ZN5mylib11register_meE
0000000000000000 t _Z41__static_initialization_and_destruction_0ii
0000000000000000 B _ZN5mylib11register_meE                   <-------- The mangled name for that.
0000000000000000 r _ZZN5mylib8RegisterC4EvE19__PRETTY_FUNCTION__

# Added -Wl,--undefined=_ZN5mylib11register_meE to Makefile.
[max@supernova:~/src/test] $ make 
g++ -o /home/max/src/test/debug/test -ggdb -pthread -Wl,--undefined=_ZN5mylib11register_meE /home/max/src/test/debug/test.o /home/max/src/test/debug/libmylib.a

[max@supernova:~/src/test] $ ./debug/test 
mylib::Register::Register() <-------- Output from mylib::register_me as expected.
Hello, world!
Maxim Egorushkin
  • +1 for this approach. This works beautifully and is (IMHO) much less error prone than the whole-archive approach. – templatetypedef Nov 08 '17 at 18:21
  • I completely misunderstood the undefined symbol flag, then. Testing this approach now – shavera Nov 08 '17 at 18:22
  • @templatetypedef It works reliably, but the downside is that you need to specify that linker switch when linking that archive, which can be overlooked. – Maxim Egorushkin Nov 08 '17 at 18:22
  • I added `-Wl,--undefined=_ZN7animals12_GLOBAL__N_110registerMeE` as a flag to the linker in my test program (with `animals::[anonymous namespace]::RegisterInFactory registerMe{"cat"};` as the object, for example). That didn't seem to include the object. I moved the registrar out of the anonymous namespace `--undefined=_ZN7animals10registerMeB5cxx11E`, and still didn't seem to work. I may need to study this undefined flag more to know what's wrong. – shavera Nov 08 '17 at 19:18
  • @MaximEgorushkin Thanks so much! I look forward to experimenting with this more – shavera Nov 09 '17 at 15:56
  • This trick only works if the symbols are global (i.e. not `static`) and show up with `B` in the nm output (as opposed to `b`). – tmm1 Aug 11 '19 at 17:22
  • @tmm1 It is not a trick, rather how linkage works. Static means internal linkage and such symbols aren't accessible from different translation units. – Maxim Egorushkin Aug 11 '19 at 17:34