
I have three header files in my project which describe objects Rational, Complex, and RubyObject. The first two are templates. All can be interconverted using copy constructors, which are defined in the header files — except for those that construct Rational and Complex from const RubyObject&s, which are defined in a source file.

Note: Those definitions are there by necessity. If they all go in the headers, you get circular dependency.

A while back, I ran into some unresolved symbol errors with the two copy constructors defined in the source file. I was able to add the following function to the source file

void nm_init_data() {
    nm::RubyObject obj(INT2FIX(1));
    nm::Rational32 x(obj);
    nm::Rational64 y(obj);
    nm::Rational128 z(obj);
    volatile nm::Complex64 a(obj);
    volatile nm::Complex128 b(obj);
}

and then call nm_init_data() from the library entry point in the main source file. Doing so forced these symbols to be linked properly.

Unfortunately, I recently upgraded GCC and the errors are back. In fact, it seems to happen in a slightly different place with GCC 4.6 (e.g., on Travis-CI).

It's not a version-specific issue, though (as I had thought before). We see it on Travis CI's Ubuntu-based system, which runs GCC 4.6, but not on an Ubuntu machine with either GCC 4.8.1 or 4.8.2. We do see it on a Mac OS X machine with 4.8.2, though not on the same machine with 4.7.2. Turning off optimization doesn't seem to help either.

If I run nm on my library, the symbol is definitely undefined:

$ nm tmp/x86_64-darwin13.0.0/nmatrix/2.0.0/nmatrix.bundle |grep RationalIsEC1ERKNS
                 U __ZN2nm8RationalIsEC1ERKNS_10RubyObjectE
00000000004ca460 D __ZZN2nm8RationalIsEC1ERKNS_10RubyObjectEE18rb_intern_id_cache
00000000004ca458 D __ZZN2nm8RationalIsEC1ERKNS_10RubyObjectEE18rb_intern_id_cache_0

I'm not sure why there are two defined entries which are subordinate to the undefined symbol, but I also don't know as much as I'd like about compilers.

It also looks like the copy constructor is an undefined symbol for each version of the Rational template:

__ZN2nm8RationalIiEC1ERKNS_10RubyObjectE
__ZN2nm8RationalIsEC1ERKNS_10RubyObjectE
__ZN2nm8RationalIxEC1ERKNS_10RubyObjectE
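
For anyone reading the nm output above, here is a minimal sketch that demangles such names. It assumes GCC or Clang, whose `<cxxabi.h>` provides `abi::__cxa_demangle`; the helper name `demangle` is made up for illustration. Mac OS X's nm prints one extra leading underscore, so `__ZN...` has to be passed in as `_ZN...`.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: demangle an Itanium-ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result with malloc.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return std::string();
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Called with `_ZN2nm8RationalIsEC1ERKNS_10RubyObjectE`, this should give `nm::Rational<short>::Rational(nm::RubyObject const&)`. The `Ii` and `Ix` variants are the `int` and `long long` instantiations, and `ComplexIdE` is `Complex<double>`. The two `D` entries beginning with `__ZZ` are function-local static variables inside that same constructor, which is why they look subordinate to it. They probably come from Ruby's `rb_intern` macro, though that part is a guess.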

"Well, that's strange," I thought. "Complex64 and Complex128 are also called in that nm_init_data function, but they both resolve properly — and aren't listed in the nm -u output." So I tried adding volatile before the Rational copy construction as well, thinking that maybe the compiler was optimizing out something we don't want optimized out. But that didn't fix it either, sadly. This did, with a caveat:

void nm_init_data() {
  volatile VALUE t = INT2FIX(1);
  volatile nm::RubyObject obj(t);
  volatile nm::Rational32 x(const_cast<nm::RubyObject&>(obj));
  volatile nm::Rational64 y(const_cast<nm::RubyObject&>(obj));
  volatile nm::Rational128 z(const_cast<nm::RubyObject&>(obj));
  volatile nm::Complex64 a(const_cast<nm::RubyObject&>(obj));
  volatile nm::Complex128 b(const_cast<nm::RubyObject&>(obj));
}

The caveat is that now I get the exact same error, but for the Complex objects instead. Argh!

dyld: lazy symbol binding failed: Symbol not found: __ZN2nm7ComplexIdEC1ERKNS_10RubyObjectE
  Referenced from: /Users/jwoods/Projects/nmatrix/lib/nmatrix.bundle
  Expected in: flat namespace

dyld: Symbol not found: __ZN2nm7ComplexIdEC1ERKNS_10RubyObjectE
  Referenced from: /Users/jwoods/Projects/nmatrix/lib/nmatrix.bundle
  Expected in: flat namespace

This is completely absurd. Here are the definitions for both of these functions, in the same source file as the nm_init_data() function:

namespace nm {
  template <typename Type>
  Complex<Type>::Complex(const RubyObject& other) {
    // do some things
  }

  template <typename Type>
  Rational<Type>::Rational(const RubyObject& other) {
    // do some other things
  }
} // end of namespace nm

Hint: One thing that is worth mentioning is that the error doesn't occur when nm_init_data() gets called (i.e., when the library is loaded). It happens much later, during another call to these troublesome functions.

How do I fix this problem once and for all, and others like it?

Translunar

2 Answers


You claim the following, which I doubt.

Those definitions are there by necessity. If they all go in the headers, you get circular dependency.

In most cases you can resolve such a circular entanglement by moving the template definitions into an additional .hpp file and including it, together with the class definitions, wherever they are needed.

If your code had a real circular dependency, it would not compile. Usually, if your dependencies seem to be circular, you have to look closer and go down to method level and check which of them would require both types to compile.

So it could be that your types use each other; then compile them all in one .cpp file (e.g. via three .hpp includes). Or there are only pointers to another type; then use forward declarations to ensure that all templates are resolved. Or third, you have some methods that depend forward and some that depend backward; then put one kind in one file and the other kind in another, and you are fine again.
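
A minimal sketch of that layout, assuming simplified stand-ins for the real classes: the members `n`, `d`, and `value` and the file names in the comments are hypothetical, and everything is shown in one block only so that it compiles on its own. The point is that a class definition only needs a forward declaration of the other type, while the member definition that really uses it comes after both classes are complete.

```cpp
#include <cassert>

// ---- would live in a small forward-declaration header ----
namespace nm {
struct RubyObject;                       // forward declaration only
}

// ---- would live in rational.h: class definition, no RubyObject internals ----
namespace nm {
template <typename Type>
struct Rational {
    Type n, d;
    Rational(Type num = 0, Type den = 1) : n(num), d(den) {}
    Rational(const RubyObject& other);   // declared here, defined further down
};
}

// ---- would live in rubyobject.h: may use Rational freely ----
namespace nm {
struct RubyObject {
    long value;                          // stand-in for Ruby's VALUE
    explicit RubyObject(long v = 0) : value(v) {}

    template <typename Type>
    RubyObject(const Rational<Type>& r) : value(0) {
        assert(r.d != 0);                // integer division below needs a nonzero denominator
        value = r.n / r.d;
    }
};
}

// ---- would live in an extra .hpp, included after both headers ----
namespace nm {
template <typename Type>
Rational<Type>::Rational(const RubyObject& other)
    : n(static_cast<Type>(other.value)), d(1) {}
}
```

With this split, any file that includes all the pieces in that order can convert in both directions, and the compiler instantiates the constructor wherever it is used, so no separate source-file definition is required.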

Additionally, it seems that you should add a forward declaration (strictly speaking, an explicit instantiation) for each of your missing items. I would expect something like the following after the definition of the function, e.g.:

template nm::Complex<nm::RubyObject>::Complex(const nm::RubyObject& other);
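
As a slightly fuller sketch of the same idea: the template argument should be the element type of the instantiation that is missing. The undefined symbols in the question contain `ComplexIdE`, which is `Complex<double>`, so that is what is instantiated here. The class bodies below are simplified stand-ins, and `real_part` is a hypothetical caller that represents code in another translation unit.

```cpp
#include <cassert>

// ---- header part: declarations only ----
namespace nm {
struct RubyObject { long value; };       // simplified stand-in

template <typename Type>
struct Complex {
    Type r, i;
    Complex(const RubyObject& other);    // defined in the source part
};
}

// ---- source part: definition plus explicit instantiation ----
namespace nm {
template <typename Type>
Complex<Type>::Complex(const RubyObject& other)
    : r(static_cast<Type>(other.value)), i(0) {}

// Explicit instantiation definition: forces the compiler to emit this
// constructor in this translation unit, whether or not it is called here.
template Complex<double>::Complex(const RubyObject&);
}

// Hypothetical caller that only needs the header part to compile.
double real_part(const nm::RubyObject& obj) {
    nm::Complex<double> c(obj);
    assert(c.i == 0.0);                  // the sketch always builds a purely real value
    return c.r;
}
```

One such line is needed for every instantiation the rest of the library uses (for the Rational symbols in the question that would be `short`, `int`, and `long long`). Alternatively, `template struct Complex<double>;` instantiates all members of the class at once. Unlike a dummy function such as nm_init_data(), this does not depend on the optimizer choosing to emit an out-of-line copy.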
Uli Klank
  • Could you provide a link to some kind of simple example of this usage of .hpp files? And why would a forward decl be necessary *after* the function definition? I feel like this is a helpful answer, but I'm having trouble understanding what you mean in some places. For example, "you have to look closer and go down to method level and check which of them would require both types to compile." And also, "Or third, you have some method that depend forward and some that depend backward, then put the one kind in one file, the others kind in another, and you are fine again." – Translunar Nov 22 '13 at 20:05
  • Okay, so this actually worked! Which is crazy, because I was fairly certain I tried it already. How embarrassing. Thanks for the assistance. – Translunar Nov 23 '13 at 00:42

Rational, Complex... are templates

copy constructors... are defined in the header files — except for those that construct Rational and Complex from const RubyObject&s, which are defined in a source file.

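And therein lies your problem. Since Rational and Complex are templates, the definitions of their methods need to be visible wherever they are instantiated, which in practice means in your header file (unless you explicitly instantiate them in the source file).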

If they're not, then you might sometimes be able to get away with it depending on the order in which things are called and the order in which things are linked -- but more often you'll get strange errors about undefined symbols, which is exactly what is happening here.

Simply move the definitions of Rational(const RubyObject&) and Complex(const RubyObject&) into the respective headers and everything should just work.

Tristan Brindle
  • That's not an option, as far as I know. Each class has copy constructors for every other class. The order of inclusion is `complex.h`, `rational.h`, then `rubyobject.h`. I was able to make every copy constructor except for these two work in the headers, but these ones have to be in the source file or there's a compilation error. I also tried forward declarations. It's a circular dependency problem complicated by the presence of templates. – Translunar Nov 19 '13 at 05:03
  • Here's an explanation of why some of these have to be declared in the source file: http://stackoverflow.com/questions/625799/resolve-circular-dependencies-in-c – Translunar Nov 19 '13 at 20:07