The original issue is spread across hundreds of thousands of LoC from different projects. It has a lot of ingredients: inline assembly, virtual inheritance, levels of indirection, different compilers and compiler options. (It's like a thriller.) I had a hard time simplifying it to this SSCCE:

// a.hpp
struct A {
    int i;
    ~A() { asm("" : "=r"(i)); }
};

struct B : public virtual A { };

struct C : public B { };

struct D {
    D(C);
};

// a.cpp
#include "a.hpp"

void f(C) {
}

D::D(C c) {
    f(c);
}

// main.cpp
#include "a.hpp"

int main() {
    C c;
    D d(c);
}

Build with these command lines:

g++ -O3 -fPIC -c a.cpp
clang++ -O3 -fPIC -c main.cpp
clang++ -fuse-ld=gold main.o a.o -o main

And the linker output is:

a.o:a.cpp:function D::D(C) [clone .cold]: error: relocation refers to global symbol "construction vtable for B-in-C", which is defined in a discarded section
  section group signature: "_ZTV1C"
  prevailing definition is from main.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I believe there's a bug in either gcc, clang or gold. My question is: where is it? (I guess it's gold, but I want to be sure before reporting the bug.)

FWIW: As I said, all the ingredients are important and the issue goes away if, for instance, the asm is removed. More notable changes that make the issue go away are:

  1. Use the same compiler for all TUs. (It doesn't matter whether g++ or clang++.)
  2. Link with ld (i.e., remove -fuse-ld=gold)
  3. Compile main.cpp without -O3.
  4. Compile main.cpp without -fPIC.
  5. Swap a.o and main.o in the linker command line.
Cassio Neri
  • `~A() { asm("" : "=r"(i)); }` => `virtual ~A() { asm("" : "=r"(i)); }` – πάντα ῥεῖ Sep 29 '20 at 11:34
  • @πάνταῥεῖ Thanks, but adding the virtual doesn't make the issue go away. Besides, this is not real production code, just an oversimplification of it. (For the curious, the asm appears in a function called by boost::shared_ptr's destructor, which I don't have any control over.) Finally, my question is not about good C++ practices. AFAIK, the code shown is legal C++ and we should be able to compile and link it as it is. – Cassio Neri Sep 29 '20 at 11:41
  • I believe that πάντα ῥεῖ's point is that you have no virtual functions in your example. At all. So it doesn't appear to fit a v-table related linkage error. Though the virtual base is the key here, anyway (maybe a comment next to it, to draw attention?). – StoryTeller - Unslander Monica Sep 29 '20 at 11:44
  • @CassioNeri Also note: `virtual` destructors aren't generated automatically. You have to define them throughout the whole class hierarchy, `virtual ~B() = default;` etc. should be enough. – πάντα ῥεῖ Sep 29 '20 at 12:10
  • ["If a class has a base class with a virtual destructor, its destructor (whether user- or implicitly-declared) is virtual."](https://eel.is/c++draft/class.dtor#12) But that's not the point. This is a [Short Self Contained Correct Example](http://sscce.org/) and, as such, should carry the minimum necessary to reproduce the issue. Good C++ practice is another matter and does not necessarily make for a good SSCCE. For instance, `A` has a user-defined destructor, and by the rule of five it should declare the other special member functions which, had I added them, would distract from the real linking issue. – Cassio Neri Sep 29 '20 at 12:31

1 Answer

This appears to be a bug in GCC, but the inline asm, being outside the ABI, could simply render this an unfortunate incompatibility between GCC and Clang.

The problem is that the inline asm makes GCC think that A::~A() can raise an exception, so it creates an exception handling path in D::D(). That path requires the construction vtable for B-in-C (_ZTC1C0_1B), which GCC places in the COMDAT group that also contains the vtable for C (_ZTV1C).
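
For reference, here is a minimal sketch of the language-level view, reusing the question's class definitions. The helper `destructors_are_noexcept` is hypothetical and not part of the original code. It only checks that, since C++11, these destructors are implicitly `noexcept`; it says nothing about how GCC's optimizer treats the asm statement internally.

```cpp
#include <type_traits>

// Same shape as the question's a.hpp. The inline asm uses GCC/Clang syntax.
struct A {
    int i;
    ~A() { asm("" : "=r"(i)); }
};

struct B : public virtual A { };

struct C : public B { };

// Hypothetical helper for illustration: true when the destructors of A and C
// are non-throwing as far as the type system is concerned.
constexpr bool destructors_are_noexcept() {
    return std::is_nothrow_destructible<A>::value &&
           std::is_nothrow_destructible<C>::value;
}

// Destructors without an exception specification are implicitly noexcept
// since C++11, so this holds regardless of the asm in the body.
static_assert(destructors_are_noexcept(), "destructors are implicitly noexcept");
```

So the extra code path is a code generation decision inside GCC, not something the C++ source asks for explicitly.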

Because GCC puts the construction vtable in the _ZTV1C COMDAT group and Clang does not, the linker may keep the Clang-generated group and discard the GCC-generated one, which is the only copy that defines the construction vtable. The GCC-generated code in a.o still references that extra symbol, so the link fails with this error.

Reversing main.o and a.o in your link also works around the problem, since all three linkers (gold, bfd ld, and lld) will then keep the COMDAT group from a.o, it being the first one seen.
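
The first-one-wins rule can be sketched with a toy model. Everything here is hypothetical (the `ComdatGroup` and `ObjectFile` types, the `link` function, and the simplified symbol lists); it is not how any real linker is implemented. It only mirrors the description above: one group per signature is kept, and a reference to a symbol that lived only in a discarded group cannot be resolved.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not a real linker's data structures.
struct ComdatGroup {
    std::string signature;          // group signature, e.g. "_ZTV1C"
    std::set<std::string> defines;  // symbols defined inside the group
};

struct ObjectFile {
    std::string name;
    std::vector<ComdatGroup> groups;
    std::set<std::string> references;  // symbols its code relocates against
};

// Keep the first group seen for each signature, discard later duplicates,
// then report every reference that no kept group defines.
std::vector<std::string> link(const std::vector<ObjectFile>& inputs) {
    std::set<std::string> seen_signatures;
    std::set<std::string> defined;
    for (const ObjectFile& obj : inputs) {
        for (const ComdatGroup& group : obj.groups) {
            if (seen_signatures.insert(group.signature).second) {
                defined.insert(group.defines.begin(), group.defines.end());
            }
        }
    }
    std::vector<std::string> errors;
    for (const ObjectFile& obj : inputs) {
        for (const std::string& symbol : obj.references) {
            if (defined.count(symbol) == 0) {
                errors.push_back(obj.name + ": unresolved " + symbol);
            }
        }
    }
    return errors;
}

// Assumed, simplified contents: Clang's group holds only the vtable for C.
ObjectFile clang_main_o() {
    ComdatGroup group;
    group.signature = "_ZTV1C";
    group.defines.insert("_ZTV1C");
    ObjectFile obj;
    obj.name = "main.o";
    obj.groups.push_back(group);
    obj.references.insert("_ZTV1C");
    return obj;
}

// Assumed, simplified contents: GCC's group also holds the construction vtable.
ObjectFile gcc_a_o() {
    ComdatGroup group;
    group.signature = "_ZTV1C";
    group.defines.insert("_ZTV1C");
    group.defines.insert("_ZTC1C0_1B");
    ObjectFile obj;
    obj.name = "a.o";
    obj.groups.push_back(group);
    obj.references.insert("_ZTV1C");
    obj.references.insert("_ZTC1C0_1B");
    return obj;
}
```

In this model, `link({clang_main_o(), gcc_a_o()})` reports one unresolved reference to `_ZTC1C0_1B` from a.o, while `link({gcc_a_o(), clang_main_o()})` reports none, matching the behaviour seen when the object files are swapped on the command line.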

Here's the code GCC generates for D::D(), from a.o:

0000000000000002 <_ZN1DC1E1C>:
   2:   48 83 ec 18             sub    $0x18,%rsp
   6:   48 8b 06                mov    (%rsi),%rax
   9:   48 8b 40 e8             mov    -0x18(%rax),%rax
   d:   8b 04 06                mov    (%rsi,%rax,1),%eax
  10:   89 44 24 08             mov    %eax,0x8(%rsp)
  14:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
            17: R_X86_64_REX_GOTPCRELX  _ZTV1C-0x4
  1b:   48 8d 40 18             lea    0x18(%rax),%rax
  1f:   48 89 04 24             mov    %rax,(%rsp)
  23:   48 89 e7                mov    %rsp,%rdi
  26:   e8 00 00 00 00          callq  2b <_ZN1DC1E1C+0x29>
            27: R_X86_64_PLT32  _Z1f1C-0x4
  2b:   eb 17                   jmp    44 <_ZN1DC1E1C+0x42>
  2d:   48 89 c7                mov    %rax,%rdi
  30:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax
            33: R_X86_64_PC32   _ZTC1C0_1B+0x14
  37:   48 89 04 24             mov    %rax,(%rsp)
  3b:   89 44 24 08             mov    %eax,0x8(%rsp)
  3f:   e8 00 00 00 00          callq  44
            40: R_X86_64_PLT32  _Unwind_Resume-0x4
  44:   48 83 c4 18             add    $0x18,%rsp
  48:   c3                      retq   

The code from offset 0x2d through the callq at 0x3f is the exception handling path, generated for when an exception happens during the call of f(c). The lea instruction at 0x30 is referencing an entry in the construction vtable for B-in-C (_ZTC1C0_1B).
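
To see why a construction vtable for B-in-C exists at all, here is a hedged sketch. Unlike the question's SSCCE, it adds virtual functions and a hypothetical `trace_lifetime` helper so that the effect is observable from plain C++. While the B subobject of a C is being constructed or destroyed, the object behaves as a B, and under the Itanium C++ ABI the construction vtable is what implements that for classes with virtual bases. In the SSCCE the same table carries only the virtual-base offset.

```cpp
#include <string>
#include <vector>

// Illustrative hierarchy with the same shape as the question's A, B and C,
// plus virtual functions that the original code does not have.
struct A {
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
};

struct B : public virtual A {
    explicit B(std::vector<std::string>& log) : log_(log) {
        // During B's constructor the dynamic type is B, even inside a C.
        log_.push_back("ctor:" + name());
    }
    ~B() override {
        // The same holds during B's destructor.
        log_.push_back("dtor:" + name());
    }
    std::string name() const override { return "B"; }
    std::vector<std::string>& log_;
};

struct C : public B {
    explicit C(std::vector<std::string>& log) : B(log) {}
    std::string name() const override { return "C"; }
};

// Hypothetical helper: records what name() returns at each stage of a C's life.
std::vector<std::string> trace_lifetime() {
    std::vector<std::string> log;
    {
        C c(log);
        log.push_back("live:" + static_cast<A&>(c).name());
    }
    return log;
}
```

Calling `trace_lifetime()` yields the entries `ctor:B`, `live:C`, `dtor:B`, in that order. The destructor case is the one that matters here: the exception handling path has to destroy the temporary C, and destroying its B subobject is where the reference to _ZTC1C0_1B comes from.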

Without that inline asm, GCC would have generated the same code as clang, with no exception handling path and no construction vtable necessary.

Compile with -fno-exceptions, and the problem also goes away.

I see the same problem whether compiling a.cpp at -O0, -O1, -O2, or -O3.

At least GCC is consistent when compiling a.cpp and main.cpp, so it could be argued that this case simply isn't covered by the C++ ABI, and GCC and Clang are free to treat it differently. I made some trivial attempts to reproduce with something other than inline asm, but could not.

As for why you get an error from gold but not from bfd ld or lld: gold is reporting what could have been a real error, although in this particular case no exception can ever be thrown, so the exception handling code never executes. When you link with bfd ld or lld, the lea instruction at 0x30 is left unrelocated, with no warning, and the program could conceivably crash if an exception were thrown during the call to f().

Cary Coutant