
Update: This seems to be specific to gcc < 4.9, which improved devirtualization.


On a polymorphic class hierarchy I have a non-virtual interface function (let's call it algorithm) that calls some (here, for simplicity, just one) virtual method (call it customization) of the class. When algorithm is invoked on a specific instance of a derived class, vtable calls should not be needed, because the exact type is known.

Now I've ended up with such a call in an inner loop and would like to have it inlined. How can this be achieved?

void helper(int); // assumed to be defined elsewhere

struct Base
{
    inline virtual void customization(int i) {}
    inline void algorithm(int i) { customization(i); }
};

struct Derived : public Base
{
    // Adding this to every derived class instead of just the base will help, 
    // but is the kind of duplication I would like to avoid:
    // inline void algorithm(int i) { customization(i); }
    inline void customization(int i) override final { helper(i); }
};

void my_main()
{
    // Note that these are concrete instances, not references ..
    Derived d1, d2; 
    d1.algorithm(111);
    d2.algorithm(222);

    // ... which could just be optimized into the following, ...
    helper(111);
    helper(222);

    // ... but instead Base::algorithm is called and dispatches through the vtable, just as if it were written as
    static_cast<Base&>(d1).algorithm(111);
    static_cast<Base&>(d2).algorithm(222);
}

The following snippets from this godbolt example illustrate that.

Actual result:

[Image: godbolt screenshot, not preserved]

Desired result:

[Image: godbolt screenshot, not preserved]
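One possible way to avoid repeating algorithm in every derived class is a CRTP layer between Base and the concrete classes. The following is only a minimal sketch under assumptions: StaticAlgorithm, g_sum, run_demo and the body of helper are hypothetical names introduced for illustration, and whether gcc < 4.9 actually inlines the call would still have to be checked in the generated assembly.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's helper(); it only records its input.
static int g_sum = 0;
static void helper(int i) { g_sum += i; }

struct Base
{
    virtual ~Base() {}
    virtual void customization(int) {}
    // Dynamic entry point: used when only a Base& is available.
    void algorithm(int i) { customization(i); }
};

// Hypothetical CRTP layer (not in the original code): it hides Base::algorithm
// with a version that knows the concrete type D.
template <class D>
struct StaticAlgorithm : Base
{
    void algorithm(int i)
    {
        // The qualified name D::customization suppresses virtual dispatch,
        // so the call target is fixed at compile time.
        static_cast<D*>(this)->D::customization(i);
    }
};

struct Derived : StaticAlgorithm<Derived>
{
    void customization(int i) override final { helper(i); }
};

// Runs both call paths and returns the accumulated sum.
int run_demo()
{
    g_sum = 0;
    Derived d1, d2;
    d1.algorithm(111);                    // static path via StaticAlgorithm
    d2.algorithm(222);
    static_cast<Base&>(d1).algorithm(1);  // dynamic path still works
    assert(g_sum == 334);
    return g_sum;
}
```

The qualified call bypasses any override in a class derived from D, so this sketch only fits leaf classes such as Derived here. Calls through a Base& keep going through the vtable as before.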

mbschenkel
  • This is a compiler quality issue. Newer versions of (non-PowerPC) gcc have no problem with this code: https://godbolt.org/z/Xviats. If you want to ask this question specifically in the context of PowerPC, I would suggest the [tag:powerpc] tag. – Max Langhof Jan 21 '20 at 16:15
  • PPC clang doesn't seem to have an issue either, as far back as 3.4.2: https://pastebin.com/vRZmfDjg – Braaedy Jan 21 '20 at 16:57
  • @MaxLanghof It does not seem to be PPC specific, though, but rather caused by the old gcc version: before 4.9 it is not devirtualized on x86 either. Probably related to ["New type inheritance analysis module improving devirtualization. Devirtualization now takes into account anonymous name-spaces and the C++11 final keyword"](https://gcc.gnu.org/gcc-4.9/changes.html). – mbschenkel Jan 21 '20 at 16:59
  • @mbschenkel Why not post that as an answer? :) – Max Langhof Jan 21 '20 at 17:09
  • @MaxLanghof: Unfortunately I'm bound to that compiler for the moment, so it does not solve my problem yet, although it certainly brought me one step further. Thanks! – mbschenkel Jan 21 '20 at 17:11
  • @mbschenkel I hate to ask, but have you actually profiled to confirm that this is a relevant optimization opportunity? Does devirtualizing the call by hand make a meaningful difference in observed speed (in a realistic workload)? – Max Langhof Jan 21 '20 at 17:20

0 Answers