13

My understanding is that virtual functions can cause performance problems because of two issues: the extra dereferencing caused by the vtable and the inability of compilers to inline functions in polymorphic code.

What if I downcast a variable pointer to its exact type? Are there still any extra costs then?

class Base { public: virtual ~Base() {} virtual void foo() = 0; };
class Derived : public Base { public: void foo() { /* code */ } };

int main() {
    Base * pbase = new Derived();
    pbase->foo(); // Can't inline this and have to go through vtable
    Derived * pderived = dynamic_cast<Derived *>(pbase);
    pderived->foo(); // Are there any costs due to the virtual method here?
    delete pbase; // safe because Base has a virtual destructor
}

My intuition tells me that since I cast the object to its actual type, the compiler should be able to avoid the disadvantages of using a virtual function (e.g., it should be able to inline the method call if it wants to). Is this correct?

Can the compiler actually know that pderived is of type Derived after I downcast it? In the example above it's trivial to see that pbase points to a Derived, but in actual code the type might be unknown at compile time.

Now that I've written this down, I suppose that since the Derived class could itself be inherited by another class, downcasting pbase to a Derived pointer does not actually ensure anything to the compiler and thus it is not able to avoid the costs of having a virtual function?

Juarrow
Kevin Salvesen
  • Hmm, if I'm not mistaken, could this also be related to `` and `typeid()`? – CinchBlue Jul 10 '15 at 10:32
  • See this question: http://stackoverflow.com/questions/28332918/virtual-function-efficiency-and-the-final-keyword – Beta Carotin Jul 10 '15 at 10:36
  • How does the compiler know that there aren't further classes derived from `Derived` and that `pderived` could be pointing to a non-final subobject? – Kerrek SB Jul 10 '15 at 11:09
  • @klodo Aren't you now adding work by causing the compiler to try to figure out if `pbase` is castable to `pderived`? You're potentially making the call to `foo()` cheaper at the expense of definitely doing a `dynamic_cast`. `dynamic_cast` isn't free.... – Andre Kostur Jul 10 '15 at 14:28
  • You're right. But, my idea was to call `dynamic_cast` once, while having `foo()` being called in a loop (which is performance critical), in which case the cost of the `dynamic_cast` would be completely negligible. – Kevin Salvesen Jul 10 '15 at 14:32

3 Answers

21

There's always a gap between what the mythical Sufficiently Smart Compiler can do, and what actual compilers end up doing. In your example, since there is nothing inheriting from Derived, the latest compilers will likely devirtualize the call to foo. However, since successful devirtualization and subsequent inlining is a difficult problem in general, help the compiler out whenever possible by using the final keyword.

class Derived : public Base { public: void foo() final { /* code */ } };

Now, the compiler knows that there's only one possible foo that a Derived* can call.
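As a minimal sketch of this advice (C++11 or later), the snippet below contrasts a call through Derived with a call through Base. The helper names call_via_derived, call_via_base and demo, and the return value 42, are hypothetical and only there for illustration. With final on the override, the call through Derived has exactly one possible target, so the compiler is free to bind it statically and inline it; whether a particular compiler actually does so is not guaranteed.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual int foo() = 0;
};

struct Derived : Base {
    // 'final' (C++11) forbids any further override of foo below Derived.
    int foo() final { return 42; }
};

// Hypothetical helper: the static type is Derived and Derived::foo is final,
// so Derived::foo is the only function this call can ever reach.
int call_via_derived(Derived& d) { return d.foo(); }

// Through Base the target cannot be deduced from the static type alone.
int call_via_base(Base& b) { return b.foo(); }

// Both paths return the same value; only the dispatch the compiler
// is allowed to use differs.
void demo() {
    Derived d;
    assert(call_via_derived(d) == 42);
    assert(call_via_base(d) == 42);
}
```

Comparing the generated assembly of the two helpers (for example with -O2) is probably the simplest way to check what a given compiler does with the hint.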

(For an in-depth discussion of why devirtualization is hard and how GCC 4.9+ tackles it, read Jan Hubička's "Devirtualization in C++" series of posts.)

Pradhan
  • Thanks for the answer and the links. Sadly I cannot use C++11 (and thus final) yet because of compatibility reasons. I will use a workaround solution that doesn't rely on inheritance for the moment. – Kevin Salvesen Jul 10 '15 at 11:26
  • @Klodo: If you're using an older compiler (not just pre-C++14 but even pre-C++11) then don't expect too much from the optimizer either. – MSalters Jul 10 '15 at 11:50
  • @Klodo: Note that if *you* are sure of the type, you can use `derived->Derived::foo()`; it's a maintenance drag though, and the performance penalty is usually small enough that it's lost in the noise. – Matthieu M. Jul 11 '15 at 10:54

5

Pradhan's advice to use final is sound, if changing the Derived class is an option for you and you don't want any further derivation.

Another option directly available to specific call sites is prefixing the function name with Derived::, inhibiting virtual dispatch to any further override:

#include <iostream>

struct Base { virtual ~Base() { } virtual void foo() = 0; };

struct Derived : public Base
{
    void foo() override { std::cout << "Derived\n"; }
};

struct FurtherDerived : public Derived
{
    void foo() override { std::cout << "FurtherDerived\n"; }
};

int main()
{
    Base* pbase = new FurtherDerived();
    pbase->foo(); // Can't inline this and have to go through vtable
    if (Derived* pderived = dynamic_cast<Derived *>(pbase))
    {
        pderived->foo();  // still dispatched to FurtherDerived
        pderived->Derived::foo();  // static dispatch to Derived
    }
}

Output:

FurtherDerived
FurtherDerived
Derived

This can be dangerous: the actual runtime type might depend on its overrides being called to maintain its invariants, so it's a bad idea to use this unless there is a pressing performance problem.
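To make that danger concrete, here is a small sketch. The classes Accumulator and CountingAccumulator and the function demo are hypothetical, invented only for this illustration. The most-derived class relies on its own override running to keep a counter in sync, and the qualified call silently skips it.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual void add(int x) = 0;
};

struct Accumulator : Base {
    int sum = 0;
    void add(int x) override { sum += x; }
};

// Hypothetical subclass whose invariant is: count == number of add() calls.
struct CountingAccumulator : Accumulator {
    int count = 0;
    void add(int x) override { ++count; Accumulator::add(x); }
};

void demo() {
    CountingAccumulator c;
    Accumulator* p = &c;
    p->add(1);               // virtual dispatch: count is updated
    p->Accumulator::add(2);  // qualified call: skips CountingAccumulator::add
    assert(c.sum == 3);      // both values were summed...
    assert(c.count == 1);    // ...but only one call was counted
}
```

Nothing at the call site warns about this, which is why the qualified form is best kept for cases where the exact dynamic type is known.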


Tony Delroy

3

Devirtualization is actually a very special case of constant propagation, where the constant being propagated is the type (physically represented as a v-ptr in general, though the Standard makes no such guarantee).


Total devirtualization

There are multiple situations where a compiler can actually devirtualize a call that you may not think about:

int main() {
    Base* base = new Derived();
    base->foo();
}

Clang is able to devirtualize the call in the above example simply because it can track the actual type of base as it is created in scope.

In a similar vein:

struct Base { virtual void foo() = 0; };
struct Derived: Base { virtual void foo() override {} };

Base* create() { return new Derived(); }

int main() {
    Base* base = create();
    base->foo();
}

While this example is slightly more complicated, and the Clang front-end will not realize that base is necessarily of type Derived, the LLVM optimizer which comes afterward will:

  • inline create in main
  • store a pointer to the v-table of Derived in base->vptr
  • realize that base->foo() therefore is base->Derived::foo() (by resolving the indirection through the v-ptr)
  • and finally optimize everything out because there is nothing to do in Derived::foo

And here is the final result (which I assume needs no comment, even for those unfamiliar with LLVM IR):

define i32 @main() #0 {
  ret i32 0
}

There are multiple instances where a compiler (either front-end or back-end) can devirtualize calls in situations that might not be obvious; in all cases it boils down to its ability to prove the run-time type of the object pointed to.


Partial devirtualization

In his series about improvements to the gcc compiler on the subject of devirtualization, Jan Hubička introduces partial devirtualization.

The latest incarnations of gcc have the ability to short-list a few likely run-time types of the object, and to generate code equivalent to the following pseudo-code (in this case, two types are deemed likely, and the rest are either unknown or not likely enough to justify a special case):

// Source
void doit(Base* base) { base->foo(); }

// Optimized
void doit(Base* base) {
    if (base->vptr == &Derived::VTable) { base->Derived::foo(); }
    else if (base->vptr == &Other::VTable) { base->Other::foo(); }
    else {
        (*base->vptr[Base::VTable::FooIndex])(base);
    }
}

While this may seem slightly convoluted, it does offer some performance gains (as you'll see from the series of articles) when the predictions are correct.

Seems surprising? Well, there are more tests, but base->Derived::foo() and base->Other::foo() can now be inlined, which itself opens up further optimization opportunities:

  • in this particular case, since Derived::foo() does nothing, the function call can be optimized away; the penalty of the if test is less than that of a function call so it's worth it if the condition matches often enough
  • in cases where one of the function arguments is known, or known to have some specific properties, the subsequent constant propagation passes can simplify the inlined body of the function
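The guarded dispatch above can be imitated by hand in standard C++. The sketch below is only an analogy, not what gcc emits: the compiler compares the v-ptr directly, whereas this version uses typeid (so it needs RTTI, and the comparison is likely more expensive). The class Rare, the return values and the demo function are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() {}
    virtual int foo() = 0;
};
struct Derived : Base { int foo() override { return 1; } };
struct Other   : Base { int foo() override { return 2; } };
struct Rare    : Base { int foo() override { return 3; } };

int doit(Base* base) {
    // typeid on a polymorphic object reports its most-derived type,
    // so each guarded static_cast below is safe.
    if (typeid(*base) == typeid(Derived)) {
        return static_cast<Derived*>(base)->Derived::foo(); // direct call
    }
    if (typeid(*base) == typeid(Other)) {
        return static_cast<Other*>(base)->Other::foo();     // direct call
    }
    return base->foo(); // fallback: ordinary virtual dispatch
}

void demo() {
    Derived d; Other o; Rare r;
    assert(doit(&d) == 1); // first guess matches
    assert(doit(&o) == 2); // second guess matches
    assert(doit(&r) == 3); // neither matches, virtual call
}
```

The two qualified calls have a single known target and can therefore be inlined, which is where the gain comes from when the guesses are right.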

Impressive, right?


Alright, alright, this is rather long-winded but I am coming to talk about dynamic_cast<Derived*>(base)!

First of all, the cost of a dynamic_cast is not to be underestimated; it might well be more costly than calling base->foo() in the first place. You've been warned.

Secondly, using dynamic_cast<Derived*>(base)->foo() can, indeed, allow devirtualizing the function call if it gives sufficient information to the compiler to do so (it always gives more information, at least). Typically, this can be either:

  • because Derived::foo is final
  • because Derived is final
  • because Derived is defined in an anonymous namespace, and is thus only accessible in this translation unit (roughly, .cpp file), so all its descendants are known and can be checked, and none of them redefines foo
  • and plenty of other cases (like pruning the set of potential candidates in the case of partial devirtualization)

If you really wish to ensure devirtualization, though, final applied either on the function or class is your best bet.
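For the scenario raised in the question's comments (one dynamic_cast up front, then many calls in a hot loop), a minimal sketch could look like this. It assumes C++11, and the names run, calls and demo are hypothetical. Because Derived is final, the call inside the first loop can only reach Derived::foo; whether the compiler then inlines it still depends on the optimizer.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual void foo() = 0;
};

// 'final' on the class: nothing can derive from Derived.
struct Derived final : Base {
    long calls = 0;
    void foo() override { ++calls; }
};

// Hypothetical hot loop: pay for dynamic_cast once, then call n times.
void run(Base* base, int n) {
    if (Derived* d = dynamic_cast<Derived*>(base)) {
        // Static type Derived, and Derived is final: one possible target.
        for (int i = 0; i < n; ++i) d->foo();
    } else {
        // Some other dynamic type: keep ordinary virtual dispatch.
        for (int i = 0; i < n; ++i) base->foo();
    }
}

void demo() {
    Derived d;
    run(&d, 1000);
    assert(d.calls == 1000);
}
```

The else branch keeps the code correct for any other implementation of Base, at the price of duplicating the loop.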

Matthieu M.
  • @Klodo: This should be the accepted answer. Mine has almost no details and yet ends up at the top. Please switch acceptance to this one. – Pradhan Jun 11 '16 at 16:08
  • @Pradhan: Maybe, maybe not. I do appreciate the intention, but the votes are clear. Your answer, probably because it is short and to the point, was much better received (21 upvotes, compared to 3 for mine) and proved useful enough to the OP that (s)he accepted it. More information does not necessarily make a better answer :) – Matthieu M. Jun 12 '16 at 11:11