Optimization of virtual function calls in derived class

Question

What is the best way to call virtual functions in a derived class so that the compiler can inline or otherwise optimize the call?

Example:

class Base {
  virtual void foo() = 0;
};

class Derived: public Base {
  virtual void foo() {...}
  void bar() {
    foo();
  }
};

I want the call to foo() in bar() to always call Derived::foo(). It is my understanding that the call will result in a vtable lookup and the compiler cannot optimize it out since there may be another class inheriting from Derived.

I could explicitly call Derived::foo() but that gets verbose if there are many virtual function calls in Derived. I also find it surprising that I could not find much material online addressing what seems to me to be a common case (a 'final' derived class calling virtual methods) so I wonder if I am misusing virtual functions here or excessively optimizing.

How should this be done? Stop prematurely optimizing and stick with foo(), suck it up and use Derived::foo(), or is there a better way?

If you know nothing will override `foo()`, then declare it `final`. See: http://stackoverflow.com/questions/8824587/what-is-the-purpose-of-the-final-keyword-in-c11 — Mysticial, May 15 '14 at 18:41
Have you measured both to see if the compiler has actually optimized already for you? — merlin2011, May 15 '14 at 18:45
Oh, fantastic. `final` is exactly what I want. I didn't know it existed in C++. — Eric Langlois, May 15 '14 at 18:47
I don't think that final gives you more optimization possibilities. The compiler still has to figure out if there is another class in the hierarchy which overrides foo, it just says that there is no class down in the hierarchy. If the compiler can do this for a final method and devirtualize it, it should also be possible without the final. — Jens, May 15 '14 at 18:51
@Jens Not quite. If it's `final` it means that it cannot be overridden. Therefore the compiler can remove the virtual dispatch, and optionally inline the call, if it knows the type is `Derived` or a subclass of `Derived`. — Mysticial, May 15 '14 at 18:57
@Jens, you never need to worry about parents implementing `foo` because the one in `Derived` overrides all of them. Providing `final` means you don't have to worry about any child classes either. That makes it trivial for the compiler to know what to do. — Mark Ransom, May 15 '14 at 18:59
@Mysticial Yes, but the compiler still has to find out whether there is another class between Parent and Derived, and if there is, it has to compute the dynamic type statically. final only cuts down the possibilities by ensuring that there is no class further down the hierarchy, but that is very easy to find out without it, as the compiler knows which classes are defined. — Jens, May 15 '14 at 19:20
@MarkRansom If there is more than one class overriding foo, the compiler has to statically determine the dynamic type to select the correct override. How would final help in this case? Consider three classes A < B < C where B and C override foo and C is final. What information does final give when the compiler sees a call to foo through a A*? — Jens, May 15 '14 at 19:22
It's actually more common than that. Consider this extremely common case: `Derived *x = ...; x->foo();` That's inlineable - regardless of where the pointer came from. — Mysticial, May 15 '14 at 19:22
Yes because here it is easy to prove that dynamic type and static are the same. But that can be done without final when the compiler knows which classes are in the program. So what does final help in this case? And what would it help when you have a base-class pointer and more than one class? Maybe even two final ones inheriting from Base? — Jens, May 15 '14 at 19:24
@Jens, Without `final`, the pointer could be a derived type that overrides `foo()`. Therefore the compiler can't inline it. Devirtualization is one of the reasons why `final` was added to the standard. — Mysticial, May 15 '14 at 19:26
@Jens the compiler cannot know if there is a class in a separate module that overrides Base if `final` is not specified since the separate modules are connected at link time not at compile time. With `final` specified the compiler knows that even separate modules cannot override the class/function. — YoungJohn, May 15 '14 at 19:27
@Jens In any case. [Here's the GCC bug that discusses using `final` for devirtualization.](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49488) — Mysticial, May 15 '14 at 19:31
Ok, I can see why it helps compilers when you have different modules. Did not think of that. But I would compile my program with IPO or whole-program analysis enabled, then it could figure it out without final ... — Jens, May 15 '14 at 19:33
@Jens obviously a call through `A*` needs to be virtual, but the case we're discussing here is when you want to specifically call `C::foo`, which you can only do through a `C*` pointer. Without `final` the compiler doesn't know if there's a class `D` or not. — Mark Ransom, May 15 '14 at 19:33
@Mysticial I still think the intention behind final is to have a diagnostic when you override a final method. The paper n2751 where it is introduced only mentions this ("This is a good attribute because it allows the compiler to emit a message if the class or function is extended.") and says nothing about optimization. Because it only helps in some very specific cases like the one presented here, I don't think it is really useful for devirtualization. A whole-program analysis probably yields the same or even more information and should do the same optimization. — Jens, May 15 '14 at 19:38
@Jens I imagine there is a situation where you're building a library, and whole program analysis still wouldn't be sufficient for de-virtualization without somehow explicitly informing the compiler. A *load time* optimization, similar to the JVM, would be a way around that though. — Jason, May 16 '14 at 02:47
@Mysticial I realize that there are some cases where final helps. However, I think the advantage is rather small because it only helps when the compiler cannot derive the information on its own. That is basically the case when you have multiple compilation units or libraries. But here, inlining will never take place because inlining happens during compile time and not during linking. So the performance gain is limited to the roughly 10% from removing the dynamic dispatch. To really benefit from this, you have to either push inlining to the linking phase (LTO) or do whole-program analysis. — Jens, May 16 '14 at 06:39
@Jens Inlining can be done at the link phase, but *link time* optimization is fairly expensive. I think I agree, the `final` keyword is less important for de-virtualization than enforcing code structure. Especially in this situation, I think the better thing would be not relying on de-virtualization to perform inlining, and instead using static polymorphism with templates and/or macros, *but...* only, and strictly only, if eliminating *vtable* overhead *and* function call overhead is necessary — Jason, May 16 '14 at 18:50
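The static-polymorphism route mentioned in the last comment could look roughly like the CRTP sketch below. This is only an illustration under assumptions: the templated `Base`, the `int` return type, and the helper `callBar` are not from the thread, and the approach gives up runtime polymorphism through a common base pointer.

```cpp
// CRTP sketch: the base template knows the derived type at compile time,
// so foo() is resolved statically and no vtable is involved.
template <typename DerivedT>
class Base {
public:
  int bar() {
    // Valid as long as DerivedT really derives from Base<DerivedT>.
    return static_cast<DerivedT*>(this)->foo();
  }
};

class Derived : public Base<Derived> {
public:
  int foo() { return 101; }  // deliberately not virtual
};

// Hypothetical helper, used only to exercise the sketch.
inline int callBar() {
  Derived d;
  return d.bar();
}
```

Because `foo()` is an ordinary member function here, the compiler can inline it into `bar()` like any other call.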

Eric Langlois · Accepted Answer · 2014-05-16T02:13:13.413

C++11 contains the final keyword, which "specifies that a virtual function can not be overridden in a derived class or that a class cannot be inherited from."

It appears that g++ is able to optimize the virtual function call in the derived class if it has been declared final.

I created the following test:

virtualFunctions.h:

#pragma once
class Base {
public:
  virtual void foo();
  virtual void bar();
  virtual void baz();
  int fooVar, barVar, bazVar;
};
class Derived: public Base {
public:
  void test();
  virtual void foo();
  virtual void bar();
  virtual void baz() final;
};

virtualFunctions.cpp:

#include "virtualFunctions.h"
void Derived::test() {
  foo();
  Derived::bar();
  baz();
}
void Derived::foo() {
  fooVar = 101;
}
void Derived::bar() {
  barVar = 202;
}
void Derived::baz() {
  bazVar = 303;
}

I am using g++ 4.7.2 and with -O1 the generated assembly contains:

_ZN7Derived4testEv:
.LFB0:
    .loc 1 3 0
    .cfi_startproc
.LVL3:
    pushl   %ebx
.LCFI0:
    .cfi_def_cfa_offset 8
    .cfi_offset 3, -8
    subl    $24, %esp
.LCFI1:
    .cfi_def_cfa_offset 32
    movl    32(%esp), %ebx      ; Load the this pointer from the stack
    .loc 1 4 0
    movl    (%ebx), %eax        ; Load the vtable pointer from the object
    movl    %ebx, (%esp)
    call    *(%eax)             ; Indirect call through the first vtable slot (foo)
.LVL4:
    .loc 1 5 0
    movl    %ebx, (%esp)
    call    _ZN7Derived3barEv   ; Direct call to Derived::bar()
.LVL5:
    .loc 1 6 0
    movl    %ebx, (%esp)
    call    _ZN7Derived3bazEv   ; Devirtualized call to Derived::baz()

Derived::bar() and Derived::baz() were both called directly, while the vtable was used for foo().
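`final` can also be applied to the whole class rather than to individual functions. The sketch below is a variation on the test above, not something that was measured here: nothing can inherit from a `final` class, so the compiler is permitted to treat every virtual call made on a `Derived` object as direct. Whether a particular compiler version does so would need to be checked in its generated assembly. The inline function bodies and the helper `runTest` are assumptions made to keep the example self-contained.

```cpp
class Base {
public:
  virtual ~Base() = default;
  virtual void foo() = 0;
  virtual void bar() = 0;
  int fooVar = 0, barVar = 0;
};

// `final` on the class: no type can derive from Derived, so inside Derived's
// member functions the dynamic type of *this is always Derived.
class Derived final : public Base {
public:
  void foo() override { fooVar = 101; }
  void bar() override { barVar = 202; }
  void test() {
    foo();  // eligible for devirtualization without writing Derived::foo()
    bar();
  }
};

// Hypothetical helper, used only to exercise the sketch.
inline int runTest() {
  Derived d;
  d.test();
  return d.fooVar + d.barVar;
}
```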

Jens · Answer 2 · 2014-05-15T18:50:17.390

The compiler may be able to optimize it and perform devirtualization if it can statically find out what type is used.
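As a rough illustration of "statically finding out the type", compare the two hypothetical functions below (`knownType` and `viaReference` are names made up for this sketch). In the first, the object is created in the same function, so an optimizer can prove which override is the target. In the second, the dynamic type depends on the caller, so the call generally stays virtual unless the function is inlined into a context where the type is visible.

```cpp
struct Base {
  virtual ~Base() = default;
  virtual int foo() const { return 1; }
};

struct Derived : Base {
  int foo() const override { return 2; }
};

// The dynamic type of `d` is visibly Derived, so the call through `b`
// is a candidate for devirtualization.
inline int knownType() {
  Derived d;
  const Base& b = d;
  return b.foo();
}

// The dynamic type of `b` is unknown here; the call normally goes
// through the vtable.
inline int viaReference(const Base& b) {
  return b.foo();
}
```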

Virtual method calls are quite cheap. Some time ago I read an article stating that the overhead is roughly ten percent compared to a normal method call. This of course does not account for the missed inlining opportunity.

I also have a feeling that this mixes interface and implementation. I think it would be better to split it into a pure interface and an implementation class.

score 2 · Answer 3 · answered May 15 '14 at 18:45

As you yourself say, the performance impact of this should be your concern only in extremely rare cases. If you are compiling as C++11 you can declare Derived and/or foo()/bar() as final and the compiler might inline it.

hllnll

score 1 · Answer 4 · answered May 15 '14 at 22:35

The answer to the question is to disable dynamic dispatch, and that can be done through qualification:

class Derived: public Base {
  virtual void foo() {...}
  void bar() {
    Derived::foo();          // no dynamic dispatch
  }
};

Now the question is whether this is going to make a difference in performance (measure before changing things!) and whether it makes sense to do this. A virtual function is an extension point for derived types. If you disable dynamic dispatch, someone might create MoreDerived, implement foo and expect that bar calls MoreDerived::foo, but if you disabled dynamic dispatch that won't happen.
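The surprise described above can be reproduced with a small sketch. The `std::string` return values and the two member functions `barQualified` and `barVirtual` are assumptions added only so that the difference is observable.

```cpp
#include <string>

class Base {
public:
  virtual ~Base() = default;
  virtual std::string foo() = 0;
};

class Derived : public Base {
public:
  std::string foo() override { return "Derived::foo"; }
  // Qualified call: always Derived::foo, no dynamic dispatch.
  std::string barQualified() { return Derived::foo(); }
  // Unqualified call: dispatches on the dynamic type of *this.
  std::string barVirtual() { return foo(); }
};

class MoreDerived : public Derived {
public:
  std::string foo() override { return "MoreDerived::foo"; }
};
```

Called on a `MoreDerived` object, `barVirtual()` reaches `MoreDerived::foo` while `barQualified()` silently keeps calling `Derived::foo`, which is exactly the behavior an author of `MoreDerived` may not expect.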

Unless there is a really good, measured, reason to try to micro-optimize this, avoid the problem altogether. Chances are that if you run your code in a profiler the dynamic dispatch is not going to show up at all.

David Rodríguez - dribeas

The difference between dynamic and static dispatch is unlikely to be significant, but the difference between dynamic and *no* dispatch (i.e. inlining) could be. Pixel access for example. – Mark Ransom May 16 '14 at 00:53
@mark: there are cases where it matters, and cases where it doesn't. The latter are far more common than the former, and without measurements I would not do this. Here's the answer, measure with dynamic dispatch and measure without, compare. Also note that there are other alternatives, does the function need to be virtual? Could there be an impl function that is not and is called from both contexts? Disabling dynamic dispatch to a function is prone to cause surprises. – David Rodríguez - dribeas May 16 '14 at 02:08
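The "impl function" alternative raised in the last comment might be sketched as follows. The name `fooImpl` and the `int` return values are hypothetical. The virtual `foo()` stays available as an extension point, while `bar()` calls a private non-virtual function that the compiler can inline freely.

```cpp
class Base {
public:
  virtual ~Base() = default;
  virtual int foo() = 0;
};

class Derived : public Base {
public:
  // Virtual entry point forwards to the shared implementation.
  int foo() override { return fooImpl(); }
  // Ordinary non-virtual call; no vtable lookup is needed here.
  int bar() { return fooImpl() + 1; }

private:
  int fooImpl() { return 101; }
};
```

Compared with writing `Derived::foo()` inside `bar()`, this makes the intent explicit in the class design: `bar()` depends on `Derived`'s own logic, not on whatever a subclass supplies for `foo()`.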
