De-virtualization is, actually, a very special case of constant propagation, where the constant being propagated is the type (physically represented as a v-ptr in general, although the Standard makes no such guarantee).
Total devirtualization
There are multiple situations, some of which you may not expect, where a compiler can actually devirtualize a call:
struct Base { virtual void foo() = 0; };
struct Derived : Base { virtual void foo() override {} };

int main() {
    Base* base = new Derived();
    base->foo();  // the dynamic type of *base is provably Derived here
}
Clang is able to devirtualize the call in the above example simply because it can track the actual type of base as it is created in scope.
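Conceptually (the exact lowering is of course up to the compiler), the devirtualized call amounts to a direct, non-virtual call on the very same Base and Derived as above:

int main() {
    Base* base = new Derived();   // dynamic type known: Derived
    base->Derived::foo();         // direct call, no v-ptr indirection, and now trivially inlinable
}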
In a similar vein:
struct Base { virtual void foo() = 0; };
struct Derived : Base { virtual void foo() override {} };

Base* create() { return new Derived(); }

int main() {
    Base* base = create();
    base->foo();
}
While this example is slightly more complicated, and the Clang front-end will not realize that base is necessarily of type Derived, the LLVM optimizer which comes afterward will:
- inline create in main
- store a pointer to the v-table of Derived in base->vptr
- realize that base->foo() is therefore base->Derived::foo() (by resolving the indirection through the v-ptr)
- and finally optimize everything out because there is nothing to do in Derived::foo
And here is the final result (which, I assume, needs no comment even for those not initiated to LLVM IR):
define i32 @main() #0 {
ret i32 0
}
There are multiple instances where a compiler (either front-end or back-end) can devirtualize calls in situations that might not be obvious; in all cases, it boils down to its ability to prove the run-time type of the object pointed to.
Partial devirtualization
In his series about improvements to the gcc compiler on the subject of devirtualization, Jan Hubička introduces partial devirtualization.
The latest incarnations of gcc have the ability to short-list a few likely run-time types for the object, and in particular to produce the following pseudo-code (in this case, two types are deemed likely, and the remaining candidates are either unknown or not likely enough to justify a special case):
// Source
void doit(Base* base) { base->foo(); }

// Optimized (pseudo-code)
void doit(Base* base) {
    if      (base->vptr == &Derived::VTable) { base->Derived::foo(); }
    else if (base->vptr == &Other::VTable)   { base->Other::foo(); }
    else {
        (*base->vptr[Base::VTable::FooIndex])(base);
    }
}
While this may seem slightly convoluted, it does offer some performance gains (as you'll see from the series of articles) in case the predictions are correct.
Seems surprising? Well, there are more tests, but base->Derived::foo() and base->Other::foo() can now be inlined, which itself opens up further optimization opportunities:
- in this particular case, since Derived::foo() does nothing, the function call can be optimized away entirely; the penalty of the if test is less than that of a function call, so it is worth it if the condition matches often enough (see the sketch right after this list)
- in cases where one of the function arguments is known, or known to have some specific properties, the subsequent constant propagation passes can simplify the inlined body of the function
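To illustrate the first point, here is a sketch (still pseudo-code, in the same made-up notation as above) of what doit could become once Derived::foo() has been inlined and found to be empty:

// Optimized further (pseudo-code): the inlined Derived::foo() is empty,
// so that branch boils down to the type check alone
void doit(Base* base) {
    if      (base->vptr == &Derived::VTable) { /* nothing left to do */ }
    else if (base->vptr == &Other::VTable)   { base->Other::foo(); } // a candidate for inlining too
    else {
        (*base->vptr[Base::VTable::FooIndex])(base);
    }
}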
Impressive, right?
Alright, alright, this is rather long-winded, but I am coming to talk about dynamic_cast<Derived*>(base)!
First of all, the cost of a dynamic_cast is not to be underestimated; it might well, actually, be more costly than calling base->foo() in the first place. You've been warned.
Secondly, using dynamic_cast<Derived*>(base)->foo() can, indeed, allow devirtualizing the function call if it gives the compiler sufficient information to do so (it always gives more information, at least). Typically, this can be:
- because Derived::foo is final
- because Derived is final
- because Derived is defined in an anonymous namespace and has no descendant redefining foo, and is thus only accessible in this translation unit (roughly, the .cpp file), so all its descendants are known and can be checked
- and plenty of other cases (like pruning the set of potential candidates in the case of partial devirtualization)
If you really wish to ensure devirtualization, though, final applied either to the function or to the class is your best bet.
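To wrap up, here is a minimal sketch of that final route (the names are made up for the example, and the cast is guarded so a failed cast does not dereference a null pointer): marking either the class or the overrider final lets the compiler prove that no further override of foo can exist, so the call after the dynamic_cast can be resolved statically:

struct Base { virtual void foo() = 0; };

// Either form is enough:
struct Derived final : Base { void foo() override {} }; // no class can derive from Derived
// struct Derived : Base { void foo() final {} };       // no class can override foo any further

void doit(Base* base) {
    if (auto* derived = dynamic_cast<Derived*>(base)) {
        derived->foo(); // provably Derived::foo(): devirtualizable, and thus inlinable
    }
}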