32

C++ question here. I have a system where I'm going to have hundreds of mini-subclasses of a given superclass. They all will have a "foo" method that does something. Or... I'm going to have one class with an integer called "type" and use a giant switch statement to decide what to do when I foo.

Performance is a huge consideration here. Extremely important.

The question is, what are the performance benefits/penalties of using a switch statement vs. letting C++ do it via the vftable? If I have it as a switch statement, I can put the commonly occuring foo's up at the top of the switch statement and the less common ones at the bottom, hopefully shortcutting the comparison. Trying to get an effect like this with the vftable is bound to be compiler dependent even if I can figure out how to do it...

On the other hand, my code would be a lot easier to deal with without these ugly switch statements.

eeeeaaii
  • 3,372
  • 5
  • 30
  • 36
  • 3
    Maybe you should consider metaprogramming libs and see if they can solve your problem. – Julio Guerra Dec 17 '10 at 04:13
  • 5
    The C++ working group did a report on C++ performance and addressed concerns about virtual functions. See the link [here](http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf) – sashang Dec 17 '10 at 04:28
  • 1
    the compiler is free to reorder the statements, as long as the fall-through are preserved, thus you have no guarantee (from the standard) that the most frequent cases would be tested first... – Matthieu M. Dec 17 '10 at 08:07
  • it's no answer to your question, but I wonder if you can't re-design slightly so that you don't have so many calls to so many virtual methods. Often it's possible to have one virtual function that does many calls to the right objects then. – Philipp Dec 17 '10 at 09:15
  • Thanks for the link, sashang. Nice report. – Jan Gray Dec 17 '10 at 15:55
  • Possible duplicate of [dynamical binding or switch/case?](https://stackoverflow.com/questions/2681337/dynamical-binding-or-switch-case) – underscore_d May 24 '17 at 16:13

5 Answers5

18

There's been some research on this topic in the field of virtual machine design. Generally, a switch statement is going to be faster, a lot of virtual machines use switch semantics as opposed to virtual lookup. Theoretically, one would assume that a virtual table - being a constant time algorithm - will be faster, but we have to examine how the hardware sees a virtual table.

A switch statement is easier for the compiler to inline. This is a huge consideration, the actual act of calling a virtual function is minimal, however, pushing and popping the entire stack frame is necessary because the compiler has no idea which function will be called at run-time.

Branch prediction and hardware prefetch should be easier on a switch statement, although modern architectures are getting better at predicting virtual calls.

A lot of code that uses virtual dispatch requires the use of heap based allocation schemes. Dynamic memory allocation is a bottleneck in a lot C++ applications.

Anthony
  • 181
  • 1
  • 2
  • On this topic, Anton Ertl's benchmarks here are spot on: http://www.complang.tuwien.ac.at/forth/threading/ – Arto Bendiken Mar 26 '13 at 22:47
  • 1
    Also, with `switch`, we can have (1) explicit type information (where viable pointer is hidden) (2) easier precise class size control (3) and non-virtual destructor. I think this also will help optimization. – eonil May 18 '14 at 04:25
  • 2
    its false than virtual requires heap. auto objects work all the same. and pooled objects as well. you can override new and make it allocate from a static table even. – v.oddou Apr 09 '15 at 05:48
  • This persistent falsehood that dynamic dispatch requires dynamic allocation baffles me. You just need to pass the object by reference or pointer (or, obviously, call on it directly with its real static type). How the object was allocated is completely irrelevant. I do tonnes of polymorphic stuff with objects allocated on the stack. Oh, wait: the worst thing is that Stroustrup has the same false statement in one of his FAQs. I guess that's where people keep repeating this from. – underscore_d May 24 '17 at 16:02
  • @underscore_d The problem is that most people almost always use the heap for everything in (OOP) C++ because Stroustrup made bad design decisions, and the 90s royally screwed up programming in general with bad OOP practices. The fact that people (including Stroustrup, and myself) are confused on the technical details of DD is just a symptom of the much larger problem. – SO_fix_the_vote_sorting_bug Aug 05 '20 at 21:14
13

If I have it as a switch statement, I can put the commonly occuring foo's up at the top of the switch statement and the less common ones at the bottom, hopefully shortcutting the comparison.

A switch statement is generally compiled to a jump table rather than a block of if-else conditionals as your question implies. In practice, the virtual table and the switch jump table should have similar performance, though test if you're really concerned.

chrisaycock
  • 36,470
  • 14
  • 88
  • 125
  • 3
    +1 for "test and see what's faster". The result is implementation-dependent, so measuring is the only way to find. – sharptooth Dec 17 '10 at 09:21
  • 1
    Jump tables are not used on x86 for as far as I know because branches are very cheap there; ARM still uses them and has specific instructions for dealing with them however. – Jasper Bekkers Aug 24 '11 at 18:27
  • 1
    @JasperBekkers Clang actually lowers switches to jump tables under certain conditions. Paper: https://llvm.org/pubs/2007-05-31-Switch-Lowering.pdf – atlaste Dec 06 '17 at 07:27
5

The compiler determines how the switch statements are handled, but there are a few basic techniques they use.

  1. if-else binary-sort: The comparison is done as a series of if-else but in a binary-sort like fashion, performance is thus comparable to lookup in a map of N items
  2. jump table: if the items are close enough together a table of addresses will be produced. Lookup is then in constant time

Where the case statements are located in the switch statement makes no difference in either case.

Virtual functions have an overhead compared to direct call. It involves an additional offset and pointer lookup. For all but the most extreme performance considerations this cost is negligible. When comparing to a switch the overhead is not in the virtual lookup, but the function call itself. So a switch statement that simply calls functions in each case will perform basically the same as virtual functions.

So essentially the "dispatch semantics" of a switch statement (with jump table) compared to a virtual function call are nearly irrelevant. If all your "foo" methods are relatively small and can be inlined the switch statement will start to perform better. The other advantage of switch is that you can put common code before the switch and get better register/stack optimizations.

However, there is a significant maintenance overhead. This should be your primary concern at this point. Why? Because the performance bottle-neck in your code is not likely the switching login, or even the function calls, but something else. Until you fix that something else there is no point in addressing these low-level performance issues. So stick with whichever provides more maintainable code at the moment.

edA-qa mort-ora-y
  • 30,295
  • 39
  • 137
  • 267
2

To the other answers here I would add two more.

1) It is more difficult and less common for a compiler to perform classic optimizations (including enregistration) across a virtual function call interface than across case labeled statements in a switch statement in a single function.

2) Any performance difference in the dispatch is highly depedendent on the processor's branch prediction hardware. Even a virtual function call target address (and return) may be correctly predicted and have negligible performance overhead in the pipeline of a modern out-of-order processor.

If the performance of this operation really matters, you really have to try it both ways and measure it, in the context of the real system.

Happy hacking!

Jan Gray
  • 3,454
  • 19
  • 15
0

Vtable should be faster in nearly all cases, but if performance is so critical, the right thing to ask is by how much.

Vtable call is triple indirection (three memory accesses to get the target CALL address). Cache misses should not be an issue if there're many calls. So, it is roughly 2-3 switch label comparisons (though the latter offer even less chance for CPU cache miss, but less for pipe usage).

You should of course not rely on anything I said here, and test it all with true performance measurements on your target architecture.

Pavel Radzivilovsky
  • 18,794
  • 5
  • 57
  • 67