
Considering the newer CPUs, with new instructions for moving data and new memory controllers: if in C++ I have a vector of Derived objects, where Derived has virtual member functions, is this a good or a bad thing for locality?

And what if I have a vector of pointers to the base class, Base*, in which I store pointers to derived objects that are 1, 2 or 3 levels away from Base?

Basically dynamic dispatch applies in both cases, but which one is better for caching and memory access?
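
To make the two cases concrete, here is a minimal sketch of what I mean (the class names are just placeholders):

```cpp
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual void update() = 0;
};

struct Derived : Base {
    int payload[4];
    void update() override { ++payload[0]; }
};

// Case 1: objects stored by value, laid out contiguously in the vector's buffer.
std::vector<Derived> by_value;

// Case 2: only the pointers are contiguous; the pointed-to objects can live
// anywhere on the heap and may be 1, 2 or 3 levels away from Base.
std::vector<Base*> by_pointer;
```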

I have a preference between these two, but I would like to see a complete answer on the subject.

Is there anything new and ground-breaking from the hardware industry in the last 2-3 years that I should consider?

user2485710
  • I'm sure deeper pipelining has improved the situation, but virtual functions always require data-dependent jumps which will stall the pipeline until the function pointer is fetched. **Have you tried benchmarking?** – Dietrich Epp Aug 07 '13 at 16:35
  • @DietrichEpp nope, because I'm interested in the theory in this case; I'm sure that I can pick one of these 2 even without benchmarking. – user2485710 Aug 07 '13 at 16:40

2 Answers

1

Storing Derived rather than Base* in a vector is better because it eliminates one extra level of indirection and all objects are laid out "together" in contiguous memory, which in turn makes life easier for the hardware prefetcher and helps with paging, TLB misses, etc. However, if you do this, make sure you don't introduce a slicing problem.
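
For example, a minimal sketch of the slicing hazard (the class names are made up):

```cpp
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 0; }
};

struct Derived : Base {
    int extra = 42;
    int value() const override { return extra; }
};

struct MoreDerived : Derived {
    int more = 7;
    int value() const override { return extra + more; }
};

int main() {
    std::vector<Derived> v;
    MoreDerived md;
    v.push_back(md);      // compiles, but slices: only the Derived subobject is copied
    return v[0].value();  // 42, not 49 -- the MoreDerived part is gone
}
```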

As for virtual dispatch in this case, it almost does not matter, with the exception of the adjustment required for the "this" pointer. For example, if Derived overrides a virtual function that you are calling and you already have a Derived* pointer, then no "this" adjustment is required; otherwise it has to be adjusted to the "this" value of one of the base classes (this also depends on the sizes of the classes in the inheritance hierarchy).
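
A minimal sketch of where such an adjustment shows up, assuming a typical vtable-based implementation (e.g. the Itanium C++ ABI); the names are made up:

```cpp
struct A { virtual void fa() {} long a_data = 0; };
struct B { virtual void fb() {} long b_data = 0; };

// Derived has two bases, so the B subobject does not sit at offset 0.
struct Derived : A, B {
    void fa() override {}
    void fb() override {}
};

void call(Derived* d, B* b) {
    d->fb();  // starting from a Derived*, the compiler already knows the layout,
              // so reaching the right subobject costs a constant offset at most
    b->fb();  // b points into the middle of a Derived object, so the dispatch
              // typically lands on a small "thunk" that adjusts `this` back to
              // the start of the Derived object before entering Derived::fb
}
```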

As long as all objects in the vector use the same overrides, the CPU will be able to predict what's going on. However, if you have a mix of different implementations, the CPU will have no clue which function will be called for each next object, and that might cause performance issues.
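
A sketch of that situation (hypothetical classes; the timing harness is left out, so measure on your own hardware):

```cpp
#include <cstdlib>
#include <memory>
#include <vector>

struct Base { virtual ~Base() = default; virtual int f() const = 0; };
struct D1 : Base { int f() const override { return 1; } };
struct D2 : Base { int f() const override { return 2; } };

int sum(const std::vector<std::unique_ptr<Base>>& v) {
    int s = 0;
    for (const auto& p : v) s += p->f();  // one indirect call per element
    return s;
}

int main() {
    std::vector<std::unique_ptr<Base>> homogeneous, mixed;
    for (int i = 0; i < 1000000; ++i) {
        homogeneous.push_back(std::make_unique<D1>());  // call target never changes
        if (std::rand() & 1)                            // call target varies randomly
            mixed.push_back(std::make_unique<D1>());
        else
            mixed.push_back(std::make_unique<D2>());
    }
    // Same loop, same code: only the predictability of the indirect call target
    // differs between the two vectors.
    return (sum(homogeneous) + sum(mixed)) & 1;
}
```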

And don't forget to always profile before and after you make changes.

  • with "slicing problem" you mean "be sure to assign only objects of type Derived to a vector of Derived" right ? nothing else is implicit ? – user2485710 Aug 07 '13 at 16:50
  • @user2485710: Right, i.e. don't call `push_back()` with an object that itself is derived from `Derived`. But.. there is one thing that just crossed my mind. If you know that all objects are of type `Derived` (in case you have a vector of them, you know for sure), then you do not need to use polymorphism and virtual functions at all... :) –  Aug 07 '13 at 16:53
  • Mmm... I was proposing this scenario as the opposite of using references, but you are probably right, it doesn't make much sense; it turns out that I probably don't even care about this, because only references give this situation any meaning. – user2485710 Aug 07 '13 at 16:55
1

Modern CPUs know how to optimise data-dependent jump instructions in much the same way as they optimise data-dependent "branch" instructions - the processor will "learn" that "last time I went through here, I went THIS way", and if it has enough confidence (it has gone through several times with the same result) it will keep predicting that way.

Of course that doesn't help if the instances are a completely random selection of different classes that each have their own virtual function implementation.

Cache-locality is of course a slightly different matter, and it really depends on whether you are storing the object instances or the pointers/references to instances in the vector.

And of course, an important factor is "what is the alternative?" If you are using virtual functions "correctly", it means there is (at least) one less conditional check in the code path, because the decision was taken at a much earlier stage. If you solve the same problem by some other method, that condition has roughly the same branch probability as the virtual call would have had, so it will be at least as bad for performance as the virtual function - and chances are it's worse, because we now have an `if (x) foo(); else bar();` type scenario, so we first have to evaluate `x` and then choose the path. `obj->vfunc()` is only unpredictable because fetching from the vtable gives an unpredictable target - but at least the vtable itself is cached.
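
To make that comparison concrete, a small sketch (names invented for illustration):

```cpp
void foo() {}
void bar() {}

// The "other method": evaluate a condition, then take a conditional branch.
void dispatch_by_flag(bool x) {
    if (x) foo(); else bar();  // x has to be evaluated first, and the branch is
                               // only as predictable as the sequence of x values
}

struct Base { virtual ~Base() = default; virtual void vfunc() = 0; };

// Virtual dispatch: one data-dependent indirect call through the vtable.
void dispatch_virtual(Base* obj) {
    obj->vfunc();  // the target comes from the (usually cached) vtable and is
                   // predictable only if the dynamic type repeats
}
```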

Mats Petersson
  • BTW, I came to the conclusion that on x86_64 Intel CPUs you are better off doing up to 3-4 branches instead of one data-dependent jump. I did some benchmarks myself, and it also turned out that GCC generates up to 3 comparisons to avoid a jump (i.e. when implementing a sparse switch statement; from 4.8 you can also tweak this number from the command line). –  Aug 07 '13 at 16:48
  • What do you mean by "if you are using virtual functions correctly"? Also, you are basically saying that if the number of possible paths is small, the difference shouldn't be noticeable? – user2485710 Aug 07 '13 at 16:55
  • @VladLazarenko Interesting, I haven't got much experience with x86_64 on Intel, as I'm using AMD processors at home, which is where I do most of my "interesting" work. – Mats Petersson Aug 07 '13 at 17:59
  • @user2485710: The number of choices isn't as important as how often you switch from one kind of object to another. So, if you make some graphics function where the virtual choice is to draw with OpenGL or DirectX, that choice is probably constant throughout the entire runtime of the code - so the virtual function will have very little penalty. On the other hand, if you have something that varies from one call to the next, then the processor will not be able to predict the call. And by "using virtual functions correctly", I mean using them because there are choices to be made in the code... – Mats Petersson Aug 07 '13 at 18:03