Cost of Polymorphic calls - C++

Question

I'm writing a game in C++ that has about 30 different roles that are each slightly different. I have a main class User that contains all of the data required by all of the roles. My first implementation involved just having an enumeration of 30 roles and handling appropriately, but now I'm wondering if it would be better to have User as a base class and each role being its own class that inherits from User.

My main concern is how efficient are polymorphic method calls when there are 30+ classes inheriting from a single base class? I know polymorphic calls involve pointers in a virtual table but I'm not sure if that means a linear search through the entire table for the right method or if it can be done in constant time.

If anyone could comment on the efficiency of polymorphic method calls with a many inherited classes I would appreciate the enlightenment.

Thanks in advance!

I suppose, they should be done in constant time. Even if you would have 30 nested levels. That's compile-time type-checking is designed for. — kirilloid, Apr 15 '12 at 23:31
I found another really interesting article about Polymorphism, Virtual Functions and Dynamic Binding "Under the Hood": http://www.gibmonks.com/c_plus/ch13lev1sec7.html Quote: "Optimizing compilers normally generate polymorphic code that runs as efficiently as hand-coded switch-based logic. The overhead of polymorphism is acceptable for most applications. But in some situations real-time applications with stringent performance requirements, for example the overhead of polymorphism may be too high." So you should be fine in your case but it's fun to know how it really works. — ForceMagic, Apr 17 '12 at 08:40

score 8 · Answer 1 · answered Apr 15 '12 at 23:32

8

Rather than using inheritance, use composition. Just give your 'User' class an object that does the role. So you may end up with 30 'Role' classes. But they don't inherit off 'User', but are given to 'User' to use ( they may inherit off their own base abstract class to define the interface of 'Role')

or if it is just one function.... you might just model it as a bunch of function objects and then just pass those to User.

answered Apr 15 '12 at 23:32

Keith Nicholas

43,549
15
93
156

I hadn't thought about that. That may be a better method. Thanks! – Josh Brittain Apr 15 '12 at 23:35
it's one of Object Oriented Designs 'rule of thumb'. Basically try and compose your objects together to gain variations. ie, think about plugging things together to gain variations of functionality, rather than inheriting and extending to get variations of functionality – Keith Nicholas Apr 15 '12 at 23:39
inheritance is more for saying an object can be used to be plugged into a particular "contract / interface" – Keith Nicholas Apr 15 '12 at 23:41

Luchian Grigore · Accepted Answer · 2012-04-15T23:32:06.710

7

The cost is negligeable. It doesn't matter how many classes you have, or how many levels of inheritance, the cost of a polymorphic call is a simple addition - the pointer to the vftable plus the offset of the specific function (not standard mandated, but on most, if not all, implementations this is correct).

edited Apr 15 '12 at 23:32

answered Apr 15 '12 at 23:31

Luchian Grigore

253,575
64
457
625

1

+1. Although this could theoretically impact CPU cache performance, no? – Cameron Apr 15 '12 at 23:33
@Luchian: Never mind, after thinking it over a bit I can't see how it would ;-) – Cameron Apr 15 '12 at 23:36
I remember form my university course that the lecturer said he made an app that was an order of magnitude slower due to polymorphic calls. Does anyone have any hard data? – Lukasz Madon Apr 15 '12 at 23:36
I doubt that too since he wanted me to change for(int i... i++) to ++i to be more efficient :) Just wonder how a polymorphic call would impact performance critical code. – Lukasz Madon Apr 15 '12 at 23:49
You wrote it's negligible. If something is called billions of time a one asm instruction matters. – Lukasz Madon Apr 15 '12 at 23:53
@lukas if something is called billions of times, the overhead doesn't come from the polymorphic call. – Luchian Grigore Apr 15 '12 at 23:54
3

@lukas You can always find your own test case that proves your point. If you want to prove that virtual dispatch is expensive you just need to create the two types, and a single virtual function that does nothing (say return an int stored as a data member in the derived type). Call that in a tight loop and you will see that the virtual dispatch is as expensive as the rest of the function (probably even more). The question is, when you need it, how does it compare with the alternatives? How much does it cost in a real scenario (where the functions perform some work)... – David Rodríguez - dribeas Apr 16 '12 at 00:04
@Cameron Actually this answer is incorrect. You had a good thought about the cache performance. See my answer below. – ForceMagic Apr 17 '12 at 02:57
@LuchianGrigore I thought it would be a good thing, as you told me, to actually run a sample and look at the assembly code. So I edited my post in order to add the sample and the assembly output. Then we can really see what happens. - Regards – ForceMagic Apr 18 '12 at 02:24

score 0 · Answer 3 · edited May 23 '17 at 12:10

Unfortunately, the answer of Luchian Grigore is incorrect.

It would be correct in the case of Diamond inheritance. Where you have a class D that inherits from B and C which themselves inherits from A. If you call a function of D, there will be an addition operation on the vtable to lookup at D reference.

 (e.g)                   in Memory
                          -------        Classes hierarchy
                     ---> |  A  |                A
the offset here      |    -------               / \
is the pointer   ->  |    | ... |              B   C    <- Diamond Inheritance
addition he is       |    -------              \   /        (usually very bad)
talking about        ---> |  D  |                D
                          -------

In your case, for a polymorphic call the compiler will have to do a lookup (read) on the vtable in order to get the proper function you're calling. When it gets worse is when the data where the vtable is pointing to is not cached. In that case you are having a cache miss, which is very expensive.

Moreover, there is one vtable per class and each object of that class share the same vtable. See In C++, what’s a vtable and how does it work?

The real answer to your question depends on how many times you will swap between every of those 30 classes and how often you will use them. Will it be used in a loop?

The solution you are describing is often used to work around this potential problem. But in fact, it might be as fast as the polymorphic call depending how it is used.

So in general you can go for the Polymorphic method without worrying about cost, because it will be negligeable. Write clean and maintainable code first, optimize later. :)

EDIT :

Since there was a discussion about the actual compiler code on this thread, I decided to run a sample to find out what's happening for real.

Below you can see the sample of code I used.

#include <stdlib.h>
#include <stdio.h>

class I
{
public:
    virtual void f(void) = 0;
};

class A : public I
{
public:
    void f(void)
    {
        printf("A\n");
    }
};


int main(int argc, char* argv[])
{
    __asm
    {
        int 3
    }

    A* pA = new A();

    __asm
    {
        nop
        nop
    }

    pA->f();

    __asm
    {
        nop
        nop
    }

    A a;
    a.f();

    __asm
    {
        nop
        nop
    }

    return 0;
}

Then, you can see the actual assembly code of the sample (how the compiler did interpret it).

int main(int argc, char* argv[])
{
    __asm
    {
        int 3
010E1010  int         3    
    }

    A* pA = new A();
010E1011  push        4    
010E1013  call        operator new (10E10A4h) 
010E1018  add         esp,4 
010E101B  test        eax,eax 
010E101D  je          main+17h (10E1027h) 
010E101F  mov         dword ptr [eax],offset A::`vftable' (10E2104h)
010E1025  jmp         main+19h (10E1029h) 
010E1027  xor         eax,eax 

    __asm
    {
        nop
010E1029  nop              
        nop
010E102A  nop              
    }

    pA->f();
010E102B  mov         edx,dword ptr [eax] 
010E102D  mov         ecx,eax 
010E102F  mov         eax,dword ptr [edx] 
010E1031  call        eax  

    __asm
    {
        nop
010E1033  nop              
        nop
010E1034  nop              
    }

    A a;
    a.f(); //Polymorphic call
010E1035  push        offset string "A\n" (10E20FCh) 
010E103A  call        dword ptr [__imp__printf (10E20ACh)]
010E1040  add         esp,4 

    __asm
    {
        nop
010E1043  nop              
        nop
010E1044  nop              
    }

    return 0;
010E1045  xor         eax,eax 
}

class A : public I
{
public:

    void f(void)
    {
        printf("A\n");
010E1000  push        offset string "A\n" (10E20FCh) 
010E1005  call        dword ptr [__imp__printf (10E20ACh)] 
010E100B  pop         ecx  
    }

This is actually wrong. There is no lookup, the compiler knows exactly where the method is. The addition I was talking about was vftable pointer + sequence number of virtual function. Also, if there's a cache miss, my answer still stands, as it means the virtual call is not repeated enough times to make a difference. — Luchian Grigore, Apr 17 '12 at 03:26
@LuchianGrigore We cannot assume that there is no lookup because it depends on how the code is written. If he is not using explicit calls like B::f(), the compiler may not be able to determine that this class does or does not override function from the base class. Thus, a table lookup will occurs. I did not denied that the addition is happening, I'm saying that something important was missing. — ForceMagic, Apr 17 '12 at 03:55
I urge you to look in the binary code. Or actually, read more on how polymorphism is implemented. There's no lookup. — Luchian Grigore, Apr 17 '12 at 06:37
@LuchianGrigore I have no choice to presume that you are talking about Static Polymorphism. Unless OP uses Static Polymorphism with CRTP for instance, Dynamic Polymorphism may occurs and so a lookup will occurs too. If you are still not conviced, you could read about Polymorphic Inline Caching, Curiously Recurring Template Pattern and compiler optimization regarding vtable lookup. - Regards. — ForceMagic, Apr 17 '12 at 07:32
No, I'm talking about dynamic polymorphism. Still no vtable lookup. Regards. — Luchian Grigore, Apr 17 '12 at 07:37
Please provide a link or some documentation if you can, before assuming another answer is wrong. — Luchian Grigore, Apr 17 '12 at 07:40
CRTP: http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern Vtable: http://en.wikipedia.org/wiki/Virtual_method_table Another relevant thread from our community: http://stackoverflow.com/questions/9907004 As you can see lookup arise and there are multiple concepts to workaround that if needed. — ForceMagic, Apr 17 '12 at 07:44
From your links, the lookup occurs where JIT compilation is enabled (not common). Also, to quote the links - "A virtual call requires at least an extra indexed dereference, and sometimes a "fixup" addition" - exactly what I said. — Luchian Grigore, Apr 17 '12 at 07:49
Read further on: "Or the compiler (or optimizer) may be able to detect that there are no subclasses of B1 anywhere in the program that override f1. The call to B1::f1 or B2::f2 will probably not require a vtable lookup because the implementation is specified explicitly (although it does still require the 'this'-pointer fixup)." So IF the compiler can optimize the cost will Only be the pointer-fixup, but if he cannot a lookup will occurs and a cache miss is possible. — ForceMagic, Apr 17 '12 at 07:53
Read further on : "Furthermore, in environments where JIT compilation is [NOT] in use, virtual function calls usually cannot be inlined. [...] To avoid this overhead, compilers usually avoid using vtables whenever the call can be resolved at compile time." etc. — ForceMagic, Apr 17 '12 at 07:55
Lookup is used incorrectly there. The runtime does not look inside the table to find the appropriate function. — Luchian Grigore, Apr 17 '12 at 07:57

Cost of Polymorphic calls - C++

3 Answers3

EDIT :

Linked