I wrote the following program to test how much virtual functions cost on my machine:

#include <iostream>
#include <ctime>
#define NUM_ITER 10000000000
//   5 seconds = 1000000000

static volatile int global_a;

void spin()
{
    int a = global_a;
    int b = a*a;
    int c = a+5;
    int d = a^b^c;
    global_a = b*d;
}

struct A {
    virtual void a() = 0;
    virtual ~A() {}  // needed: main() deletes B and C objects through an A*
};

struct B : A {
    virtual void a() { spin(); }
};

struct C : A {
    virtual void a() { spin(); }
};

void run_A1(A* a)
{
    a->a();
}

void run_A(A* a)
{
    for (long long i = 0; i < NUM_ITER; i++) {
        run_A1(a);
    }
}

void run()
{
    for (long long i = 0; i < NUM_ITER; i++) {
        spin();
    }
}

int main()
{
    global_a = 2;

    A* a1 = new B;
    A* a2 = new C;

    std::clock_t c_begin, c_end;

    c_begin = std::clock();
    run_A(a1);
    c_end = std::clock();

    std::cout << "Virtual | CPU time used: "
              << 1000.0 * (c_end-c_begin) / CLOCKS_PER_SEC
              << " ms\n";

    c_begin = std::clock();
    run_A(a2);
    c_end = std::clock();

    std::cout << "Virtual | CPU time used: "
              << 1000.0 * (c_end-c_begin) / CLOCKS_PER_SEC
              << " ms\n";

    c_begin = std::clock();
    run();
    c_end = std::clock();

    std::cout << "Normal  | CPU time used: "
              << 1000.0 * (c_end-c_begin) / CLOCKS_PER_SEC
              << " ms\n";

    delete a1;
    delete a2;
}

The results were the opposite of what I expected: the virtual function calls were consistently faster. For example, this is one of the outputs I got with NUM_ITER = 10000000000:

Virtual | CPU time used: 49600 ms
Virtual | CPU time used: 50270 ms
Normal  | CPU time used: 52890 ms

From the analysis of the resulting assembler file I can confirm that the compiler hasn't optimized out anything important. I've used GCC-4.7 with the following options:

g++ -O3 -std=c++11 -save-temps -masm=intel -g0 -fno-exceptions -fno-inline test.cc -o test

Why are the virtual function calls faster? Or why are the non-virtual function calls slower? Have branch predictors become that good? Or maybe it's just my machine. Could someone else run the test and post their timings?

  • I can't reproduce this on ideone. – Pubby May 27 '12 at 15:45
  • MSVS shows a clear advantage for the non-virtual call also. – Luchian Grigore May 27 '12 at 15:47
  • @Pubby: Ideone is not the best option to test this as they run many programs on their servers at a time. –  May 27 '12 at 15:47
  • `clock` is a measure of CPU ticks, not a measure of time. – Richard J. Ross III May 27 '12 at 15:47
  • @RichardJ.RossIII But ticks/`CLOCKS_PER_SEC` is the CPU time used. –  May 27 '12 at 15:48
  • @jons34yp you are mistaken. Take a call to `sleep(1)` plus a few `clock()` calls and you will see that many things can make timings judged by `clock()` invalid. – Richard J. Ross III May 27 '12 at 15:49
  • I can't reproduce this on OSX. (though without the C++11 flag.) When I try, non-virtual calls are clearly faster. – Gort the Robot May 27 '12 at 15:52
  • @RichardJ.RossIII But the program _isn't_ running during `sleep(1)` - the CPU time is given back to the OS. `clock()` works as expected here. Anyway, my example does not use `sleep()`. –  May 27 '12 at 15:53
  • I think the generated assembly would reveal more, if there really is a *speed issue* and not a *timing issue*. – Necrolis May 27 '12 at 15:58
  • The culprit seems to be `-fno-inline` option, because if I remove this, then the non-virtual function runs faster (as expected). Only in the presence of `-fno-inline`, non-virtual function runs slower. I tested this on `gcc (GCC) 4.6.1` (MinGW). – Nawaz May 27 '12 at 16:01
  • Well, that doesn't explain everything because that just means that `spin()` can be inlined while the virtual methods can't. But the non-virtual version should be faster even without inlining. Try running the normal version first instead of last. – Gort the Robot May 27 '12 at 16:03
  • @Nawaz: If you remove `-fno-inline`, then the compiler can inline `run_A1` and `run_A` into `main()` and potentially devirtualize all calls to virtual functions. The test doesn't measure anything then. –  May 27 '12 at 17:51

2 Answers

Try resetting global_a at the start of run() and run_A():

void run()
{
    global_a = 2;

    ...
}

void run_A(A *a)
{    
    global_a = 2;

    ...
}

Not sure if this is having any impact, but not all mathematical operations take the same amount of time!

Richard J. Ross III
  • That seems to be the correct reason. It has to do with the value of `global_a`, which is different for each case. I reordered the calls, putting the non-virtual calls before the virtual calls, and the result was surprising: the non-virtual calls became faster. That means it has less to do with virtual vs non-virtual and more to do with the value of `global_a`: whichever function gets to use `global_a=2` first seems to run faster as well. – Nawaz May 27 '12 at 16:12
  • I couldn't affect the performance by reordering. Just to be sure, I added `global_a = 2;` and it doesn't change anything. Nawaz, maybe you also changed something else? –  May 27 '12 at 17:44

The compiler might be smart enough to see that the virtual functions call a global function spin() and devirtualize them. The calls probably get inlined too.

Check this.

emsr
  • You either expressed yourself badly, or don't understand how devirtualization works. Devirtualization happens when the compiler knows the actual type of the object and thus which virtual function would be called. In this case, the virtual functions are called by `run_A1()`, which can't know the actual type of its parameter unless it's inlined. Inlining is disabled by `-fno-inline`. –  May 27 '12 at 17:49
  • @jons - Tricking the compiler is hard. It can theoretically *still* see that the function is only called in one place and always with the same pointer. If we can see it, the compiler can! – Bo Persson May 27 '12 at 18:07