About function pointers: why does the overhead change when the content of the function changes?

Question

Here is the C++ code. I am using VS2013 in release mode.

#include <cstdlib>   // std::system
#include <ctime>
#include <iostream>

void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

int main()
{
    int N = 1000; // from 1000 to 8000

    double Value = 0;
    auto t0 = std::time(0);
    for (int i = 0; i < 1000000; ++i)
    {
        Tempfunction(Value, N);
    }
    auto t1 = std::time(0);
    auto Tempfunction_time = t1-t0;
    std::cout << "Tempfunction_time = " << Tempfunction_time << '\n';

    auto TempfunctionPtr = &Tempfunction;

    Value = 0;
    t0 = std::time(0);
    for (int i = 0; i < 1000000; ++i)
    {
        (*TempfunctionPtr)(Value, N);
    }
    t1 = std::time(0);
    auto TempfunctionPtr_time = t1-t0;
    std::cout << "TempfunctionPtr_time = " << TempfunctionPtr_time << '\n';

    std::system("pause");
}

I change the value of N from 1000 to 8000, and record Tempfunction_time and TempfunctionPtr_time. The results are weird:

N=1000 , Tempfunction_time=1, TempfunctionPtr_time=2;
N=2000 , Tempfunction_time=2, TempfunctionPtr_time=6;
N=4000 , Tempfunction_time=4, TempfunctionPtr_time=11;
N=8000 , Tempfunction_time=8, TempfunctionPtr_time=21;

TempfunctionPtr_time - Tempfunction_time is not constant; instead, TempfunctionPtr_time is roughly 2 to 3 times Tempfunction_time. I expected the difference to be a constant, namely the overhead of calling through a function pointer.

What is wrong?

EDIT:

Assume VS2013 inlines Tempfunction when it is called as Tempfunction(), and does not inline it when it is called as (*TempfunctionPtr)(); that would explain the difference. So, if that is true, why can't the compiler inline the call through (*TempfunctionPtr)?

I see now that you built it in release mode, so I'm sure it's the optimizations. Switch off all possible optimizations (I don't know how to do that in VS 2013) and try again. — Hayri Uğur Koltuk, Feb 06 '14 at 08:54
std::time works in whole seconds; you probably need millisecond precision in the measurements. — Narkha, Feb 06 '14 at 08:54
Actually, you can use QueryPerformanceCounter since you are on Windows. — Hayri Uğur Koltuk, Feb 06 '14 at 08:54
Make 'a' and 'i' volatile and test again; also increase N to 10000, which should mitigate the fptr dereferencing. — Ezra, Feb 06 '14 at 08:59

score 0 · Answer 1 · answered Feb 06 '14 at 10:08

I compiled the existing code with g++ on my Linux machine and found that the time was too short to be measured accurately in seconds, so I rewrote it to use std::chrono to measure the time more precisely. I also had to "use" the variable Value (hence the "499500" printed below); otherwise the compiler would completely optimise away the first loop. I then get the following result:

Tempfunction_time = 1.47983
499500
TempfunctionPtr_time = 1.69183
499500

Now, the results I have are for GCC (version 4.6.3; other versions are available and may give other results), which is not the same compiler as Microsoft's, so your results may differ: different compilers optimise code quite differently at times. I'm actually quite surprised that the compiler doesn't figure out that the result of Tempfunction only needs calculating once. But hey, that made it easier to write the benchmark without trickery.
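The "use the variable" point can be sketched as follows. This is a minimal illustration, not the code I actually ran: g_sink and RunBenchmarkLoop are hypothetical names, and storing into a volatile object is just one common way to make the result observable (printing it, as above, is another). It does not stop a compiler from computing the sum only once; it only prevents the whole loop from being discarded as dead code.

```cpp
// Same summation as Tempfunction in the question.
void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

// Hypothetical sink, not in the original code. A store to a volatile
// object is an observable side effect, so the value written to it must
// actually be produced by the program.
volatile double g_sink = 0;

// Hypothetical helper: runs the benchmark loop and then "uses" Value.
void RunBenchmarkLoop(int repeats, int N)
{
    double Value = 0;
    for (int i = 0; i < repeats; ++i)
    {
        Tempfunction(Value, N);
    }
    g_sink = Value;  // without some use of Value, the loop above is dead code
}
```

Calling RunBenchmarkLoop(1000000, N) between the two clock readings would then time a loop whose result is really needed.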

My second observation is that, with my compiler, if I replace int N=1000; with a loop for(int N=1000; N <= 8000; N *= 2) around the main code, there is little or no difference between the two cases. I'm not entirely sure why, because the code looks identical (there is no call via a function pointer, because the compiler knows that the function pointer is a constant), and Tempfunction gets inlined in both cases. (The same "equality" happens when N has values other than 1000, so I'm far from sure what is going on here.)

To actually measure the difference between a call through a function pointer and a direct function call, you would need to move Tempfunction into a separate file, and "hide" the actual value stored in TempfunctionPtr so that the compiler can't figure out exactly what you are doing.

In the end, I ended up with something like this:

typedef void (*FunPtr)(double &a, int N);

void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

FunPtr GetFunPtr()
{
    return &Tempfunction;
}

And the "main" code like this:

#include <iostream>
#include <chrono>

typedef void (*FunPtr)(double &a, int N);

extern void Tempfunction(double& a, int N);
extern FunPtr GetFunPtr();

int main()
{
    for(int N = 1000; N <= 8000; N *= 2)
    {
        std::cout << "N=" << N << std::endl;
        double Value = 0;
        auto t0 = std::chrono::system_clock::now();
        for (int i = 0; i < 1000000; ++i)
        {
            Tempfunction(Value, N);
        }
        auto t1 = std::chrono::system_clock::now();
        std::chrono::duration<double> Tempfunction_time = t1-t0;
        std::cout << "Tempfunction_time = " << Tempfunction_time.count() << '\n';
        std::cout << Value << std::endl;

        auto TempfunctionPtr = GetFunPtr();

        Value = 0;
        t0 = std::chrono::system_clock::now();
        for (int i = 0; i < 1000000; ++i)
        {
            (*TempfunctionPtr)(Value, N);
        }
        t1 = std::chrono::system_clock::now();
        std::chrono::duration<double> TempfunctionPtr_time = t1-t0;
        std::cout << "TempfunctionPtr_time = " << TempfunctionPtr_time.count() << '\n';
        std::cout << Value << std::endl;
    }
}

However, the difference is thousandths of a second, and neither variant is a clear winner. The only conclusion is the obvious one: "calling a function is slower than inlining it".

N=1000
Tempfunction_time = 1.78323
499500
TempfunctionPtr_time = 1.77822
499500
N=2000
Tempfunction_time = 3.54664
1.999e+06
TempfunctionPtr_time = 3.54687
1.999e+06
N=4000
Tempfunction_time = 7.0854
7.998e+06
TempfunctionPtr_time = 7.08706
7.998e+06
N=8000
Tempfunction_time = 14.1597
3.1996e+07
TempfunctionPtr_time = 14.1577
3.1996e+07

Of course, if we do "only half the hiding trick", so that the function is known and inlineable in the first case, and not known and called through a function pointer in the second, we can perhaps expect a difference. But calling a function through a pointer is in itself not expensive. The real difference comes when the compiler decides to inline the function.
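For experimenting with the "half hidden" case in a single file, one possible sketch is to keep the pointer in a volatile object. This is an assumption on my part and not what I measured above: g_hiddenPtr, CallDirect and CallViaPointer are hypothetical names. Because every read of a volatile object has to be performed, the compiler cannot assume the pointer still holds &Tempfunction, so the second loop has to make a real indirect call, while the direct call remains a candidate for inlining.

```cpp
typedef void (*FunPtr)(double& a, int N);

void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

// Hypothetical: a volatile pointer object. The pointer value must be
// re-read from memory each time it is used, so the call target is not
// a compile-time constant as far as the optimiser is concerned.
FunPtr volatile g_hiddenPtr = &Tempfunction;

// Direct call: the compiler sees the callee and is free to inline it.
double CallDirect(int repeats, int N)
{
    double Value = 0;
    for (int i = 0; i < repeats; ++i)
    {
        Tempfunction(Value, N);
    }
    return Value;
}

// Indirect call through the hidden pointer.
double CallViaPointer(int repeats, int N)
{
    double Value = 0;
    for (int i = 0; i < repeats; ++i)
    {
        FunPtr f = g_hiddenPtr;  // volatile read on every iteration
        f(Value, N);
    }
    return Value;
}
```

Both functions return the same sum, so any timing difference between them comes from how the call is made, not from the work done. Note that the volatile read adds a small cost of its own to each iteration.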

Obviously, these are the results of GCC 4.6.3, which is not the same compiler as MSVS2013. You should make the "chrono" modifications that are in the above code, and see what difference it makes.
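If you want to reuse the timing part, a small helper along these lines may be convenient. It is only a sketch: SecondsTaken is a hypothetical name, and it uses std::chrono::steady_clock rather than the system_clock in my code above, on the assumption that a clock which never jumps backwards is preferable for measuring intervals. Check the clock's actual resolution on your toolchain.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall
// time in seconds as a double.
template <typename Work>
double SecondsTaken(Work work)
{
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Each of the two loops in main could then be wrapped in a lambda, for example SecondsTaken([&] { for (int i = 0; i < 1000000; ++i) Tempfunction(Value, N); }), which gives fractional seconds instead of the whole seconds returned by std::time.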

Thank you. I get similar results on Windows. I think VS2013 inlines Tempfunction in the code I posted. And why can inlining give a non-constant performance boost? I.e., instead of Tempfunction_time - TempfunctionPtr_time = constant, I get Tempfunction_time = 2 ~ 3 * TempfunctionPtr_time? — liangbright, Feb 06 '14 at 17:32
Another question: is it true that compilers cannot inline a function if it is called through a function pointer (e.g., a virtual function)? — liangbright, Feb 06 '14 at 18:08
@liangbright The compiler can still inline if it can prove that the method is not overridden. Compilers are getting smarter nowadays. They can even inline across files now. — Raymond Chen, Feb 06 '14 at 18:17
Maybe VS does not inline in such a case; just read this: http://stackoverflow.com/questions/16462800/visual-c-not-inlining-simple-const-function-pointer-calls — liangbright, Feb 07 '14 at 01:56
The compiler has no obligation to "not inline" functions as long as the code works correctly with the inlined code. Of course, different compilers have different degrees of "cleverness" here. — Mats Petersson, Feb 07 '14 at 06:48
