I compiled the existing code with g++ on my Linux machine and found that the run time was too short to be measured accurately in whole seconds, so I rewrote it to use std::chrono to measure the time more precisely. I also had to "use" the variable Value (hence the "499500" being printed below), otherwise the compiler would completely optimise away the first loop. I then get the following result:
Tempfunction_time = 1.47983
499500
TempfunctionPtr_time = 1.69183
499500
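As a rough sketch of the timing change described above: the helper below (time_direct is a hypothetical name, not something from the question) wraps the call loop in std::chrono timestamps and hands the computed value back to the caller so that it is actually used. It assumes std::chrono::steady_clock, which is monotonic and so usually a safer choice for interval timing; std::chrono::system_clock would work the same way.

```cpp
#include <chrono>

// Same shape as the function under test in the question.
void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

// Hypothetical helper: calls Tempfunction `reps` times and returns the
// elapsed time in seconds. The last result is left in `out`, so the caller
// can print it, which stops the compiler from discarding the whole loop.
double time_direct(int reps, int N, double& out)
{
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
    {
        Tempfunction(out, N);
    }
    auto t1 = std::chrono::steady_clock::now();

    // duration<double> converts the tick count to fractional seconds.
    std::chrono::duration<double> elapsed = t1 - t0;
    return elapsed.count();
}
```

Calling time_direct(1000000, 1000, Value) from main and then printing both the returned time and Value gives the same kind of output as shown above.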
Now, these results are for GCC (version 4.6.3; other versions are available and may give other results!), which is not the same compiler as Microsoft's, so the results may differ: different compilers optimise code quite differently at times. I'm actually quite surprised that the compiler doesn't figure out that the result of Tempfunction only needs calculating once. But hey, that made it easier to write the benchmark without trickery.
My second observation is that, with my compiler, if I replace int N=1000; with a loop for(int N=1000; N <= 8000; N *= 2) around the main code, there is no or very little difference between the two cases. I'm not entirely sure why, because the code looks identical (there is no call via a function pointer, because the compiler knows that the function pointer is a constant), and Tempfunction gets inlined in both cases. (The same "equality" happens when N has values other than 1000, so I'm far from sure what is going on here.)
To actually measure the difference between a call through a function pointer and a direct function call, you would need to move Tempfunction into a separate file, and "hide" the actual value stored in TempfunctionPtr so that the compiler can't figure out exactly what you are doing.
In the end, I ended up with something like this:
typedef void (*FunPtr)(double &a, int N);

void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

FunPtr GetFunPtr()
{
    return &Tempfunction;
}
And the "main" code like this:
#include <iostream>
#include <chrono>

typedef void (*FunPtr)(double &a, int N);

extern void Tempfunction(double& a, int N);
extern FunPtr GetFunPtr();

int main()
{
    for(int N = 1000; N <= 8000; N *= 2)
    {
        std::cout << "N=" << N << std::endl;
        double Value = 0;

        auto t0 = std::chrono::system_clock::now();
        for (int i = 0; i < 1000000; ++i)
        {
            Tempfunction(Value, N);
        }
        auto t1 = std::chrono::system_clock::now();
        std::chrono::duration<double> Tempfunction_time = t1-t0;
        std::cout << "Tempfunction_time = " << Tempfunction_time.count() << '\n';
        std::cout << Value << std::endl;

        auto TempfunctionPtr = GetFunPtr();
        Value = 0;

        t0 = std::chrono::system_clock::now();
        for (int i = 0; i < 1000000; ++i)
        {
            (*TempfunctionPtr)(Value, N);
        }
        t1 = std::chrono::system_clock::now();
        std::chrono::duration<double> TempfunctionPtr_time = t1-t0;
        std::cout << "TempfunctionPtr_time = " << TempfunctionPtr_time.count() << '\n';
        std::cout << Value << std::endl;
    }
}
However, the difference is only thousandths of a second, and neither variant is a clear winner. The only conclusion is the obvious one, that "calling a function is slower than inlining it".
N=1000
Tempfunction_time = 1.78323
499500
TempfunctionPtr_time = 1.77822
499500
N=2000
Tempfunction_time = 3.54664
1.999e+06
TempfunctionPtr_time = 3.54687
1.999e+06
N=4000
Tempfunction_time = 7.0854
7.998e+06
TempfunctionPtr_time = 7.08706
7.998e+06
N=8000
Tempfunction_time = 14.1597
3.1996e+07
TempfunctionPtr_time = 14.1577
3.1996e+07
Of course, if we do "only half the hiding trick", so that the function is known and inlineable in the first case, and not known and called through a function pointer in the second, we can perhaps expect a difference. But calling a function through a pointer is in itself not expensive. The real difference comes when the compiler decides to inline the function.
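A single-file sketch of that "half hidden" setup might look like the following. It is not the code that produced the numbers above: GetHiddenPtr, run_direct, run_via_pointer and seconds_for are hypothetical names, and the sketch assumes that reading the pointer from a volatile object is enough to stop the compiler from proving which function it points to. Whether the direct call really gets inlined still depends on the compiler and optimisation flags.

```cpp
#include <chrono>

typedef void (*FunPtr)(double& a, int N);

// Defined in this translation unit, so direct calls may be inlined.
static void Tempfunction(double& a, int N)
{
    a = 0;
    for (double i = 0; i < N; ++i)
    {
        a += i;
    }
}

// Assumption: the volatile read forces the pointer to be loaded at run time,
// so the compiler cannot treat it as a known constant.
static FunPtr GetHiddenPtr()
{
    static FunPtr volatile hidden = &Tempfunction;
    return hidden;
}

// Case 1: direct call, target known to the compiler.
double run_direct(int reps, int N)
{
    double value = 0;
    for (int i = 0; i < reps; ++i)
    {
        Tempfunction(value, N);
    }
    return value;
}

// Case 2: call through a pointer whose target is hidden.
double run_via_pointer(int reps, int N)
{
    FunPtr f = GetHiddenPtr();
    double value = 0;
    for (int i = 0; i < reps; ++i)
    {
        f(value, N);
    }
    return value;
}

// Times one of the two cases; returns seconds and stores the computed value
// in `result` so that it is used.
double seconds_for(double (*run)(int, int), int reps, int N, double& result)
{
    auto t0 = std::chrono::steady_clock::now();
    result = run(reps, N);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling seconds_for(run_direct, 1000000, N, Value) and seconds_for(run_via_pointer, 1000000, N, Value) from main, and printing Value each time, gives the two timings to compare.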
Obviously, these are the results of GCC 4.6.3, which is not the same compiler as MSVS2013. You should make the "chrono" modifications that are in the above code, and see what difference it makes.