Let's say you have a method that calculates a value and returns it:
double calculate(const double& someArg);
You then implement another calculate method with the same signature as the first one, but which works differently:
double calculate2(const double& someArg);
You want to be able to switch between the two based on a boolean setting, so you rename the original to calculate1 and end up with something like this:
double calculate(const double& someArg)
{
    if (useFirstVersion) // <-- this is a boolean
        return calculate1(someArg); // actual first implementation
    else
        return calculate2(someArg); // second implementation
}
The boolean may change at runtime, but only rarely.
I notice a small but measurable performance hit, which I suspect is caused by either branch misprediction or cache-unfriendly code.
How can I optimize this to get the best runtime performance?
My thoughts and attempts so far:
I tried using a pointer to member function to avoid branch mispredictions.
The idea is that whenever the boolean changes, I update the pointer. This way there is no if/else; the pointer is used directly.
The pointer is declared like this:
double (ClassWeAreIn::*pCalculate)(const double& someArg) const;
...and the new calculate method becomes:
double calculate(const double& someArg)
{
    return (this->*pCalculate)(someArg);
}
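For reference, here is a minimal self-contained sketch of the pointer-to-member-function approach described above. It assumes a hypothetical setter `setUseFirstVersion` (not in the original) that updates the flag and the pointer together, and uses `std::asin`/`std::acos` as stand-ins for the real `calculate1`/`calculate2`. The pointer type here is non-const, because a `const`-qualified member-function pointer like the one declared above can only point to `const` member functions, and `calculate1`/`calculate2` in the example class below are not `const`.

```cpp
#include <cmath>

// Sketch only: std::asin/std::acos stand in for the real calculate1/calculate2.
class SomeClass
{
public:
    // Hypothetical setter: flag and pointer are updated together,
    // so calculate() never has to test the flag.
    void setUseFirstVersion(bool useFirst)
    {
        useFirstVersion = useFirst;
        pCalculate = useFirst ? &SomeClass::calculate1 : &SomeClass::calculate2;
    }

    bool usesFirstVersion() const { return useFirstVersion; }

    // Call through the stored pointer to member function (note the return).
    double calculate(const double& someArg)
    {
        return (this->*pCalculate)(someArg);
    }

    double calculate1(const double& someArg) { return std::asin(someArg); }
    double calculate2(const double& someArg) { return std::acos(someArg); }

private:
    bool useFirstVersion = true;
    // Non-const pointer type, matching the non-const calculate1/calculate2.
    double (SomeClass::*pCalculate)(const double&) = &SomeClass::calculate1;
};
```

The call through the pointer is generally still an indirect call, so this trades a well-predicted branch for an indirect jump rather than removing the dispatch cost altogether.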
I tried combining it with __forceinline, and it did make a difference, which surprised me because I expected the compiler to inline it on its own. Without __forceinline, this was the worst-performing variant; with __forceinline, it was much better.
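`__forceinline` is specific to MSVC. If the code also has to build with GCC or Clang, a small macro is a common approach. The sketch below assumes a hypothetical macro name `FORCE_INLINE`, uses GCC/Clang's `always_inline` attribute as the non-MSVC equivalent, and uses free-function stand-ins for `calculate1`/`calculate2`.

```cpp
#include <cmath>

// Hypothetical portability macro: MSVC keyword, GCC/Clang attribute, plain inline otherwise.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Stand-ins for the real implementations (assumption, not the original code).
inline double calculate1(double someArg) { return std::asin(someArg); }
inline double calculate2(double someArg) { return std::acos(someArg); }

// Flag-based dispatcher, requested to be inlined at every call site.
FORCE_INLINE double calculate(bool useFirstVersion, double someArg)
{
    return useFirstVersion ? calculate1(someArg) : calculate2(someArg);
}
```

Forced inlining only applies where the function's definition is visible: a function defined in another `.cpp` file generally cannot be inlined into `main` unless link-time code generation is enabled.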
I also considered making calculate a virtual method with two overrides, but I read that virtual methods are not a good way to optimize code, since the right method still has to be looked up at runtime. I did not try it, though.
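A minimal sketch of the untried virtual-method idea, written as a strategy object that is swapped only when the setting changes. The type names `Calculator`, `FirstCalculator`, `SecondCalculator` and the setter are hypothetical, `std::asin`/`std::acos` stand in for the real implementations, and `std::make_unique` requires C++14.

```cpp
#include <cmath>
#include <memory>

// Common interface for the two implementations.
struct Calculator
{
    virtual ~Calculator() = default;
    virtual double calculate(const double& someArg) const = 0;
};

// Hypothetical stand-ins for calculate1 and calculate2.
struct FirstCalculator final : Calculator
{
    double calculate(const double& someArg) const override { return std::asin(someArg); }
};

struct SecondCalculator final : Calculator
{
    double calculate(const double& someArg) const override { return std::acos(someArg); }
};

class SomeClass
{
public:
    // Hypothetical setter: replaces the strategy object once per flag change.
    void setUseFirstVersion(bool useFirst)
    {
        if (useFirst)
            impl = std::make_unique<FirstCalculator>();
        else
            impl = std::make_unique<SecondCalculator>();
    }

    // One virtual (indirect) call per invocation.
    double calculate(const double& someArg) const { return impl->calculate(someArg); }

private:
    std::unique_ptr<Calculator> impl = std::make_unique<FirstCalculator>();
};
```

Like the member-pointer version, each call goes through an indirect call (here via the vtable), so this mainly offers a cleaner structure rather than a guaranteed speedup.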
However, whatever changes I made, I never managed to restore the original performance (maybe that is not possible?). Is there a design pattern that handles this optimally (ideally one that is also clean and easy to maintain)?
A complete example for Visual Studio:
main.cpp
#include "stdafx.h"
#include "SomeClass.h"
#include <time.h>
#include <stdlib.h>
#include <chrono>
#include <iostream>

int main()
{
    srand(time(NULL));
    auto start = std::chrono::steady_clock::now();
    SomeClass someClass;
    double result;
    for (long long i = 0; i < 1000000000; ++i)
        result = someClass.calculate(0.784542);
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> diff = end - start;
    std::cout << diff.count() << std::endl;
    return 0;
}
SomeClass.cpp
#include "stdafx.h"
#include "SomeClass.h"
#include <math.h>
#include <stdlib.h>

double SomeClass::calculate(const double& someArg)
{
    if (useFirstVersion)
        return calculate1(someArg);
    else
        return calculate2(someArg);
}

double SomeClass::calculate1(const double& someArg)
{
    return asinf((rand() % 10 + someArg) / 10);
}

double SomeClass::calculate2(const double& someArg)
{
    return acosf((rand() % 10 + someArg) / 10);
}
SomeClass.h
#pragma once

class SomeClass
{
public:
    bool useFirstVersion = true;
    double calculate(const double& someArg);
    double calculate1(const double& someArg);
    double calculate2(const double& someArg);
};
(I did not include the pointer-to-member-function version in the example, since it only seemed to make things worse.)
With the example above, calling calculate1 directly from main takes 14.61 s on average, whereas calling calculate takes 15.00 s on average (with __forceinline, which seems to narrow the gap).
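Expressed per call (the loop in main.cpp runs $10^9$ iterations), the measured difference works out as follows:

```latex
\Delta t = 15.00\,\text{s} - 14.61\,\text{s} = 0.39\,\text{s}, \qquad
\frac{\Delta t}{14.61\,\text{s}} \approx 2.7\,\%, \qquad
\frac{0.39\,\text{s}}{10^{9}\ \text{calls}} = 0.39\,\text{ns per call}
```

Assuming a clock rate of a few GHz, that is on the order of one clock cycle per call.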