I have a class with an enum member variable. One of the member functions bases its behavior on this enum, so as a "possible" optimization I split the two behaviors into two separate functions and gave the class a function pointer member that is set at construction. I simulated this situation like this:

    #include <cstdint>

    // The two behaviors; defined elsewhere.
    uint64_t funcA();
    uint64_t funcB();

    enum catMode {MODE_A, MODE_B};

    // Variant 1: branch on the mode at every call.
    struct cat
    {
        cat(catMode mode) : stamp_(0), mode_(mode) {}

        void
        update()
        {
            stamp_ = (mode_ == MODE_A) ? funcA() : funcB();
        }

        uint64_t stamp_;
        catMode mode_;
    };

    // Variant 2: pick the function once, at construction.
    struct cat2
    {
        cat2(catMode mode) : stamp_(0), mode_(mode)
        {
            if (mode_ == MODE_A)
                func_ = funcA;
            else
                func_ = funcB;
        }

        void
        update()
        {
            stamp_ = func_();
        }

        uint64_t stamp_;
        catMode mode_;
        uint64_t (*func_)(void);
    };

Then I create a cat object and an array of length 32. I traverse the array to bring it into cache, then I call the cat's update method 32 times and store the latency of each call, measured with rdtsc, in the array.
Then I call a function which loops several hundred times using rand(), usleep(), and some arbitrary strcmp() calls. When it comes back, I do the 32-call measurement again.
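A minimal sketch of the measurement procedure described above, not the exact code that produced the numbers. It assumes GCC or Clang on a POSIX system, reads the cycle counter with `__rdtsc()` on x86, and falls back to `std::chrono::steady_clock` elsewhere. The names `read_cycles`, `measure`, `pollute`, and `demo_cat` are hypothetical; `demo_cat` is only a stand-in so the sketch compiles on its own, and `cat` or `cat2` would take its place.

```cpp
#include <array>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unistd.h>      // usleep (POSIX)
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   // __rdtsc (GCC/Clang)
#endif

// Read a timestamp: the TSC on x86, otherwise a steady-clock tick count.
static inline uint64_t read_cycles()
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical stand-in for cat / cat2: anything with an update() method.
struct demo_cat
{
    uint64_t stamp_ = 0;
    void update() { stamp_ += 1; }
};

// Touch the array to bring it into cache, then time 32 update() calls.
template <typename Cat>
void measure(Cat& c, std::array<uint64_t, 32>& latency)
{
    for (uint64_t& slot : latency)
        slot = 0;
    for (uint64_t& slot : latency) {
        const uint64_t start = read_cycles();
        c.update();
        slot = read_cycles() - start;
    }
}

// Unrelated work between the two measurement passes:
// a few hundred iterations of rand(), usleep(), and strcmp().
inline void pollute()
{
    const char* words[] = {"alpha", "beta", "gamma", "delta"};
    volatile int sink = 0;   // keeps the strcmp results from being discarded
    for (int i = 0; i < 300; ++i) {
        const char* a = words[std::rand() % 4];
        const char* b = words[std::rand() % 4];
        sink = sink + std::strcmp(a, b);
        usleep(1);
    }
}
```

A run would call `measure(c, latency)`, then `pollute()`, then `measure(c, latency)` again, once for each of the two variants. Each timing also includes the overhead of the two timestamp reads, so small differences between the variants should be read with that in mind.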
The result is that the version with the branch is consistently around 44 +/- 10 cycles, whereas the one with the function pointer tends to be around 130. Why would this be the case?
If anything, I would have expected similar performance. Also, templating is hardly an option, because fully specializing the real cat class for that one function would be overkill.