I am trying to find the fastest way to call through a function pointer in order to map a runtime value onto a template argument, for a finite number of arguments. I wrote this benchmark: https://gcc.godbolt.org/z/T1qzTd
I am noticing that pointers to class member functions carry a lot of added overhead that I am having trouble understanding. Here is what I mean.
With a struct bar and a member function foo defined as follows:
template<uint64_t r>
struct bar {
template<uint64_t n>
uint64_t __attribute__((noinline))
foo() {
return r * n;
}
// ... function pointers with pointers to versions of foo below
The first option (#define DO_DIRECT in the godbolt code) calls the templated function by indexing into an array of pointers to class member functions, defined as:
/* all of this inside of struct bar */
typedef uint64_t (bar::*foo_wrapper_direct)();
const foo_wrapper_direct call_foo_direct[NUM_FUNCS] = {
&bar::foo<0>,
// a bunch more function pointers to templated foo...
};
// to call templated foo for non compile time input
uint64_t __attribute__((noinline)) foo_direct(uint64_t v) {
return (this->*call_foo_direct[v])();
}
The assembly for this, however, appears to have a TON of fluff:
bar<9ul>::foo_direct(unsigned long):
salq $4, %rsi
movq 264(%rsi,%rdi), %r8
movq 256(%rsi,%rdi), %rax
addq %rdi, %r8
testb $1, %al
je .L96
movq (%r8), %rdx
movq -1(%rdx,%rax), %rax
.L96:
movq %r8, %rdi
jmp *%rax
I am having trouble understanding what the extra loads, the add, and the testb $1 branch are doing here.
In contrast, the #define DO_INDIRECT method, defined as:
// forward declare bar and call_foo_wrapper
template<uint64_t r>
struct bar;
template<uint64_t r, uint64_t n>
uint64_t call_foo_wrapper(bar<r> * b);
/* inside of struct bar */
typedef uint64_t (*foo_wrapper_indirect)(bar<r> *);
const foo_wrapper_indirect call_foo_indirect[NUM_FUNCS] = {
&call_foo_wrapper<r, 0>,
// a lot more templated versions of foo ...
};
uint64_t __attribute__((noinline)) foo_indirect(uint64_t v) {
return call_foo_indirect[v](this);
}
/* no longer inside struct bar */
template<uint64_t r, uint64_t n>
uint64_t
call_foo_wrapper(bar<r> * b) {
return b->template foo<n>();
}
has some very simple assembly:
bar<9ul>::foo_indirect(unsigned long):
jmp *(%rdi,%rsi,8)
I am trying to understand why the DO_DIRECT method, which uses function pointers directly to the class member function, has so much fluff, and how, if possible, I can change it to remove the fluff.
Note: I have the __attribute__((noinline)) only to make it easier to examine the assembly.
Thank you.
P.S. If there is a better way of converting runtime parameters to template parameters, I would appreciate a link to an example or man page.