Could arrays of function pointers be used to remove branches

Question

Would arrays of function pointers be able to increase performance by removing branches?

Is

(function[number])();

faster than

if(number == 0)
    function1();
else if (number == 1)
    function2();

for performance? (assuming that this was extremely performance critical code)

EDIT: number is 0 or 1 (else used because it should only be 0 or 1)

Possible duplicate of [Lookup table vs switch in C embedded software](https://stackoverflow.com/questions/35838849/lookup-table-vs-switch-in-c-embedded-software/35846099) — meowgoesthedog, Sep 07 '18 at 10:40
You have to worry about both if it really is "performance-critical" code. — meowgoesthedog, Sep 07 '18 at 10:42
The compiler optimisation would remove the thousands of times it was tested. — me', Sep 07 '18 at 10:42
Yes but the question was about branches. Let's say the functions were too large to inline. — me', Sep 07 '18 at 10:43
There is a half chance of number being 0 and half of being non zero(probably 1). — me', Sep 07 '18 at 10:44
If I had to guess, the normal way (not using an array) should be at least as fast as the other one. To have definite answer, you'll have to benchmark it on your own. — paler123, Sep 07 '18 at 10:45
Hum. So branch prediction will not help. My money is on the `if` being quicker. But to be sure, benchmark, and look at the assembler your compiler produces. — Bathsheba, Sep 07 '18 at 10:46
Actually it depends on your compiler optimizations, your hardware (processor cache, branch prediction), number of branches and maybe something else. It is difficult to say what variant is faster without testing. — Sandro, Sep 07 '18 at 10:47
"The compiler optimisation would remove the thousands of times it was tested." if you believe one cannot measure the difference, then why do you care whether there is a difference? :P The only way to know for sure really is to implement both and measure — 463035818_is_not_an_ai, Sep 07 '18 at 11:18

MSalters · Accepted Answer · 2018-09-07T11:19:01.847

8

Unlikely to help. Function pointers cause the same sort of stalls* as conditions. Nowadays, both have predictors, but the conditional predictors have better success rates. Not a surprise, as they only need to predict one bit.

* Stall: a delay in the processing of instructions, caused by the pipeline nature of the instruction decoder in modern CPU's. Instruction decoding starts several cycles before the actual execution, but is somewhat speculative in nature as to which instruction will be executing in the future. When this speculation is wrong, a number of decoded instructions have to be discarded, and instruction execution is delayed until the decoder catches up.

edited Sep 07 '18 at 11:19

answered Sep 07 '18 at 10:53

MSalters

173,980
10
155
350

Replacing a *single* conditional branch with one computed jump is unlikely to help. Replacing *multiple* conditional branches with *one* computed jump is likely to help if the average number of mis-predicted branches is sufficiently high (on many platforms, it's unlikely to help much if the average number of mis-predictions is less than 1, and unlikely to hurt much if it's greater than 1). – supercat Sep 07 '18 at 19:02

score 1 · Answer 2 · 2018-09-07T11:49:20.790

As MSalters already stated very toroughly, it is unlikely to help. Furthermore it is almost imposible to predict if it helps. I have created two examples for you where you can see that -- one using a pointer array and a second one if-else-clause.

Example 1

#include <array>
#include <functional>



int test1(int* i)
{
    return *i * *i  * *i;
}

int test2(int* i)
{
    return *i * *i* *i* *i* *i* *i;
}

std::array<int (*)(int*), 2> funcs{test1, test2};

int testing(int v)
{
    int testme = v + 4 ;
    return funcs[v](&testme);
}

with its assembly

test1(int*):                             # @test1(int*)
    movl    (%rdi), %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    imull   %ecx, %eax
    retq
test2(int*):                             # @test2(int*)
    movl    (%rdi), %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    imull   %ecx, %eax
    imull   %eax, %eax
    retq
testing(int):                            # @testing(int)
    pushq   %rax
    leal    4(%rdi), %eax
    movl    %eax, 4(%rsp)
    movslq  %edi, %rax
    leaq    4(%rsp), %rdi
    callq   *funcs(,%rax,8)
    popq    %rcx
    retq
funcs:
    .quad   test1(int*)
    .quad   test2(int*)

Here is Example2

int test1(int* i)
{
    return *i * *i  * *i;
}

int test2(int* i)
{
    return *i * *i* *i* *i* *i* *i;
}


int testing(int v)
{
    int testme = v + 4 ;
    return (v == 0) ? test1(&testme) : test2(&testme);
}

with its corresponding assembly

test1(int*):                             # @test1(int*)
    movl    (%rdi), %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    imull   %ecx, %eax
    retq
test2(int*):                             # @test2(int*)
    movl    (%rdi), %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    imull   %ecx, %eax
    imull   %eax, %eax
    retq
testing(int):                            # @testing(int)
    leal    4(%rdi), %eax
    movl    %eax, %ecx
    imull   %eax, %ecx
    imull   %eax, %ecx
    testl   %edi, %edi
    movl    $1, %eax
    cmovnel %ecx, %eax
    imull   %ecx, %eax
    retq

Instruction-count wise they are very similar (maybe an expert can help us out here and explain it further). The if-else-statement breaks doun to this one instruction cmovnel. I doubt it is going to make much difference.

I think you will always have to measure in your exact setup. Compiler optimization is always difficult to predict in example cases.

There will NOT be a true or false answer possible here.

@me' Doh. To stupid to copy-past. This also changes the rest of the answer! Thanks for pointing this f***up out. — , Sep 07 '18 at 11:50
In second case, it completely inlines test1/test2 (and `test2` is `test1 * test1`)... — Jarod42, Sep 07 '18 at 16:49
Yeah, these damn compilers are always outsmarting me when making up examples. This is the main case why **always benchmark your real code** is a good advice ;) — , Sep 07 '18 at 18:58

score 0 · Answer 3 · answered Sep 07 '18 at 10:51

Assuming your number is of type unsigned int your array would have to contain UINT_MAX entries with all being the same except function[0]. If that is not the case your first code (function[number])(); omits a condition to not be undefined behaviour: (function[number != 0])();

To sum it up: #1 is (correctly written) a condition and two indirections. The other option of a giant array will destroy cache locality. #2 is one condition and a direct function call.

Could arrays of function pointers be used to remove branches

3 Answers3