0

In the following example, I assume that functions f1-f4 are slow, but short and inlined. It is clear to me on iteration i=j that the taken branch of iteration i=j+1 is dependent on the value of data[j+1] so I can predict it in advance during the computation of iteration i=j.

How can I help the x86 branch predictor to see this? Or maybe it already sees it without any changes from my side? If yes, how does it work?

int foo(int* data, int n) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        switch (data[i]) {
            case 0: x = f1(x); break;
            case 1: x = f2(x); break;
            case 2: x = f3(x); break;
            default: x = f4(x); break;
        }
    }
    return x;
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Curious
  • 507
  • 3
  • 16
  • You don't iterate with j, switch (data[i]) might be more readable. Performance questions without whole code and enough input data (on said platforms it means n should be millions for branch prediction to matter) are relatively pointless. – Öö Tiib Apr 20 '22 at 09:10
  • Out-of-order exec should already be doing its job for [Avoid stalling pipeline by calculating conditional early](https://stackoverflow.com/q/49932119) - the `data[i]` in the next loop iteration is a separate dep chain from the slow stuff. Just letting out-of-order exec see the branch instruction with its inputs ready is all you can do. Separately, x86 doesn't have SW prefetch for code either, only data, but if your functions are small and inlined they'll be hot in I-cache. So [X86 prefetching optimizations: "computed goto" threaded code](//stackoverflow.com/q/46321531) doesn't really apply. – Peter Cordes Apr 20 '22 at 09:33
  • But that Q&A does discuss relevant stuff including branch prediction. My searching also found some Q&As that *don't* really apply: [How can I prefetch infrequently used code?](https://stackoverflow.com/q/16218757) (just about priming I-cache, or really at best L2 cache) / and [Bring code into the L1 instruction cache without executing it](https://stackoverflow.com/q/48571263) for (that's about creating conditions for a microbenchmark, unhelpful for overall performance.) – Peter Cordes Apr 20 '22 at 09:38
  • (This seems like a duplicate to me, but maybe only because I already understand the answers and why this is basically the same question. If people think my earlier comments should be posted as an actual answer to this, I can reopen and do that.) – Peter Cordes Apr 20 '22 at 09:39

0 Answers0