0


Inspired from my most favorite stackoverflow question I'm trying to decide what is more 'convenient' when facing up a classical problem in programming, i.e. executing some code instead some other according to any current scenario.

To make it brief, imagine an application can be in 2 different states and has to execute some code according to this state.
(switching to C++ context, which I'm more interested to)
The usual approach would be to store status in a bool and use an if to decide which branch to execute.

A different approach is to wrap the 2 branches of code in 2 different functions and store the status in a function pointer: whenever suitable code shall be executed the application has just to call pointed function, no if involved.

Now, in addition to lazily asking your opinions on which of two strategies is more 'convenient', I approximated 'convenient'=='fastest' so I set up a benchmark whose code you can read here, and executed it also on my machine compiling with MSVC2017.

Each compiler returns different results (in my MSVC function pointer strategy is slower, in online GCC is faster) but what surprises me is function pointer strategy is always slower when switching status while I was expecting times to be very similar to non switching situation: why?

Thank you.

NOTE: increase value std::uint64_t N = 100'000'000; for significant results as online compiler would timeout.

user1832484
  • 371
  • 3
  • 8
  • Branch prediction is better supported than jump prediction. – Jarod42 Apr 18 '20 at 12:53
  • I don't understand, in switching pointer there's no prediction at all. – user1832484 Apr 18 '20 at 12:54
  • The compiler explorer version doesn't have any optimizations enabled. I hope you don't benchmark unoptimized code. – Timo Apr 18 '20 at 12:58
  • 1
    I believe the fftw library (which has "Fastest" in its name) defines a whole lot of individual similar functions so that branches can be done at the outer level, not repeated in CPU-heavy loops, and many expressions can be constant-folded rather than computed, etc. – aschepler Apr 18 '20 at 13:05
  • @Jarod42 isn't jump prediction a subset of branch prediction? I mean one branch has to jump, otherwise you couldn't branch. – Timo Apr 18 '20 at 13:08
  • [Benchmark results](http://quick-bench.com/b5OBsHIEchO6gscD8ZPrNED23go). – Jarod42 Apr 18 '20 at 13:12
  • 2
    From [Branch predictor](https://en.wikipedia.org/wiki/Branch_predictor) > Branch prediction is not the same as branch target prediction. – Jarod42 Apr 18 '20 at 13:17
  • @Jarod42: you're talking about *indirect* branch prediction specifically. Note that even *direct* unconditional branches (like a C `goto`) need prediction because the CPU needs to know what to fetch next, but it doesn't know about branches until the last fetch block makes its way down the pipeline to the decoders. So the fetch stage needs a prediction for the next block to fetch, based on the current fetch block address. See [Slow jmp-instruction](https://stackoverflow.com/q/38811901) where a big enough block of `jmp next_instruction` slows down from BTB exhaustion. – Peter Cordes Apr 18 '20 at 18:10
  • Later pipeline stages need to predict *where* in a block a taken branch might be, so the right instruction bytes can be fed to the decoders if they predict correctly. But anyway yes, using a function pointer instead of an `if` is not going to help, still a control dependency. If you're lucky the compiler will inline the functions through the indirect calls and make the same code as an `if`, otherwise it'll be even worse (because of function-call overhead; indirect branch prediction on modern x86 is probably as good as conditional branch). – Peter Cordes Apr 18 '20 at 18:12
  • @PeterCordes: I honestly only "know" "high level" information and not the details, just enough to give hints about the search. – Jarod42 Apr 18 '20 at 18:17
  • @Jarod42: That's totally fine, it was just a terminology issue in what you said. It's *indirect* branch prediction that's relevant for function pointers (unless the compiler knows what the possible values are and compiles it into a conditional direct branch like an `if`). "Jump" is a synonym for "branch" in this context. But yes, many (older or low-power) CPUs like Atom/Silvermont, or ARM, have weaker indirect branch prediction than conditional branches; many can't predict patterns of indirect branches like alternating between two targets. See also https://danluu.com/branch-prediction/ – Peter Cordes Apr 18 '20 at 18:23
  • @PeterCordes sorry I'm not so into CPU and pipelines so I still can't see where the prediction might be when using a function pointer. I mean, there's a bunch of instructions to be execute, one of those is a 'straight' jump, not a conditional jump: what am I losing? That's why I thought that performance for function pointer strategy should be similar both when jump is 'fixed' and when it 'changes'. – user1832484 Apr 18 '20 at 19:02
  • @Timo yes you are right, linked code does not have any optimization. But I tried on my PC using a release build of the MSVC compiler. Also Jarod42 provided a link for benchmarks (thank you @Jarod42) with O3 optimization and even there changing function pointer is way slower than fixed pointer and I can't understand why. – user1832484 Apr 18 '20 at 19:07
  • With an indirect branch, the new program-counter value comes from a register or memory. It's still a control dependency that needs to be predicted ahead of when that pointer value is actually available. Otherwise the front-end would have to stall until that back-end value was ready. Your simple example that alternates probably gets unrolled and optimized away instead of actually doing an indirect call, because the pattern is compile-time visible. – Peter Cordes Apr 18 '20 at 19:14
  • Oh, your Godbolt link has optimization disabled (the default is `-O0`). Other bottlenekcs make that mostly meaningless. At -O3 the loop optimizes away for the if / noswitch case and you get time=0. The if/switch case inlines the function branchlessly, just doing `b ^= 1` and `sum += b;`. (Right click on the source and do "reveal linked code" https://gcc.godbolt.org/z/Lzw32m) The indirect versions do actually update globals and indirect `call`, making them much slower mostly because of call overhead; The Skylake-X CPUs that AWS runs on should be predicting that pattern pretty well. – Peter Cordes Apr 18 '20 at 19:23
  • re: CPU details: in https://danluu.com/branch-prediction/, everything he says in the intro with diagrams about branches applies to indirect branches. *However, there are instructions called “branches” that let you change the address next instruction comes from.* Read that intro with indirect branches in mind, specifically an indirect branch that jumps to one of two possible addresses. That's basically an equivalent problem, except that it's harder because neither possible address is a fall-through to the next instruction. (Although a CPU may guess next-instruction if it has nothing better.) – Peter Cordes Apr 18 '20 at 19:28

0 Answers0