CPU/compiler branch prediction

Question

In time-critical code I try to avoid introducing branches, but sometimes there is no practical way around it.

In those cases, is there a performance difference between different ways of testing for a given condition? Is it recommended to test for the most likely outcome first, or does it make no difference? In other words, do CPUs or compilers give a different 'weight' to the TRUE and FALSE outcomes of a condition?

E.g. will there typically be a difference between the following three snippets:

// snippet 1: test for exception first
if ( !likely_to_happen ) {
  foo();
} else {
  bar();
}

// snippet 2: test for most common cases first
if ( likely_to_happen ) {
  bar();
} else {
  foo();
}

// snippet 3: test for exception in an alternative way
if ( not_likely_to_happen ) {
  foo();
} else {
  bar();
}

If possible, you should explicitly tell the compiler. See http://stackoverflow.com/questions/30130930/ — Mats, Sep 21 '16 at 08:10
Thanks. I see "__builtin_expect" seems to be the commonly recommended solution. It somehow feels kludgy or at least inelegant. I was hoping for a general code pattern (such as ordering the code-flow or prioritizing true-conditions, etc.) that doesn't involve compiler-specific macros. — Bram Bos, Sep 21 '16 at 08:45
You will always have different ways to order your code to help with branch prediction. However, they will largely be code-specific and hard to generalize into a rule such as "do X to always reduce branching". That said, you can always consider whether sorting the data would help, and so on. You can search for something like *C methods to reduce branching* to see what other generalities can be found. — David C. Rankin, Sep 21 '16 at 09:02
There is no standard way to communicate hot vs. cold code information to the compiler. Compilers don't generally use the layout of the C source to decide how to lay out the asm at all. Instead, they have heuristics for guessing branch likelihoods. Those heuristics usually can't do very well, but you can fix all that with profile-guided optimization: compile with `gcc -fprofile-generate -O3`, run your program on some common inputs, then recompile with `gcc -fprofile-use -O3`. — Peter Cordes, Sep 21 '16 at 09:09
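
For reference, the `__builtin_expect` hint mentioned in the comments is usually hidden behind a pair of macros so that the compiler-specific part lives in one place. The following is only a minimal sketch, assuming GCC or Clang; the macro names `LIKELY`/`UNLIKELY` and the function `sum_valid` are made up for illustration and are not part of any standard.

```c
#include <stddef.h>

/* Hypothetical helper macros (illustrative names, not standard C).
 * On GCC and Clang they forward to __builtin_expect; on any other
 * compiler they collapse to the plain condition, so the code still builds. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (!!(x))
#  define UNLIKELY(x) (!!(x))
#endif

/* Sums the non-negative entries of values[0..n-1].
 * Negative entries are assumed to be rare; they are skipped and counted
 * in *errors. (Illustrative function, not from the original question.) */
long sum_valid(const int *values, size_t n, size_t *errors)
{
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            ++*errors;            /* path hinted as rarely taken */
        } else {
            total += values[i];   /* path hinted as the common case */
        }
    }
    return total;
}
```

The hint does not change what the function computes, only what the compiler assumes about which side of the branch is common, so it is worth measuring before and after rather than assuming it helps.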
