Logical operators and branch prediction failure

Question

Consider the following loops:

while((expressionA) & (expressionB)){
    // do something
}

while((expressionA) && (expressionB)){
    // do something
}

where expressionA and expressionB are expressions of type bool and expressionB has no side-effects. Under these conditions, the two cases are as-if-equivalent (right?).

A (hypothetical) compiler that naively takes its cue from the source code would put a branch in the && version and we would end up paying for branch prediction failures.

With a modern compiler (such as current GCC), can there ever be any conditions under which the & version gives a substantial performance gain over of the && version?

My guess is no, because:

If expressionB is sufficiently cheap, the compiler will recognize this and avoid creating the short-circuiting branch.
If expressionB is sufficiently expensive, the compiler will create the short-circuit because:
- if probability of expressionA is not close to 1.0, we get a substantial average performance gain from short-circuiting.
- if probability of expressionA is close to 1.0, we won't pay much because branch prediction will tend to succeed.

AFAIK there's no way of telling the optimizer which cases are the most likely, which makes it difficult for cases like `if ( a && b && c && d && e )` where there are no side-effects but computationally expensive. Theoretically the C standard would allow the tests to be done in any order so even if you put the most expensive first, the optimizer might not honour that. — M.M, Dec 03 '15 at 08:38
What I'm finding just playing around in GCC 5.2 on some simplistic tests is that `operator&` can actually cause a slight difference in register allocation, but identical to `operator&&` in terms of branching (at least with operands that evaluate to `bool`). One thing, as M.M. alluded to, is that a compiler can't truly figure out what is "cheap" here, since the probability/predictability of a coin flip expression is often going to be based around inputs only provided at runtime. It goes beyond the basic evaluation. What I'm seeing is short-circuiting being applied for *both* operators here. — , Dec 03 '15 at 08:43
@M.M Could you please explain why you brought up the side-effects of `expressionA`? — Museful, Dec 03 '15 at 10:21
@Ike Thanks. By "cheap expression" I just mean one that can be evaluated quickly, so in that sense the compiler *can* figure out what is cheap. Probability matters of course, but as a separate factor. Take the case where `expressionB` is just a `bool` variable whose value is unknown at compile-time but is already evaluated (at this point in run-time) because its value was already needed elsewhere in the code. That value is available "cheaply" and making the short-circuit in that case (for any of the two operators) would just seem wasteful, especially because branch prediction may fail. — Museful, Dec 03 '15 at 10:28
gcc usually makes a separate branch instruction for each piece of a short-circuit-evaluated conditional. If (A&B) predicts well, but neither A nor B alone predict well, I think you could see a benefit to using `A&B` over `A&&B`. Turning a comparison into a value that can actually be bitwise-ANDed with something else usually isn't cheap, though. It's a lot easier in the `||` vs. `|` case, because usually you can just compute things an `OR` them together. e.g. `2 | 1` is true, but `2 & 1` is false. — Peter Cordes, Dec 03 '15 at 13:03
@tennenrishin Understood what you meant there by cheap/expensive as relating to the cost of evaluation. What I was seeing fooling around in GCC was that, in cases where both operators produce logically equivalent results, the optimizer was still short-circuiting them (yielding two branch instructions instead of an `and` instruction + branch). My guess for why it does that is that it still can't assume anything on the part of the predictability of an expression to be able to easily deduce that an `and` instruction would not degrade branch predictability. — , Dec 03 '15 at 17:40

Logical operators and branch prediction failure

0 Answers0