53

I am reading this book by Fedor Pikus and he has some very very interesting examples which for me were a surprise.
Particularly this benchmark caught me, where the only difference is that in one of them we use || in if and in another we use |.

void BM_misspredict(benchmark::State& state)
{

    std::srand(1);
    const unsigned int N = 10000;;
    std::vector<unsigned long> v1(N), v2(N);
    std::vector<int> c1(N), c2(N);

    for (int i = 0; i < N; ++i) 
    {
        v1[i] = rand();
        v2[i] = rand();
        c1[i] = rand() & 0x1;
        c2[i] = !c1[i];
    }

    unsigned long* p1 = v1.data();
    unsigned long* p2 = v2.data();
    int* b1 = c1.data();
    int* b2 = c2.data();

    for (auto _ : state)
    {
        unsigned long a1 = 0, a2 = 0;
        for (size_t i = 0; i < N; ++i) 
        {
            if (b1[i] || b2[i])  // Only difference
            {
                a1 += p1[i];
            }
            else 
            {
                a2 *= p2[i];
            }
        }
        benchmark::DoNotOptimize(a1);
        benchmark::DoNotOptimize(a2);
        benchmark::ClobberMemory();

    }
    state.SetItemsProcessed(state.iterations());
}

void BM_predict(benchmark::State& state)
{

    std::srand(1);
    const unsigned int N = 10000;;
    std::vector<unsigned long> v1(N), v2(N);
    std::vector<int> c1(N), c2(N);

    for (int i = 0; i < N; ++i)
    {
        v1[i] = rand();
        v2[i] = rand();
        c1[i] = rand() & 0x1;
        c2[i] = !c1[i];
    }

    unsigned long* p1 = v1.data();
    unsigned long* p2 = v2.data();
    int* b1 = c1.data();
    int* b2 = c2.data();

    for (auto _ : state)
    {
        unsigned long a1 = 0, a2 = 0;
        for (size_t i = 0; i < N; ++i)
        {
            if (b1[i] | b2[i]) // Only difference
            {
                a1 += p1[i];
            }
            else
            {
                a2 *= p2[i];
            }
        }
        benchmark::DoNotOptimize(a1);
        benchmark::DoNotOptimize(a2);
        benchmark::ClobberMemory();

    }
    state.SetItemsProcessed(state.iterations());
}

I will not go in all the details explained in the book why the latter is faster, but the idea is that hardware branch predictor is given 2 chances to misspredict in slower version and in the | (bitwise or) version. See the benchmark results below.

enter image description here

So the question is why don't we always use | instead of || in branches?

Eduard Rostomyan
  • 7,050
  • 2
  • 37
  • 76
  • 11
    *So the question is why don't we always use | instead of || in branches?* -- Readability and maintenance. Stick an `|` instead of an `||` in a place where `||` is expected, and it simply looks like a bug. You now have to spend time convincing your colleagues (and even yourself), that the "trick" of using `|` works. – PaulMcKenzie Feb 08 '22 at 19:46
  • The misspredict has more differences. `if (a || b) c; else d;` is close to `if (a) then c; else if (b) then c; else d;`. I saw such a question on SO earlier. – 273K Feb 08 '22 at 19:48
  • 25
    C++ observes [the As-If Rule](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule). The TL;DR version is if the compiler can prove that it will not change the observable behaviour, it can perform whatever transformations to your code it wants. This leads to a more important realization: Code is a description of behaviour, not a list of instructions. The compiler will take your code, interpret the described behaviour, and generate an optimal (within the bounds of how much optimization you asked for and the limits of the compiler) list of instructions. – user4581301 Feb 08 '22 at 20:04
  • @user4581301 That's what I used to learn as well. The compiler will do the optimizations on hotspots and stuff it thinks can be optimized. Glad that you mentioned it. – F. Müller Feb 08 '22 at 20:08
  • 78
    Compare `1 | DropTheBomb()` to `true || DropTheBomb()`... –  Feb 08 '22 at 20:11
  • 11
    @PaulMcKenzie: Not just "looks like a bug"; if the right hand side dereferences a pointer it shouldn't have, it *is* a bug. That's why the compiler can't just optimize `b1[i] || b2[i]` into an `or` instruction in this case. Although `int* b2 = c2.data()` has already been dereferenced inside this function, during the init part, but it's a lot of work to prove that. (And in other cases, might cache miss.) – Peter Cordes Feb 08 '22 at 21:19
  • 1
    Related: https://stackoverflow.com/questions/40991778/an-expensive-jump-with-gcc-5-4-0 (disclaimer: I'm pointing to you to my own answer) – Cody Gray - on strike Feb 09 '22 at 04:49
  • Probably you know that, but while | and || are (apart from short circuit evaluation) equivalent, & and && are not (you would probably need to add some !!) – lalala Feb 09 '22 at 08:16
  • 1
    @lalala look up sequence points in C++ – autistic Feb 09 '22 at 09:07
  • 1
    _"but the idea is that hardware branch predictor is given 2 chances to misspredict in slower version"_ - That's a little disingenuous; the way `c1` and `c2` are populated, the `if` will always evaluate to `true`, which any branch predictor will quickly learn. That means basically 0 mispredictions for the `|` case, but (up to) one per loop for the `||` case. The entire benchmark looks like it's designed to evoke the biggest difference possible between the two cases. Which is fair game, but an important piece of context. – marcelm Feb 09 '22 at 09:51
  • 3
    @autistic it is not about sequencing. `7 && 8` evals to true in boolean context, `7 & 8` evals to false. – Hristo Iliev Feb 09 '22 at 11:51
  • 10
    @HristoIliev: the comment about sequence points refers to the claim that `|` and `||` are equivalent (apart from short-circuiting); that's not true because (for instance) `printA_then_returnZero() || printB_then_returnZero()` must always print AB, but `printA_then_returnZero() | printB_then_returnZero()` can print either AB or BA. – psmears Feb 09 '22 at 13:58
  • @Hristo Iliev your example is irrelevant from my discussion of "equivalent", and way too trivial to be representative of the question (there may be no overhead in execution for an expression that can hoisted into compilation, please go read your C++ standard draft for about two years then come back and we have this discussion). The reason I suggested lalala look up sequence points has more to do with the choice of words, "equivalent"... I don't disagree with you, but what you write doesn't seem relevant here, for these reasons I have indicated. A book... my kingdom for a book... – autistic Feb 09 '22 at 20:35
  • It is worth noting that this is the most extreme example. Because `b1[i] || b2[i]` results in on average 1.5 unpredictable branches (because both values are random) while `b1[i] | b2[i]` results in 1 predictable branch (because it's always true, see how the data is prepared). In most simple cases | is favorable over || **if** it gives a more predictable result, but it is pretty much impossible to guess without actually profiling the code with **real** data, so default to ||. – Sopel Feb 11 '22 at 12:26
  • @Sopel but unless either of those have no side-effects (as it appears), you too are missing the point. When you use the `|` operator, either LHS or RHS can be evaluated first. When there are side-effects involved, that means the order of the side-effects may change depending upon the compiler/implementation you use. Look, I can't make you read the documents I've read, I can only tell based on what you're writing that you haven't read the documents I've read. – autistic Feb 20 '22 at 20:49
  • @autistic Yea, well, of course we assume in this case that they don't have side effects because otherwise they would not be identical and measuring performance of code that differs in behaviour is pointless. I was commenting on the post, not on your comment. – Sopel Feb 21 '22 at 13:01
  • @Sopel expressions are often calculated at compile time, when they can be, which complicates the question ... which is faster, the one with calculations hoisted into compile time or the other with calculations hoisted ...? We can't really say from such a small fragment of code... – autistic Feb 22 '22 at 16:03
  • @autistic I don't know what statement of mine you're referring to. – Sopel Feb 22 '22 at 16:12
  • @Sopel C and C++ have these "as if" rules, whereby the compiler can heavily optimise your code only so long as the observable behaviour of the app is the same. So therefore, in places where `||` and `|` yield the same behaviour, your C compiler might produce the same machine code. This is old, but it gives an idea of how smart optimising compilers were ten years ago: https://ridiculousfish.com/blog/posts/will-it-optimize.html – autistic Feb 22 '22 at 20:32
  • @autistic Can you quote a statement of mine that you think is wrong? Because I have no idea why you're responding to me with this. I see no connection. – Sopel Feb 23 '22 at 10:31
  • @Sopel you seem to be pulling this `b1[i]` thing from somewhere... If introducing array subscript operations into the discussion is ok, I assume so too is any other operation, such as the increment operators and function calls. You assume side-effects aren't relevant because they change behaviour, well if it weren't for the side-effects that change the value of `b1[i]` (ditto for `b2`) then the compiler could perform dead code elimination to optimise the array storage, and probably the rest of the program away. Talking about optimisation without side-effects as context is pointless. – autistic Feb 24 '22 at 08:00
  • @autistic 1. `b1[i]` is from the code in the OP. 2. "You assume side-effects aren't relevant because they change behaviour, well if it weren't for the side-effects that change the value of b1[i] (ditto for b2) then the compiler could perform dead code elimination to optimise the array storage" how is this relevant to LOADS. 2.1. `b1[i]` does not have side-effects – Sopel Feb 24 '22 at 23:04

9 Answers9

93

Is if(A | B) always faster than if(A || B)?

No, if(A | B) is not always faster than if(A || B).

Consider a case where A is true and the B expression is a very expensive operation. Not doing the operation can save that expense.

So the question is why don't we always use | instead of || in branches?

Besides the cases where the logical or is more efficient, the efficiency is not the only concern. There are often operations that have pre-conditions, and there are cases where the result of the left hand operation signals whether the pre-condition is satisfied for the right hand operation. In such case, we must use the logical operator.

if (b1[i])  // maybe this exists somewhere in the program
    b2 = nullptr;

if(b1[i] || b2[i]) // OK
if(b1[i]  | b2[i]) // NOT OK; indirection through null pointer

It is this possibility that is typically the problem when the optimiser is unable to replace logical with bitwise. In the example of if(b1[i] || b2[i]), the optimiser can do such replacement only if it can prove that b2 is valid at least when b1[i] != 0. That condition might not exist in your example, but that doesn't mean that it would necessarily be easy or - sometimes even possible - for the optimiser to prove that it doesn't exist.


Furthermore, there can be a dependency on the order of the operations, for example if one operand modifies a value read by the other operation:

if(a || (a = b)) // OK
if(a  | (a = b)) // NOT OK; undefined behaviour

Also, there are types that are convertible to bool and thus are valid operands for ||, but are not valid operators for |:

if(ptr1 || ptr2) // OK
if(ptr1  | ptr2) // NOT OK; no bitwise or for pointers

TL;DR If we could always use bitwise or instead of logical operators, then there would be no need for logical operators and they probably wouldn't be in the language. But such replacement is not always a possibility, which is the reason why we use logical operators, and also the reason why optimiser sometimes cannot use the faster option.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • 10
    A slightly better example might be `if(!ptr || *ptr)` as it fits the original question well, and especially as your second line `if(ptr & *ptr)` is very hard to think of a scenario where it makes sense, where as `if(!ptr | *ptr)` makes immediate sense. – Mike Vine Feb 08 '22 at 20:37
  • 8
    `if(!ptr | *ptr)` means "enter this block if the pointer is null or what it points to is non-zero which is a perfectly reasonable condition. `if(ptr & *ptr)` means "enter this block if the bit-wise and of the pointer _value_ and what it points to is non zero (???). Which like I said is so weird to be almost non-sensical. – Mike Vine Feb 08 '22 at 21:10
  • @MikeVine oh yea. I get what you mean now. I fixed the example. – eerorika Feb 08 '22 at 21:13
  • 1
    `&` doesn't work well as an equivalent to `&&`; even for non-null pointers, `if((ptr != nullptr) & *ptr)` only checks the low bit of `*ptr` for being non-zero. OR doesn't have this problem for logical vs. bitwise. – Peter Cordes Feb 08 '22 at 21:20
  • @PeterCordes & MikeVine Fair enough. I removed that pointer example entirely. But I'll add another related to Mike's point – eerorika Feb 08 '22 at 22:00
  • 1
    Yup, absolutely; the reason clang isn't optimizing the code in the question to a bitwise `or` instruction is almost certainly that the C++ abstract machine doesn't access `b2[i]` if `b1[i]` is non-zero. That could be out-of-bounds. It should be able to assume that `int *b2` is non-NULL since it's set from `.data()` of a `std::vector` that was written to earlier in this function, but that's a lot of complicated branching and reallocating for the compiler to track stuff through. `*ptr1 || *ptr2` would totally work as a relevant example. – Peter Cordes Feb 08 '22 at 22:12
  • Fun fact: current GCC does auto-vectorize this with AVX2 unconditionally loading from both `b1` and `b2` https://godbolt.org/z/d1MfsW5be (then using `vpmaskmovq` to make the `p1[i]` loads conditional) , but clang compiles as written. (bitwise OR, or two separate cmp/jcc). The `a2 *= p2[i];` can optimize away because `a2 = 0` to start; otherwise 64-bit multiply is expensive to vectorize without AVX-512. So the only real work is a masked sum of `p1[]`. – Peter Cordes Feb 08 '22 at 22:23
  • Good answer. I feel like you could venture into the definition of C++, as a "parametric abstract machine" wherein a compiler can make optimisations only if it can prove that the observable behaviour is as if executed by the abstract machine. We may already have compilers that attempt to vigorously optimise away `srand(1)` and all of the `rand` calls, for example, as the compiler determines the output to always be the same ... and if not then [it can't be far off](https://ridiculousfish.com/blog/posts/will-it-optimize.html). Why discuss optimisation when they're both essentially static output? – autistic Feb 09 '22 at 20:47
19

If evaluating A is fast, B is slow, and when the short circuit happens (A returns true), then if (A || B) will avoid the slow path where if (A | B) will not.

If evaluating A almost always gives the same result, the processor's branch prediction may give if (A || B) performance better than if (A | B) even if B is fast.

As others have mentioned, there are cases where the short circuit is mandatory: you only want to execute B if A is known to evaluate false:

if (p == NULL || test(*p)) { ... }  // null pointer would crash on *p

if (should_skip() || try_update()) { ... }  // active use of side effects
comingstorm
  • 25,557
  • 3
  • 43
  • 67
  • And in my personal subjective experience it is a lot more common in real world scenarios that a condition is mostly predictable, instead of being randomly 50/50 in each iteration. This is why branch prediction is a thing. – Falco Feb 11 '22 at 12:50
13

Bitwise-or is a branchless arithmetic operator corresponding to a single ALU instruction. Logical-or is defined as implying shortcut evaluation, which involves a (costly) conditional branch. The effect of the two can differ when the evaluations of the operands have side effects.

In the case of two boolean variables, a smart compiler might evaluate logical-or as a bitwise-or, or using a conditional move, but who knows...

  • 7
    In the case of `int` variables, compilers can also optimize `a || b` into `a | b`. But only if they're directly available, like already in registers, not from a pointer deref. GCC and clang do this (https://godbolt.org/z/nxeKT1roP) even if it means loading from a global variable which might cache miss, but it's always safe. (If another thread was writing the global var, its value only matters if the LHS was `0`; this can introduce unsynchronized access to a variable, but that's safe in asm for real CPUs as long as you don't care about the value. But not in ISO C++ source: that's UB) – Peter Cordes Feb 08 '22 at 21:25
  • 1
    "that's safe in asm for real CPUs as long as you don't care about the value" \*cries in read-sensitive memmapped registers\* (But seriously, this is one reason why ISO C/C++ are not particularly well-suited for bare-metal development. `volatile` can help. Sometimes.) – TLW Feb 11 '22 at 02:06
10

So the question is why don't we always use | instead of || in branches?

  • Branch prediction is relevant only to fairly hot pieces of code, and it depends on the branch being predictable enough to matter. In most places, | has little or no performance benefit over ||.

Also, taking A and B as expressions of suitable type (not necessarily single variables), key relevant differences include:

  • In A || B, B is evaluated only if A evaluates to 0, but in A | B, B is always evaluated. Conditionally avoiding evaluation of B is sometimes exactly the point of using the former.

  • In A || B there is a sequence point between evaluation of A and evaluation of B, but there isn't one in A | B. Even if you don't care about short-circuiting, you may care about the sequencing, even in relatively simple examples. For example, given an integer x, x-- || x-- has well-defined behavior, but x-- | x-- has undefined behavior.

  • When used in conditional context, the intent of A || B is clear to other humans, but the reason to substitute A | B less so. Code clarity is extremely important. And after all, if the compiler can see that it is safe to do so (and in most cases it is more reliable than a human at making the determination) then it is at liberty to compile one of those expressions as if it were the other.

  • If you cannot be sure that A and B both have built-in types -- in a template, for example -- you have to account for the possibility that one or both of | and || have been overloaded. In that case, it is reasonable to suppose that || will still do something that makes sense for branch control, but it is much less safe to assume that | will do something equivalent or even suitable.

As an additional minor matter, the precedence of | is different from that of ||. This can bite you if you rely on precedence instead of parentheses for grouping, and you need to watch out for that if you are considering modifying existing code to change || expressions to | expressions. For example, A && B || C && D groups as (A && B) || (C && D), but A && B | C && D groups as (A && (B | C)) && D.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
6

Even if a and b are automatic-duration Boolean flags, that doesn't mean that an expression like a||b will be evaluated by checking the state of one flag, and then if necessary checking the state of the other. If a section of code performs:

x = y+z;
flag1 = (x==0);
... code that uses flag1

a compiler could replace that with:

x = y+z;
if (processor's Z flag was set)
{
... copy of that uses flag1, but where flag is replaced with constant 1
}
else
{
... copy of that uses flag1, but where flag is replaced with constant 0
}

Although hardly required to do so, a compiler may base some of its decisions about whether to perform such substitution upon a programmer's choice of whether ot write (flag1 || flag2) or (flag1 | flag2), and many factors may cause the aforementioned substitution to run faster or slower than the original code.

supercat
  • 77,689
  • 9
  • 166
  • 211
3

Code readability, short-circuiting and it is not guaranteed that Ord will always outperform a || operand. Computer systems are more complicated than expected, even though they are man-made.

There was a case where a for loop with a much more complicated condition ran faster on an IBM. The CPU didn't cool and thus instructions were executed faster, that was a possible reason. What I am trying to say, focus on other areas to improve code than fighting small-cases which will differ depending on the CPU and the boolean evaluation (compiler optimizations).

Tony Tannous
  • 14,154
  • 10
  • 50
  • 86
2

The expression A | B might be faster in a loop that the compiler can optimize to a bitwise or of two vectors. Another case where | might be slightly more optimized is when the compiler would want to optimize away a branch by combining the two operands with bitmasks. If the operand on the right is something that might have visible side-effects, the compiler must instead insert a branch to guarantee the correct short-circuiting.

In other situations I can think of, A || B will be as fast or faster, including just about all the ones I can think of where you’re comparing a single pair of operands rather than a vector. However, those are almost never critical to performance.

Davislor
  • 14,674
  • 2
  • 34
  • 49
  • *it cannot do this with a || expression* This isn't true in general. If the compiler can prove the behaviour is unchanged, it is free to perform the optimisation. The relevant *as-if rule* has been mentioned in a comment. – Max Barraclough Feb 10 '22 at 20:33
  • @MaxBarraclough Yes, agreed: there are times when the static compiler can statically tell that the RHS of a `||` expression has no side-effects, and times when it cannot. I’ll re-word that sentence if it was confusing. – Davislor Feb 10 '22 at 21:56
1

Adding to the list:

Given the case that A and B are totally unpredictable, but usually A||B is true (i.e. when A is wrong, then usually B is true and vice versa). In this case A||B may lead to a lot of mispredictions, but A|B is predictable and most likely faster.

tommsch
  • 582
  • 4
  • 19
  • 1
    That's the case in the question. The question is asking whether that's *always* the case, and if not why don't/can't we just compile `A||B` the same as `A|B`. – Peter Cordes Feb 09 '22 at 13:58
0

So the question is why don't we always use | instead of || in branches?

In my little mind three reasons, but that might be only me.

First and most: all code tends to be read multiple times more often than it is written. So, in the example you have an array of ints with value either zero or 1. What the code really is meant to do is hidden to the reader. It might be clear to the original writer exactly what is supposed to be done, but years later, and after adding lots and lots of code lines it is probably anyones guess. In my world use what shows the intended comparison, it either is a bit comparison or a logical one. It should not be both.

Secondly: does the performance gain really matter? The example basically does nothing and could be coded a lot more efficient. Not? Well don't create the array in the first place, what the code does now is simply to check the quality of the rand function. The optimization is an example where a better analysis of the problem probably would give a much larger gain.

Third: a different compiler or a different processor will probably change the relative speeds. Now, what you are doing here is applying your knowledge of the internals of the current compiler and current processor. Next version of compiler and processor might turn the question around totally. You really cannot know. And we all know that the only way to know if an optimization actually will be quicker is by testing, which this very specific example has done.

So, if, for some reason, getting the very last bit of efficiency out of the code is worth it I would mark the choice as a "hack". I would probably include a macro with something like "DO_PERFORMANCE_HACKS" and have two versions of the code, one with | and one with || and commenting what the intention is. By changing the macro it will be possible for next readers to see what and where the hack is, and in the future may elect to not compile them.

ghellquist
  • 195
  • 6