Where the likely/unlikely statement should be placed for more performance?

Question

Some software (often performance-oriented, e.g. Linux kernel, DPDK) has C helpers for influencing branch prediction.

I have an absolutely simple code snippet (suppose I know the percantage of a > b) to represent the problem of conditions nesting and applying likely/unlikely when some logic is nested:

bool foo()
{
    foo1(1);
    foo2(2);

    /* if (unlikely(a > b)) */
    /* if (a > b)*/
    {
        puts("Ohhh!!! Rare case");
        return true;
    }
    return false;
}

int main(void)
{
    /* if (unlikely(foo())) */
    /* if (foo()) */
    {
        puts("Azaza");
    }
}

So which 2 lines should be uncomented for more performance from a theoretical point of view?

Obviously there are 3 ways to assist compilier with branch prediction:

1. if (unlikely(a > b)) ... if (unlikely(foo()))

2. if (a > b) ... if (unlikely(foo()))

3. if (unlikely(a > b)) ... if (foo())

Which is theoretically the most efficient and why?

@TedLyngmo Tsyvarev is absolutely right! The question is about how to do it in case of nesting. Updated question — NK-cell, Jul 06 '23 at 12:51
My gut feeling says that you should put it as close to the place where you have knowledge about the expected outcome. That is, inside `foo()` here. Then again, it would probably not hurt to add it to the `if` in `main` too. — Ted Lyngmo, Jul 06 '23 at 13:04
Updated question. Seems that close votes can be revoked.. ^_^ — NK-cell, Jul 06 '23 at 13:05
In this case, `foo` should be written as simply `return a > b;`, without any branches. If you have more code than just `return` in the `if/else` then it's fine, but in that case of course the `likely` should be in `foo`. — interjay, Jul 06 '23 at 13:11
@interjay *"but in that case of course the likely should be in foo"* - why should not it be in `main()` too? Btw - a bit edited the code :) — NK-cell, Jul 06 '23 at 14:28
It can be in `main` too. The compiler might be smart enough to not need it if it inlines `foo`, but it won't hurt. — interjay, Jul 06 '23 at 14:44
@interjay It is more logical to assume that first of all it should be in `main()`, IMHO it's better to cut off the wrong branch of execution *earlier*. — NK-cell, Jul 06 '23 at 15:35
I don't think dpdk and linux kernel should be tagged here. Why not performance and/or optimisation? — user997112, Jul 07 '23 at 05:06
`It is more logical to assume that first of all it should be in main()` - likely/unlikely is for predicting the outcome of a single branch. You have two branches here. If you add likely/unlikely to one, the other still remains unoptimized. In theory compilers might be able to make an inference of the connection or make the function inline, but unless you know that it does happen, I wouldn't rely on it. — skandigraun, Jul 07 '23 at 06:28
The top answer on the Q&A you linked is mostly obsolete. See comments on it like mine and BeeOnRope's: [How do the likely/unlikely macros in the Linux kernel work and what is their benefit?](https://stackoverflow.com/posts/comments/70654751) , and Ciro's answer: likely/unlikely influences branch *layout*, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely. — Peter Cordes, Jul 27 '23 at 17:11

Benedict Schlüter · Answer 1 · 2023-07-27T11:44:23.990

0

To my knowledge, the likely/unlikely statements show the best effect if the conditional variable is not in the cache. Let me explain in more detail.

In your code, the processor has to execute foo anyway. So the likely won't have any strong effect here, since no code path can be skipped during speculative execution. The function must be executed. Let's assume you save the result of foo before in a variable and the code looks like this:

int x = foo();
if (likely(x))
{
    puts("Azaza");
}

In this case the likely(x) will probably only influence the prefetcher/decoder, since the processor just computed x and the value is most likely cached in L1 (except it got interrupted at that point). However, for a precise answer one has to know the microarchitecture extremely well, since it might be possible that current CPUs are so advanced that they can fetch and decode both branches at once.

Now let's assume you have a global variable volatile int c = 15 and we change your code:

if (likely(b == 15))
{
    puts("Azaza");
} else {
    puts("Bzaza");
}

When we execute the code and b is accessed for the first time it won't be in the cache and the processor has to fetch it from memory. This costs multiple hundred CPU-cycles and instead of stalling the CPU starts executing code speculatively without knowing the value of b. In that case, the likely keyword indicates that we should execute the first branch. Note that the executed instructions are not visible to the outside world at that time. Modern x86 processors can execute up to 400 micro-ops speculatively and only commit the result once the prediction turned out to be true.

So to answer your question I would put the likely/unlikely keyword around the a > b.

edited Jul 27 '23 at 11:44

answered Jul 27 '23 at 11:23

Benedict Schlüter

329
4
16

... or add the keyword around both branches (assuming no compiler optimization for the simple code) – Benedict Schlüter Jul 27 '23 at 11:49
tldr: there is much more to it and the CPU will save the last branches to optimize the branch outcome on itself. I can recommend reading the initial spectre paper or the technical report from P0, specifically the section about branch prediction https://googleprojectzero.blogspot.com/2018/01/ – Benedict Schlüter Jul 27 '23 at 11:59
1

The Q&A linked by the OP has a mostly obsolete top answer. See comments on it like mine and BeeOnRope's: [How do the likely/unlikely macros in the Linux kernel work and what is their benefit?](https://stackoverflow.com/posts/comments/70654751) , and Ciro's answer: likely/unlikely influences branch *layout*, and the decision to make branchy vs. branchless asm if that's an option. **But nothing can hint the actual CPU hardware branch prediction about which is more likely.** (On modern x86 and AArch64). – Peter Cordes Jul 27 '23 at 17:15
1

This answer would make sense for PowerPC and some MIPS where machine code can contain branch hints for the CPU. And Pentium 4, the only x86 microarchitecture to support branch hints. ([Is it possible to tell the branch predictor how likely it is to follow the branch?](https://stackoverflow.com/a/1851445)). Older x86 like P4 and P6 may also have static prediction influenced by branch layout, but since Sandybridge or Haswell (IT-TAGE predictors) no. [Why did Intel change the static branch prediction mechanism over these years?](https://stackoverflow.com/q/51822731) – Peter Cordes Jul 27 '23 at 17:19
Thanks for pointing it out. Indeed, there is no static branch prediction in x86 as the p0 blogpost points out. However, for infrequently used branches there is probably no branch information available (we found that around 80 branches are enough to confuse the predictor https://github.com/benschlueter/CVE-2021-33624/blob/156b67497b63eb34d3afee24e7cfa3f3f522f559/bpf_spectre_type_confusion.c#L553). If we hit such a branch it is up to the micro-arch to decide what to do. Either there is some type of 'hashing' for the address and we rely on branch information from another address or .... – Benedict Schlüter Jul 28 '23 at 07:22
we can just fall back to a default case (i.e. always not take or always take the branch). I implicitly assumed that the latter one is the case for x86, which might/probably not be true. – Benedict Schlüter Jul 28 '23 at 07:25
1

I just looked up the spec for the A-65 (https://developer.arm.com/documentation/100439/latest), p 78 indicates that static branch prediction is still used, so it might be possible that my assumption was correct in the case no dynamic information is available. – Benedict Schlüter Jul 28 '23 at 07:37
1

Yes, true, some non-x86 CPUs may use the classic BTNF (backward taken, forward not-taken) as a fallback when a dynamic prediction isn't available. I think IT-TAGE (in x86, found in Haswell / Zen 2 and later) will always make a direction prediction for conditional direct branches, though; it doesn't try to figure out whether the indexed entry is "for that branch" or not: branches can alias each other, and one branch's state can be distributed over many entries if it's reached with many different branch histories leading up to it. (Making a target address prediction is harder, but separate.) – Peter Cordes Jul 28 '23 at 07:49
Since your answer talks about the cold case for the data, it's not unreasonable that might happen when predictors are cold for the branch, too. Maybe worth emphasizing that's the only case when `[[likely]]` / `[[unlikely]]` will affect actual branch prediction by CPU hardware outside of a few ISAs with branch hints, not in hot cases like inside loops. (And even then, some modern microarchitectures like recent x86 don't have static BTFN at all.) – Peter Cordes Jul 28 '23 at 07:50

Where the likely/unlikely statement should be placed for more performance?

1 Answers1