19

C++20 introduced the attributes [[likely]] and [[unlikely]] to the language, which can be used to allow the compiler to optimize for the case where one execution path is either much more likely or much less likely than the other ones.

Given the cost of an incorrect branch prediction, this seems like a feature that's potentially extremely useful in performance-critical sections of code, but I don't know what it would actually cause the compiler to do.

Is there a simple piece of code for which adding [[likely]] and [[unlikely]] attributes changes the assembly output by the compiler? And perhaps more importantly, what do these changes do?

I created a simple example for my own understanding to see if there was any difference in the assembly, but it appears that this example is too simple to actually show any changes to the assembly:

void true_path();
void false_path();

void foo(int i) {
    if(i) {
        true_path();
    } else {
        false_path();
    }
}
void bar(int i) {
    if(i) [[likely]] {
        true_path();
    } else [[unlikely]] {
        false_path();
    }
}

View the compiled assembly here.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Alecto Irene Perez
  • 10,321
  • 23
  • 46
  • 1
    Aside: the attributes are a terrible misnomer, they’re *not* for optimising a more likely code path. In fact, the branch predictor already does that, no need for attributes or intrinsics. Instead, the attributes are for telling the branch predictor to *disregard* the more likely branch, and that the performance of the less likely branch is *more important*. – Konrad Rudolph Jun 05 '20 at 18:47
  • 1
    I'd recommend watching this video: https://www.youtube.com/watch?v=ew3wt0g99kg. – Thomas Jager Jun 05 '20 at 18:49
  • I guess that on the x86 platform they refer to the "Branch hint" prefixes `2Eh` (Branch not taken (used only with Jcc instructions)) and `3Eh` (—Branch taken (used only with Jcc instructions)) from Vol 2A, Chapter 2.1.1 of the current Intel manual. Intel's comment on these: *"Some earlier microarchitectures used these as branch hints, but recent generations have not and they are reserved for future hint usage."* – zx485 Jun 05 '20 at 18:59
  • 4
    @KonradRudolph: "Branch prediction" in the usual sense is a run-time thing. These hints are *compile-time* hints to the compiler on how it might want to lay out the branches, or decide whether to do if-conversion to something branchless like `cmov`. It's only overriding anything if the compiler was using profile-guided optimization based on actual runtime data. (And in that case IIRC gcc uses the PGO data, ignoring hints). – Peter Cordes Jun 05 '20 at 19:11
  • Related: [gcc optimization flag -O3 makes code slower than -O2](https://stackoverflow.com/q/28875325) is a similar thing, where compiler options and/or PGO make a difference between branchless vs. branchy, in a case where branchy is actually better because of data with more of a pattern than GCC would guess by default. – Peter Cordes Jun 05 '20 at 19:14
  • @PeterCordes: in this simple example, https://godbolt.org/z/iHWQdH, gcc compiles different code, if you switch `likely` to `unlikely`. The funny thing is, if you duplicate that function, and one of them has `likely`, and the other `unlikely`, the assembly will be the same for both functions (the attribute of the first function will affect the second function, weird). It's like quantummechanics :) – geza Jun 05 '20 at 19:21
  • 2
    @geza: heh, that's fun. GCC's identical-code-folding inter-procedural-analysis optimizations (ipa-icf) happen before generation of final asm, perhaps at a GIMPLE level. I remember finding a bug with different inline asm constraints in different functions being folded together, which was actually a correctness problem. – Peter Cordes Jun 05 '20 at 19:24
  • @PeterCordes Are you saying that the C++ attributes are different from intrinsics (GCC’s `__builtin_expect`)? Or are you saying that the latter are *also* not used to override PGO? Because that was my understanding, based also on the (admittedly, somewhat vague) GCC documentation. – Konrad Rudolph Jun 05 '20 at 19:27
  • 1
    @KonradRudolph: I assume they're identical, and that `[[likely]]` / `[[unlikely]]` were added to ISO C++ to expose that existing compiler behaviour in a portable way. And yes, I'd guess that GCC probably ignores either way of writing it when PGO data is available. – Peter Cordes Jun 05 '20 at 19:31
  • 1
    @KonradRudolph: How can a branch predictor disregard a branch? What do you mean by this? How can the performance of the less likely branch be more important? – geza Jun 05 '20 at 19:33
  • @geza … disregard the *frequency of taking a branch* for the purpose of prefetch. But apparently I was wrong. — As for your second question, this is by all accounts pretty rare; I have never knowingly encountered such a situation. My point is that in the opposite case, where you want the likely branch to be faster, the branch predictor already does this for you, and a `[[likely]]` annotation is going to make virtually *no* difference (or it might actually make performance worse). – Konrad Rudolph Jun 05 '20 at 21:35
  • @rustyx: yes, but most ISAs don't have static branch hints in asm. On an ISA that did have them, you'd probably expect `[[likely]]` to compile to an asm hint, like for Pentium 4 when Intel experimented with that for x86. And I think MIPS has some branch-likely instructions, and PowerPC has something. But on modern x86, and ARM/AArch64, all you can do at compile time is lay out branches so the fast path is the not-taken one. See [Is it possible to tell the branch predictor how likely it is to follow the branch?](https://stackoverflow.com/a/1851445) – Peter Cordes Jun 06 '20 at 04:29

1 Answers1

7

As it seems, there is a bug in gcc. If you have two functions which are the same, besides [[likely]] attributes, gcc folds them incorrectly.

But if you use just one function, and switch between [[likely]]/[[unlikely]], assembly changes.

So, this function:

void bar(int i) {
    if(i) [[unlikely]] {
        true_path();
    } else [[likely]] {
        false_path();
    }
}

compiles to:

bar(int):
        test    edi, edi
        jne     .L4
        jmp     false_path()
.L4:
        jmp     true_path()

And this:

void bar(int i) {
    if(i) [[likely]] {
        true_path();
    } else [[unlikely]] {
        false_path();
    }
}

compiles to:

bar(int):
        test    edi, edi
        je      .L2
        jmp     true_path()
.L2:
        jmp     false_path()

Notice, that the condition has changed: the first version jumps, if i is non-zero, while the second one jumps if i is zero.

This is in agreement with the attributes: gcc generates code, where the conditional jump happens in the unlikely path.

geza
  • 28,403
  • 6
  • 61
  • 135
  • Do you happen to know why `jne` is used here in place of `je`, and what the effect is? – Alecto Irene Perez Jun 05 '20 at 21:04
  • 1
    @J.AntonioPerez: The compiler tries to optimize, so the CPU doesn't have to take a branch. I suppose that it does it, because a correctly-predicted-and-not-taken jump is cheaper than a correctly-predicted-and-taken branch. Or maybe it does this way, because of cache usage: cold path is put at the end of the function, so the hot path has a continuous path, so cache is a little bit better utilized. But I'm not an expert on this, if you're curious maybe you want to ask this specific question, tagged with [tag:assembly] (and maybe [tag:x86]) labels. – geza Jun 05 '20 at 21:58
  • @J.AntonioPerez: There is some information about this already: https://stackoverflow.com/a/109721/8157187, read the comments under the answer (Peter Cordes and BeeOnRope's comments in particular). – geza Jun 05 '20 at 22:11
  • 1
    @J.AntonioPerez: Both the factors geza cited are advantages for making the expected fast-path the fall-through. With larger blocks of code, letting one line of I-cache maybe stay cold is more plausible. Also note that traditionally, (before IT-TAGE branch predictors), branches where a prediction couldn't be found were statically predicted as not-taken for forward jumps, taken for backward. (IT-TAGE (Haswell and later) always indexes a dynamic prediction so there is no static prediction. The indexed prediction might be mostly primed for a different branch, but that's the tradeoff with TAGE) – Peter Cordes Jun 06 '20 at 01:03