9

I know the kernel uses the likely and unlikely macros prodigiously. The docs for the macros are located at Built-in Function: long __builtin_expect (long exp, long c). But they don't really discuss the details.

How exactly does a compiler handle likely(x) and __builtin_expect((x),1)?

Is it handled by the code generator or the optimizer?

Does it depend upon optimization levels?

What's an example of the code generated?

jww
  • 97,681
  • 90
  • 411
  • 885
  • why don't you check the generated code? note: you won't see anything special. if you understand what those "macros" do, it should be clear why... – Karoly Horvath Jun 19 '14 at 08:01
  • 1
    That's quite the loaded question, but essentially the machine code generated by the compiler will have some minor target-specific changes made to it to make the branch predictor & caches behave in a certain way. You'd honestly be best off looking at Intel's processor documentation as for the specifics of this, at least for x86. – Richard J. Ross III Jun 19 '14 at 08:01
  • @jww: it's not a conversation. – Karoly Horvath Jun 19 '14 at 08:15
  • 4
    It's quite useful to learn to read generated assembler from compilers to understand what is going on. Especially when you're interested in nanooptimizations like this. – Art Jun 19 '14 at 08:26
  • 1
    Yeah, seriously what Art said. I'm learning x86 & ARM assembly much more than I ever thought I would for this reason. – Daniel Santos Jul 30 '14 at 23:38

1 Answers1

13

I just tested a simple example on gcc.

For x86 this seems to be handled by the optimizer and depend on optimization levels. Although I guess a correct answer here would be "it depends on the compiler".

The code generated is CPU dependent. Some cpus (sparc64 comes immediately to my mind, but I'm sure there are others) have flags on conditional branch instructions that tell the CPU how to predict it, so the compiler generates "predict true/predict false" instructions depending on the built in rules in the compiler and hints from the code (like __builtin_expect).

Intel documents their behavior here: https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts . In short the behavior on Intel CPUs is that if the CPU has no previous information about a branch it will predict forward branches as unlikely to be taken, while backwards branches are likely to be taken (think about loops vs. error handling).

This is some example code:

int bar(int);
int
foo(int x)
{
    if (__builtin_expect(x>10, PREDICTION))
        return bar(10);
    return 42;
}

Compiled with (I'm using omit-frame-pointer to make the output more readable, but I still cleaned it up below):

$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=0 -o 00.s foo.c
$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=1 -o 01.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=0 -o 20.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=1 -o 21.s foo.c

There's no difference between 00.s and 01.s, so that shows that this is dependent on optimization (for gcc at least).

Here's the (cleaned up) generated code for 20.s:

foo:
    cmpl    $10, %edi
    jg  .L2
    movl    $42, %eax
    ret
.L2:
    movl    $10, %edi
    jmp bar

And here is 21.s:

foo:
    cmpl    $10, %edi
    jle .L6
    movl    $10, %edi
    jmp bar
.L6:
    movl    $42, %eax
    ret

As expected the compiler rearranged the code so that the branch we don't expect to take is done in a forward branch.

Art
  • 19,807
  • 1
  • 34
  • 60
  • 1
    Thank you Art. I did not know about the forward/backward branch "hint" in x86! Also, when your "cold" code is at the end of the function (in the generated code) it is possible that it never gets loaded into the CPU's cache unless it's needed, which is a good thing. Oh yeah jww, learn about the `cold` and `hot` attributes as well. I believe that `include/linux/compiler.h` has macros for them too. If you mark a function as such, you can omit the (un)likely macros. – Daniel Santos Jul 30 '14 at 23:41