How far does GCC's __builtin_expect go?

Question

While answering another question I got curious about this. I'm well aware that

if( __builtin_expect( !!a, 0 ) ) {
    // not likely
} else {
    // quite likely
}

will make the "quite likely" branch faster (in general) by doing something along the lines of hinting to the processor / changing the assembly code order / some kind of magic. (if anyone can clarify that magic that would also be great).

But does this work for a) inline ifs, b) variables and c) values other than 0 and 1? i.e. will

__builtin_expect( !!a, 0 ) ? /* unlikely */ : /* likely */;

or

int x = __builtin_expect( t / 10, 7 );
if( x == 7 ) {
    // likely
} else {
    // unlikely
}

or

if( __builtin_expect( a, 3 ) ) {
    // likely
    // uh-oh, what happens if a is 2?
} else {
    // unlikely
}

have any effect? And does all of this depend on the architecture being targeted?

Tony Delroy · Answer 1 · 2023-02-22T23:28:03.127

Did you read the GCC documentation?

Built-in Function: long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0))
    foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
    foo (*ptr);
when testing pointer or floating-point values.

To explain this a bit... __builtin_expect is specifically useful for communicating which branch you think the program's likely to take. You ask how the compiler can use that insight - well, consider this code:

if (x == 0)
    return 10 * y;
else
    return 39;

In machine code, the CPU can typically be asked to "goto" another line (which takes time, and depending on the CPU may prevent other execution optimisations - i.e. beneath the level of machine code - for example, see the Branches heading under http://en.wikipedia.org/wiki/Instruction_pipeline), or to call some other code, but there's not really an if/else concept where both true and false code are equal... you have to branch away to find the code for one or the other. The way that's done is basically, in pseudo-code:

test whether x is 0
if it was goto else_return_39
return 10 * y
else_return_39:
return 39

Given most CPUs are slower following the goto down to the else_return_39: label than to just fall through to return 10 * y, code for the "true" branch will be reached faster than for the false branch. Of course, the machine code could test whether x is not 0, put the "false" code (return 39) first and thereby reverse the performance characteristics.

This is what __builtin_expect controls - you can tell the compiler to put the true or the false branch where less branching is needed to reach it, thereby getting a tiny performance boost.

But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?

a) Whether the surrounding function is inlined or not doesn't change the need for branching where the if statement appears (unless the optimiser sees the condition the if statement tests is always true or false and only one branch could never run). So, it's equally applicable to inlined code.

[ Your comment shows you were interested in conditional expressions - a ? b : c - I'm not sure - there's a disputed answer to that question at https://stackoverflow.com/questions/14784481/can-i-use-gccs-builtin-expect-with-ternary-operator-in-c that might prove insightful one way or the other, or the basis for further exploration ]

b) variables - you postulated:

int x = __builtin_expect( t / 10, 7 );
if( x == 7 ) {

That won't work - the compiler's not obliged to associate such expectations with variables and remember them the next time an if is seen. You can verify this (as I did for gcc 3.4.4) using gcc -S to produce assembly language output: the assembly doesn't change regardless of the expected value.

c) values other than 0 and 1

It works for integral (long) values, so yes. The last paragraph of the documentation quoted above address this, specifically:

you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
    foo (*ptr);
when testing pointer or floating-point values.

Why? Well, if the pointer type is larger than long, then calling __builtin_conversion(long, long) would effectively slice off some of the less-significant bits for the test, and fail to incorporate the (high-order) rest in the test. Similarly, floating point values might be larger than a long, and the conversion not produce the result you expect. By using a boolean expression such as ptr != NULL (given true converts to 1 and false to 0) you're sure to get intended results.

Yes, I did read the documentation (of course!). As I say I already know the gist. You misunderstood my (a) part; I meant the `a?b:c` *syntax* when I talk about "inline if" (does it have a better name? a lot of people use ternary operator but that's just it's form, not name). Also for (c) I meant (as I put in the example) other `long` values, rather than other data types. Really I'm wondering why it's a long in the first place, instead of an `int` or `char` or `_Bool`. — Dave, Mar 18 '13 at 01:56
@Dave: C++11 Standard/5.16 uses "conditional operator" to refer to `?` specifically, and "conditional expressions" generally. Re (c) - I reread your example and see your concern, while I've no more information on the the `(a, 3)` / `a == 2` scenario than you would have, I can imagine that the reason the arguments are `long` is only to avoid the need to add machine code instructions just to shorten or convert the argument: for example, if it were bool then compiler may need to ensure non-zero values were changed to exactly 1. Similarly, for `char` it may need to `&` with 255. — Tony Delroy, Mar 18 '13 at 02:09
That sounds believable. And I'll have to remember "conditional operator"! — Dave, Mar 18 '13 at 02:15
@Dave: edit above re conditional expressions, incorporating answer from http://stackoverflow.com/questions/14784481/can-i-use-gccs-builtin-expect-with-ternary-operator-in-c - cheers. — Tony Delroy, Mar 18 '13 at 02:24
I saw that answer before but wasn't happy with his methodology; it doesn't say anything about actually including branch prediction in the output, just a useless function call which is optimised out at higher O values. It would have been better if he'd compiled it with both 1 and 0 as expected values, and looked for differences at high optimisation. — Dave, Mar 18 '13 at 02:29
@Dave: fair enough - can't say I looked that closely. Maybe the code can be picked up and allow quick testing as you suggest...? — Tony Delroy, Mar 18 '13 at 02:59
Static branch prediction and the default format of instructions which support it normally favor forward conditional branch not taken. Thus the if part of if..else would be favored until you use built-in_expect or similar mechanism. A related technique is to make the usual or shorter branch the If branch. Backward conditional branch normally favors branch taken in order to start up loops faster. — tim18, May 21 '16 at 21:41

Drew Dormann · Accepted Answer · 2022-01-12T20:40:46.403

19

But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?

It works for an expression context that is used to determine branching.

So, a) Yes. b) No. c) Yes.

And does all of this depend on the architecture being targeted?

Yes!

It leverages architectures that use instruction pipelining, which allow a CPU to begin working on upcoming instructions before the current instruction has been completed.

(if anyone can clarify that magic that would also be great).

("Branch prediction" complicates this description, so I'm intentionally omitting it)

Any code resembling an if statement implies that an expression may result in the CPU jumping to a different location in the program. These jumps invalidate what's in the CPU's instruction pipeline.

__builtin_expect allows (without guarantee) gcc to try to assemble the code so the likely scenario involves fewer jumps than the alternate.

edited Jan 12 '22 at 20:40

answered Mar 18 '13 at 01:28

Drew Dormann

59,987
13
123
180

OK, that's a pretty clear answer to the main questions, and that last bit is pretty much what I thought. Can you shed any light on what might happen in the last example if `a` is 2? – Dave Mar 18 '13 at 01:35
@Dave: You win, even though you shouldn't. `__builtin_expect` can never build incorrect code, just different code. The different code will be better in some cases and worse in others. You want the most common case to be the case in which the code is better, but it may also be better in other cases too. – David Schwartz Mar 18 '13 at 01:37
@DavidSchwartz but does it pretend it was written as if it was `__builtin_expect(!!a,1)`, or perform the same regardless, or even perform slower than it would without the `__builtin_expect`. It's a pointless question but I'm interested. As I said in the comments to the other answer, my main curiosity is why they chose `long` at all, rather than `int`, `char` or `_Bool` – Dave Mar 18 '13 at 02:01
@Dave: In your example, all it can do is optimize one branch of the `if` over the other. One could construct cases where it can do more than that, but current builds of `gcc` won't actually do any more than that anyway. They designed it to be flexible to allow `gcc` to be smarter in the future. – David Schwartz Mar 18 '13 at 02:02
1

Note the double negation on the `a` in `if( __builtin_expect( !!a, 0 ) )``. This confused me - `!!a == a` so why the double negation?? The reason is `__builtin_expect` requires integer arguments but what if you are using `__builtin_expect` in a macro and caller of the macro supplies a pointer or a float. By using the `!!` it allows direct use of non-integer arguments. The first ! turns the non-integer into a 0 or 1, e.g. turns pointer NULL into integer 1. The second ! fixes up the logic, ie turns that 1 back to a 0, which is logically what we expect NULL to be mapped to. – tullaman Jul 04 '14 at 08:47
@tullaman I would expect the `builtin` to act like any other C/++ boolean context and treat 0 as false, anything else as true - without requiring any contrived acrobatics like that. Or does it not act the same and somehow break on input that isn't guaranteed to be an integer 0 or 1? I doubt it. – underscore_d Dec 04 '16 at 03:49
1

@underscore_d The gcc manual [link](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) stipulates that "The return value is the value of exp, which should be an integral expression" and then goes on to say to use `ptr != NULL` to evaluate ptrs. Hence when use of the builtin is intermediated by a macro the macro usually uses the !! trick to cover all eventualities for the macro args. That's my explanation for the presence of the !! at any rate. But I'm open to other theories. – tullaman Oct 01 '20 at 19:49

How far does GCC's __builtin_expect go?

2 Answers2

Linked