11

Suppose I have an expression of which only one part is very unlikely, while the other is statistically neutral:

if (could_be || very_improbable) {
    DoSomething();
}

Would it help the compiler in any way if I place the very improbable bit in an unlikely() macro?

if (could_be || unlikely(very_improbable)) {
    DoSomething();
}

Note: I'm not asking how the macros work - I understand that. The question here is regarding GCC: will it be able to optimize the expression if I only hint about part of it? I also recognize that it might depend heavily on the expressions in question - I'm appealing to those who have experience with these macros.

immortal
  • 3,118
  • 20
  • 38
  • If you ask GCC (say, `-Wall -O2 -march=x86-64 -mtune=generic -S`, to generate assembly for generic AMD64/x86-64 architectures), at least GCC-5.4 says 'No'; there is no difference in the generated code. – Nominal Animal Jan 29 '17 at 16:38
  • 1
    Possible duplicate of [likely()/unlikely() macros in the Linux kernel - how do they work? What's their benefit?](http://stackoverflow.com/questions/109710/likely-unlikely-macros-in-the-linux-kernel-how-do-they-work-whats-their) – Red Riding Hood Jan 29 '17 at 16:41
  • @NominalAnimal Do you mean that you created a code snippet and compiled it? – immortal Jan 29 '17 at 17:23
  • 2
    @immortal: Yes, using `void func(int neutral, int rare) { if (neutral || rare) other(); }`. Typically `__builtin_expect(expr, 1)` (`likely(expr)`) and `__builtin_expect(expr, 0)` (`unlikely(expr)`) help with branch prediction, but in this case, we end up with a single branch (as if `if (neutral | rare)` was used), because the expressions are trivial. For complicated expressions, everything depends on the expressions themselves; in general, one should only add the macros after profiling indicates the compiler chooses consistently badly at that point. Anything else is premature optimization. – Nominal Animal Jan 29 '17 at 21:11

3 Answers

6

Yes, it is reasonable, and compilers can and do take advantage of it in the right scenario.

In your actual example, if could_be and very_improbable are integral variables, there isn't much point in putting likely or unlikely macros on a sub-expression of the predicate, because there is little the compiler can do to make it faster. The compiler can organize the if block differently depending on the likely outcome of the branch, but knowing that very_improbable is unlikely doesn't help by itself: the compiler still needs to generate the code to test it.

Let's take an example where the compiler can do more work:

extern int fn1();
extern int fn2();
extern int f(int x);

int test_likely(int a, int b) {
  if (likely(f(a)) && unlikely(f(b)))
    return fn1();
  return fn2();
}

Here the predicate is composed of two calls to f() with arguments, and icc produces different code for 3 out of the 4 combinations of likely and unlikely:

Code produced for likely(f(a)) && likely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        je        ..B1.7        # Prob 5%                       #9.7
        mov       edi, r15d                                     #9.23
        call      f(int)                                         #9.23
        test      eax, eax                                      #9.23
        je        ..B1.7        # Prob 5%                       #9.23
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12
..B1.7:                         # Preds ..B1.4 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10

Here, both predicates are likely true, so icc produces straight-line code for the case where both are true, jumping out-of-line if either turns out to be false.

Code produced for unlikely(f(a)) && likely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        jne       ..B1.5        # Prob 5%                       #9.7
..B1.3:                         # Preds ..B1.6 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10
..B1.5:                         # Preds ..B1.2
        mov       edi, r15d                                     #9.25
        call      f(int)                                         #9.25
        test      eax, eax                                      #9.25
        je        ..B1.3        # Prob 5%                       #9.25
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12

Here, the first predicate is likely false, so icc produces straight-line code leading directly to a return in that case, and jumps out of line to B1.5 to continue evaluating the predicate. There, it expects the second call (f(b)) to be true, so it generates fall-through code ending in a tail-call to fn1(). If the second call turns out false, it jumps back up to the sequence already emitted for the fall-through case of the first branch (label B1.3).

This also turns out to be the code generated for unlikely(f(a)) && unlikely(f(b)). In this case, you could imagine the compiler putting a jmp fn2() as the fall-through case at the end, but it doesn't. Note that doing so would prevent reuse of the earlier sequence at B1.3, and this code is unlikely to execute at all, so it seems reasonable to favor smaller code size over optimizing an already-unlikely case.

Code produced for likely(f(a)) && unlikely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        je        ..B1.5        # Prob 5%                       #9.7
        mov       edi, r15d                                     #9.23
        call      f(int)                                         #9.23
        test      eax, eax                                      #9.23
        jne       ..B1.7        # Prob 5%                       #9.23
..B1.5:                         # Preds ..B1.4 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10
..B1.7:                         # Preds ..B1.4
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12

This is similar to the first case (likely && likely), except that the expectation for the second predicate is now false, so it re-orders the blocks so that the return fn2() case is the fall-through one.

So compilers definitely can use precise likely and unlikely information, and indeed it makes sense: if you broke the above test up into two chained if statements, it's pretty obvious that separate branch hints would work, so it isn't surprising that the semantically equivalent use of && still benefits from hints.

Here are a few other notes that didn't get "full text" treatment, in case you got this far:

  • I used icc to illustrate the examples, but for this test at least both clang and gcc make the same basic optimizations (compiling 3 out of 4 cases differently).
  • One "obvious" optimization a compiler could make from the probabilities of the sub-predicates is to reverse their order. For example, if you have likely(X) && unlikely(Y), the compiler could check condition Y first, since it is very likely to be false and would allow it to short-circuit the check of X¹. Apparently, gcc can make this optimization for simple predicates, but I wasn't able to coax icc or clang into doing it. The gcc optimization is apparently quite fragile: it disappears if you change the predicate a bit, even when reordering would be even more beneficial in that case.
  • Compilers are restricted from making optimizations when they can't guarantee that the transformed code will behave "as if" it were compiled directly according to the language semantics. In particular, they have limited ability to reorder operations unless they can prove the operations have no side-effects. Keep that in mind when structuring your predicates.

¹ Of course, this is only allowed when the compiler can see that X and Y have no side effects, and it probably only pays off when Y is not much more expensive to check than X (since the benefit of avoiding the check of X would be overwhelmed by the cost of the additional Y evaluations).

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
  • Thank you for the comprehensive answer! Just to round things up entirely - what optimization flags did you use/omit to get the compiler (I work with GCC) to produce the desired effect? I guess `-O2` would be obvious; do you think `-O3` will still do it? – immortal Jan 30 '17 at 07:51
2

Yes, this may help. For example, in the following code, when XXX is not defined GCC tests x before the unexpected y > 0, avoiding execution of the unlikely check at runtime (Cygwin, gcc 5.4). In this particular case the y check is written before the x check in the source, but that isn't critical: during codegen GCC may reshuffle your code in unpredictable ways anyway.

#ifdef XXX
#define __builtin_expect(x, y) (x)
#endif

void bar();

void foo(int x, int y, int z) {
  if(__builtin_expect(y > 0, 0) || x == 0)
    bar ();
}

When XXX is off (i.e. __builtin_expect is active):

foo:
  testl   %ecx, %ecx
  je      .L4
  testl   %edx, %edx
  jg      .L4
  rep ret

When XXX is defined (i.e. __builtin_expect ignored):

foo:
  testl   %edx, %edx
  jg      .L4
  testl   %ecx, %ecx
  je      .L4
  rep ret
yugr
  • 19,769
  • 3
  • 51
  • 96
  • Good catch. I couldn't find a case where `gcc` actually reordered the checks, but apparently this is one. – BeeOnRope Jan 29 '17 at 18:50
-1

Yes, C uses intelligent or guarding. So if we write

  if(a() || b())

then a is evaluated first. If it is true, b is not evaluated. If it is false, b must of course be evaluated to make the final decision.

So if a() is cheap or probable, and b() expensive or improbable, it pays to put a() first. But the code is equivalent to

  if(a())
  {
      do_it();
  }
  /* action point */
  else if(b())
  {
     do_it();
  }

Once control flow reaches the "action point", you can see that telling the compiler whether the jump is likely or not may help.

Malcolm McLean
  • 6,258
  • 1
  • 17
  • 18
  • 2
    "Short-circuiting" is the proper term for "intelligent or guarding." – cadaniluk Jan 29 '17 at 17:33
  • I'd also mention that `a()` and `b()` must have no side-effects. Currently they look like function calls which generally can _not_ be reordered regardless of `__builtin_expect` annotations. – yugr Jan 29 '17 at 17:54
  • 2
    That's answering a different question: how to best order the conditions of a short-circuiting operator, and doesn't really related directly to `likely` or `unlikely` as I understand it. – BeeOnRope Jan 29 '17 at 18:48