
I am writing a critical piece of code with roughly the following logic:

if (expression) {
    // Do something with extremely low latency before the nuke blows up. This branch is entered rarely, but it is the most important case.
} else {
    // Do the unimportant thing that doesn't really matter.
}

I am thinking of using the likely() macro around the expression, so that when it hits the important branch, I get minimum latency.

My concern is that this usage is the opposite of what the macro's name suggests, because I am picking the unlikely branch to be prefetched; i.e., the important branch is unlikely to happen, but it is the most critical thing when it happens.

Is there a clear downside of doing this in terms of performance?
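For concreteness, the proposed usage might look like the following C sketch, assuming GCC or Clang (which provide `__builtin_expect`). The macro definitions follow the common Linux-kernel style; `handle_event`, `launch_countermeasure`, `do_housekeeping` and the two counters are hypothetical stand-ins for the real condition and branch bodies.

```c
#include <assert.h>
#include <stdbool.h>

/* Conventional GCC/Clang definitions: the second argument of
 * __builtin_expect is the value the expression is expected to have.
 * The !! normalizes any nonzero value to exactly 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical counters standing in for the real work of each branch. */
int critical_count = 0;
int routine_count = 0;

void launch_countermeasure(void) { critical_count++; } /* hypothetical */
void do_housekeeping(void) { routine_count++; }        /* hypothetical */

/* The rare-but-critical branch is deliberately tagged "likely", so the
 * compiler treats it as the hot path when laying out the code.
 * The hint never changes which branch runs, only how code is arranged. */
void handle_event(bool trigger_fired)
{
    if (likely(trigger_fired)) {
        launch_countermeasure();
    } else {
        do_housekeeping();
    }
}
```

Calling `handle_event` 1000 times with the condition true only on the last call leaves `critical_count` at 1 and `routine_count` at 999; the annotation affects only the generated code layout, not program behaviour.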

JFMR
leon
  • You are not picking any branch to be *prefetch*, you are only tagging the code so that the compiler will optimize the *likely* case, which might include generating code with better locality for that branch, but there is no explicit *prefetch*. – David Rodríguez - dribeas Jun 06 '12 at 21:35
  • `#define likely(x) __builtin_expect((x),1)` seems to be the one you want; what do you mean that the usage is opposite of its name? – nos Jun 06 '12 at 21:37
  • @nos The important branch is **unlikely** to happen but it is the most critical thing when it happens. – leon Jun 06 '12 at 21:38
  • I hope I don't live downwind. Trusting a compiler optimizer to come up with a good enough optimization to beat a deadline is a lot of trust. – bmargulies Jun 06 '12 at 21:43
  • Are you sure that it could make a measurable difference? – Jonathan Leffler Jun 06 '12 at 21:52
  • @JonathanLeffler The nuke is sensitive to microseconds, so I think anything matters at this point. – leon Jun 06 '12 at 21:57
  • @JonathanLeffler: He doesn't need to be sure it could make a measurable difference; as long as he's not sure it _can't_ make a measurable difference, it's worth testing to see. (Presumably if the tests show no difference, he won't bother going forward with this line of optimization; he's just asking what to do if it _does_ turn out to be worth pursuing.) – abarnert Jun 06 '12 at 22:15
  • By optimizing the unlikely case, aren't you spending more time in your, presumably blocking, more likely case and thus increasing your latency when the reactor does eventually go critical? – starbolin Jun 06 '12 at 22:24
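Testing whether the hint matters, as abarnert suggests, could start from a small harness like the one below. It is a sketch assuming a POSIX system with `clock_gettime(CLOCK_MONOTONIC, ...)`; `time_worst_case_ns`, `work_fn` and `sample_work` are hypothetical names. It records the slowest single call rather than an average, since here the one rare invocation is what has to be fast. At nanosecond scale, timer resolution and OS scheduling noise can easily swamp the effect being measured, so treat the numbers as a rough check rather than proof.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the code path being measured. */
typedef void (*work_fn)(void);

/* Monotonic timestamp in nanoseconds (POSIX clock_gettime). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Runs `work` `iterations` times and returns the slowest single call,
 * in nanoseconds. Worst case, not average, is the relevant figure when
 * a rarely taken path must be fast every time it runs. */
int64_t time_worst_case_ns(work_fn work, int iterations)
{
    assert(work != NULL && iterations > 0);
    int64_t worst = 0;
    for (int i = 0; i < iterations; i++) {
        int64_t start = now_ns();
        work();
        int64_t elapsed = now_ns() - start;
        if (elapsed > worst)
            worst = elapsed;
    }
    return worst;
}

/* Hypothetical stand-in for the critical branch's body. */
static volatile int sink;
void sample_work(void) { sink++; }
```

To compare builds, one would compile the real critical path once with the hint and once without, and run the same harness against each binary.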

2 Answers


Yes. You are tricking the compiler by tagging the unlikely-but-must-be-fast branch as if it were the likely branch, in hopes that the compiler will make it faster.

There is a clear downside in doing that: if you don't write a good comment explaining what you're doing and why, some maintainer (possibly you yourself) is almost guaranteed to say six months from now, "Hey, looks like he put the likely on the wrong branch," and "fix" it.

There is also a much less likely, but still possible, downside: some version of some compiler that you use now or in the future may do different things with the likely macro than you expect, those different things may not be what you wanted to trick the compiler into doing, and you could end up with code that, every time through the loop, spends $100K speculatively getting 90% of the way through reactor shutdown before undoing it.

abarnert
  • A better choice of macro name might be 'critical', especially if it is actually a nuclear reaction involved. Or maybe 'must_be_performed_fast'. Since you're using a macro, the macro name can be indicative of why you need it; the implementation is hidden behind the macro name. – Jonathan Leffler Jun 06 '12 at 21:54
  • @JonathanLeffler: Very good idea. That way, you can put the scare-comment inside the implementation of the macro, instead of at each use of it. And, more importantly, you can change all uses of it at once. – abarnert Jun 06 '12 at 22:02
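Jonathan Leffler's naming suggestion, with the warning comment living once inside the macro definition as abarnert proposes, might look like this. It is a sketch assuming GCC or Clang for the hint, with a plain fallback elsewhere so the code still compiles; `critical` is the name from the comment, and `react` is a hypothetical caller.

```c
#include <assert.h>

/*
 * critical(x): marks the condition guarding a rarely taken but
 * latency-critical branch.
 *
 * WARNING TO MAINTAINERS: this deliberately tells the compiler the
 * condition is EXPECTED TO BE TRUE even though it almost never is.
 * That is intentional: we want the critical branch laid out as the
 * hot path. Do not "fix" this to expect false.
 *
 * On compilers without __builtin_expect the hint is simply dropped.
 */
#if defined(__GNUC__) || defined(__clang__)
#define critical(x) __builtin_expect(!!(x), 1)
#else
#define critical(x) (!!(x))
#endif

/* Hypothetical caller: returns 1 when the critical branch ran. */
int react(int alarm_raised)
{
    if (critical(alarm_raised)) {
        return 1; /* the fast, important path */
    }
    return 0;     /* the unimportant path */
}
```

Because every call site goes through one macro, switching the hint off, or to a different mechanism, means editing a single definition.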

It's the exact opposite of the traditional use of __builtin_expect(x, 1), which is normally wrapped in a macro like:

#define likely(x) __builtin_expect(x, 1)

Using that macro on the unlikely path is something I would personally consider bad form, since you're cryptically marking the unlikely path as likely for a performance gain. However, you can still mark this optimization, because __builtin_expect itself makes no assumptions about your needs; calling a path "likely" is just the standard use. To do what you want, I'd suggest:

#define optimize_path(x) __builtin_expect(x, 1)

which will do the same thing, but rather than making the code accuse the unlikely path of being likely, you're now making the code describe what you're really attempting -- to optimize the critical path.

However, I should say that if you're planning on timing a nuke, you should not only be hand-checking (and timing) the compiled assembly so that the timing is correct, but you should also be using an RTOS. A branch misprediction has an extraordinarily small effect, to the point that the hint is almost unnecessary here, since you can compensate for the "1 in a million" event by simply having a faster processor or by correctly budgeting for the delay of a mispredict.

What does affect timing on modern computers is OS preemption and scheduling. If you need something to happen on a very fine-grained timescale, you should be scheduling it for real time, not the pseudo-real time that most general-purpose operating systems provide. Branch misprediction is generally hundreds of times smaller than the delay that can occur from not using an RTOS in a real-time situation.

Typically, if you believe branch misprediction might be a problem, you remove the branch from the time-sensitive path, as the branch predictor's state is complex and out of your control. Macros like "likely" and "unlikely" are meant for blocks of code that can be reached from various places, with various branch-prediction states, and, most importantly, that are executed very frequently. The high frequency of hitting these branches is what produces a tangible performance increase for applications that use them (like the Linux kernel). If you only hit the branch once, you might get a 1 nanosecond performance boost in some cases, but if an application is ever that time-critical, there are other things you can do to get much larger improvements.
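One common way to "remove the branch" is a bitmask select: compute a mask from the condition and combine both candidate values, so the source contains no if/else for the predictor to guess. This is a swapped-in illustration, not something from the answer, and it only applies when choosing between side-effect-free values; it cannot replace a branch whose body actually does something (such as starting a shutdown). `select_branchless` is a hypothetical helper, and an optimizing compiler may still emit a conditional move or even a branch, so the generated assembly should be checked as suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a when cond is nonzero, otherwise b, with no if/else.
 * mask is all ones (0xFFFFFFFF) when cond is nonzero, all zeros otherwise. */
uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

For example, `select_branchless(1, 10u, 20u)` yields 10 and `select_branchless(0, 10u, 20u)` yields 20.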

DavidJFelix