3

Recently I realized I have been doing too much branching without caring the negative impact on performance it had, therefore I have made up my mind to attempt to learn all about not branching. And here is a more extreme case, in attempt to make the code to have as little branch as possible.

Hence for the code

if(expression) 
  A = C;       //A and C have to be the same type here obviously

expression can be A == B, or Q<=B, it could be anything that resolve to true or false, or i would like to think of it in term of the result being 1 or 0 here

I have come up with this non branching version

A += (expression)*(C-A);   //Edited with thanks

So my question would be, is this a good solution that maximize efficiency? If yes why and if not why?

  • 1
    What are the types of A, B, and C supposed to be here? Can you multiply C by a boolean? – npr Jun 06 '13 at 01:08
  • Assume them all to be integer for this purpose. Thanks! – ChangYou Wong Jun 06 '13 at 01:21
  • First, you should think harder about the structure of your code to see if conditions can be simplified, because every conditional block is another path you have to test in combination with every other. After that, for simple cases you should first check to see if there are compiler switches (eg., `-mcpu`, `-mtune` in GCC) which would encourage the compiler to do this automatically. If you _must_ hand-code a branchless conditional, you might be better off with `A ^= -(A==B)&(C^B);`, but you're still betting on the compiler making a branchless evaluation of `A==B` internally. – sh1 Jun 06 '13 at 01:41
  • Surely you could factor it as `A += (A==B)*(C-A)` instead? – tmyklebu Jun 06 '13 at 02:39
  • Oh thanks for that improvement! Totally missed it somehow – ChangYou Wong Jun 06 '13 at 03:38
  • thanks sh1 for the lengthy explanation which I will look through all of them. – ChangYou Wong Jun 06 '13 at 03:47
  • [Bad branch prediction can really slow things down](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array). Though I'm not sure what putting it in an arithmetic expression will do. If it doesn't do much, converting it to an arithmetic operation rather than a boolean one (if possible) **may** make a significant difference. – Bernhard Barker Jun 06 '13 at 07:49
  • Thanks Dukeling for that! – ChangYou Wong Jun 07 '13 at 04:55

4 Answers4

6

Depends on the compiler, instruction set, optimizer, etc. When you use a boolean expression as an int value, e.g., (A == B) * C, the compiler has to do the compare, and the set some register to 0 or 1 based on the result. Some instruction sets might not have any way to do that other than branching. Generally speaking, it's better to write simple, straightforward code and let the optimizer figure it out, or find a different algorithm that branches less.

Lee Daniel Crocker
  • 12,927
  • 3
  • 29
  • 55
  • Compilers aren't necessarily very good at doing these optimisations, though. Whether you want the branchy code or the hacky code depends a lot on how much performance you need and how the code gets used. – tmyklebu Jun 06 '13 at 02:40
2

Jeez, no, don't do that!

Anyone who "penalize[s] [you] a lot for branching" would hopefully send you packing for using something that awful.

How is it awful, let me count the ways:

  1. There's no guarantee you can multiply a quantity (e.g., C) by a boolean value (e.g., (A==B) yields true or false). Some languages will, some won't.
  2. Anyone casually reading it is going observe a calculation, not an assignment statement.
  3. You're replacing a comparison, and a conditional branch with two comparisons, two multiplications, a subtraction, and an addition. Seriously non-optimal.
  4. It only works for integral numeric quantities. Try this with a wide variety of floating point numbers, or with an object, and if you're really lucky it will be rejected by the compiler/interpreter/whatever.
Ross Patterson
  • 9,527
  • 33
  • 48
0

You should only ever consider doing this if you had analyzed the runtime properties of the program and determined that there is a frequent branch misprediction here, and that this is causing an actual performance problem. It makes the code much less clear, and its not obvious that it would be any faster in general (this is something you would also have to measure, under the circumstances you are interested in).

npr
  • 126
  • 1
  • 10
0

After doing research, I came to the conclusion that when there are bottleneck, it would be good to include timed profiler, as these kind of codes are usually not portable and are mainly used for optimization.

An exact example I had after reading the following question below

Why is it faster to process a sorted array than an unsorted array?

I tested my code on C++ using that, that my implementation was actually slower due to the extra arithmetics.

HOWEVER! For this case below

if(expression)     //branched version
  A += C; 
//OR
A += (expression)*(C); //non-branching version

The timing was as of such. Branched Sorted list was approximately 2seconds.

Branched unsorted list was aproximately 10 seconds.

My implementation (whether sorted or unsorted) are both 3seconds.

This goes to show that in an unsorted area of bottleneck, when we have a trivial branching that can be simply replaced by a single multiplication.

It is probably more worthwhile to consider the implementation that I have suggested. ** Once again it is mainly for the areas that is deemed as the bottleneck **

Community
  • 1
  • 1
  • It is worthwhile to note that this is mainly my conclusion after testing on C++ on Visual Studio environment. Therefore timed-profiler should be always used when doing such optimization – ChangYou Wong Jun 07 '13 at 21:51