0

I was reading the wikipedia page on optimization: http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Code_optimization/Pipeline and I came across the line:

For pipelined processors, comparisons are slower than differences, because they imply a branch.

Why is it the case that comparisons imply a branch? For example if:

int i = 2;
int x = i<5;

Is there a branch in this? It makes sense to me to branch for if statements with conditionals but I don't understand why a comparison alone causes a branch.

aaron-prindle
  • 3,077
  • 1
  • 17
  • 15
  • 3
    I think it is because comparisons (and subsequent branches) affect the branch predictor http://en.wikipedia.org/wiki/Branch_predictor . The branch predictor speeds up execution in pipelined processors when the prediction is good, and slows down the execution when it fails (the pipeline has to be filled again). – siritinga Jan 14 '15 at 16:07
  • 1
    Unlike forum sites, we don't use "Thanks", or "Any help appreciated", or signatures on [so]. See "[Should 'Hi', 'thanks,' taglines, and salutations be removed from posts?](http://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts). – John Saunders Jan 14 '15 at 16:23
  • 1
    Questions about whether particular questions are suitable should be asked on [meta]. As always, do some searching before asking a new one. – John Saunders Jan 14 '15 at 16:24

4 Answers4

3

This involves only a single branch:

unsigned(i – min_i) <= unsigned(max_i – min_i)

While this involves two:

min_i <= i && i <= max_i

When a CPU encounters a branch, it consults its predictor and follows the most likely path. If the prediction is correct, the branch is essentially free in terms of performance. If the prediction is wrong, the CPU needs to flush the pipeline and start all over.

This kind of optimization is a double-edged sword --- if your branches are highly predictable, the first might actually run slower than the second. It entirely depends on how much you know about your data.

Cory Nelson
  • 29,236
  • 5
  • 72
  • 110
  • Neither necessarily involve a branch; a compiler could easily optimize either to eliminate all branching, at least on most architectures. – James Kanze Jan 14 '15 at 18:07
  • Agreed, depends on how it is being used, which wasn't specified. – Cory Nelson Jan 14 '15 at 18:09
  • Yes. If the results aren't used, the compiler may eliminate the test entirely. If they are assigned to a `bool` (which is later used), it would be a relatively trivial optimization to avoid branching in either case. Even if they control an `if`, depending on what is in the controlled block, it may be possible to avoid the branch (but that may require a fairly sophisticated compiler). Note too that `min_i <= i && i <= max_i` has no side effects, so the compiler doesn't really have to short circuit, which could easily eliminate the need for the extra branch in the second expression. – James Kanze Jan 14 '15 at 18:20
3

Preamble: Modern compilers are capable of eliminating branches in various ways. Thus, none of the examples necessarily results a branch in the final (assembler or machine) code.

So why does the logic basically imply branches?

The code

bool check_interval_branch(int const i, int const min_i, int const max_i)
{
  return min_i <= i && i <= max_i;
} 

can be logically rewritten to be:

bool check_interval_branch(int const i, int const min_i, int const max_i)
{
  if (min_i <= i) 
  { 
    if (i < max_i) return true; 
  }
  return false;
} 

Here you obviously have two branches (where the second one is only executed if the first one is true - short circuit) which can be mispredicted by the branch predictor which in turn leads to a reroll of the pipeline.

Visual Studio 2013 (with optimization turned one) generates the following assembly containing two branches for check_interval_branch:

  push  ebp
  mov   ebp, esp
  mov   eax, DWORD PTR _i$[ebp]
  cmp   DWORD PTR _min_i$[ebp], eax    // comparison
  jg    SHORT $LN3@check_inte          // conditional jump
  cmp   eax, DWORD PTR _max_i$[ebp]    // comparison
  jg    SHORT $LN3@check_inte          // conditional jump
  mov   al, 1
  pop   ebp
  ret   0
$LN3@check_inte:
  xor   al, al
  pop   ebp
  ret   0

The code

bool check_interval_diff(int const i, int const min_i, int const max_i)
{
  return unsigned(i - min_i) <= unsigned(max_i - min_i);
}

is logically identical to

bool check_interval_diff(int const i, int const min_i, int const max_i)
{
  if (unsigned(i – min_i) <= unsigned(max_i – min_i)) { return true; }
  return false;
}

which contains only a single branch but executes two differences.

The generated code for check_interval_diff of Visual Studio 2013 doesn't even contain a conditional jump.

  push  ebp
  mov   ebp, esp
  mov   edx, DWORD PTR _i$[ebp]
  mov   eax, DWORD PTR _max_i$[ebp]
  sub   eax, DWORD PTR _min_i$[ebp]
  sub   edx, DWORD PTR _min_i$[ebp]
  cmp   eax, edx                    // comparison
  sbb   eax, eax
  inc   eax
  pop   ebp
  ret   0

(The trick here is that the subtraction done by sbb is different by 1 according to the carry flag which in turn is set to 1 or 0 by the cmp instruction.)

In fact you see three differences here (2x sub, 1x sbb).

It probably depends on your data / the use case which one is faster.

See Mysticals answer here about branch prediction.

Your code int x = i<5; is logically identical to

int x = false;
if (i < 5)
{
  x = true;
}

which again contains a branch (x = true only executed if i < 5.)

Community
  • 1
  • 1
Pixelchemist
  • 24,090
  • 7
  • 47
  • 71
  • 1
    Even without optimization, it would be a very poor compiler which generated a branch instruction for `int x = i < 5;`. – James Kanze Jan 14 '15 at 18:12
  • @JamesKanze I think the question is more about the logic behin the example expressions, rather than about the question whether the code produced by the compiler will omit the branches in any way or not. I tend to say: Comparisons imply a branch which can -in some (or many) cases- be optimized out by decent compilers. – Pixelchemist Jan 14 '15 at 19:29
3

While the answers given here are good, not all comparisons are translated into branch instructions (they do introduce data dependencies which may also cost you some performance).

For example, the following C code

int main()
{
    volatile int i;
    int x = i<5;

    return x;
}

Is compiled by gcc (x86-64, Optimizations enabled) into:

    movl    -4(%rbp), %eax
    cmpl    $5, %eax
    setl    %al
    movzbl  %al, %eax

The setl instruction sets the value of AL according to the result of the comparison instruction preceding it.

Of course, this is a very simple example -- and the cmp/setl combination probably introduces dependencies that prevent the processor for executing them in parallel or may even cost you a few cycles.

Still, on a modern processor, not all comparisons are turned into branch instructions.

nimrodm
  • 23,081
  • 7
  • 58
  • 59
  • I think the compiler (not the processor) is the one that is in authority which comparison is turned into a branch but in general you're correct. There are cases where an optimizing compiler would generate branchless code even if the logic contains branches. – Pixelchemist Jan 14 '15 at 17:07
  • 1
    @Pixelchemist I think his point is that most modern processors have instructions (like the `setl`) which can be used to avoid the branch. Of course, even without that, things like `sbb %al,%al` can be used to get the carry bit into a register. (The original Intel PL/M compilers for the 8086 did this, back in 1980, so it doesn't seem to much to expect it from modern compilers.) – James Kanze Jan 14 '15 at 18:11
1

Who ever wrote that page is not competent as a programmer. First, comparisons don't necessarily imply a branch; it depends on what you do with them. And whether that implies a branch or not depends on the processor and the compiler. An if generally requires a branch, but even then, a good optimizer can sometimes avoid it. A while or a for will generally require a branch, unless the compiler can unwind the loop, but that branch is highly predictable, so even when branch prediction is an issue, it may not matter.

More generally, if you worry about anything at this level when writing your code, you're wasting your time, and making maintenance far more difficult. The only time you should be concerned is once you have a performance problem, and the profiler shows that this is the spot where you're loosing performance. At that point, you can experiment with several different ways of writing the code, to determine which one will result in faster code for your combination of compiler and hardware. (Change the compiler or the hardware, and it might not be the same one.)

James Kanze
  • 150,581
  • 18
  • 184
  • 329