9

After reading many of the comments on this question, there are a couple people (here and here) that suggest that this code:

int val = 5;
int r = (0 < val) - (val < 0); // this line here

will cause branching. Unfortunately, none of them give any justification or say why it would cause branching (tristopia suggests it requires a cmove-like instruction or predication, but doesn't really say why).

Are these people right in that "a comparison used in an expression will not generate a branch" is actually myth instead of fact? (assuming you're not using some esoteric processor) If so, can you give an example?

I would've thought there wouldn't be any branching (given that there's no logical "short circuiting"), and now I'm curious.

Community
  • 1
  • 1
Cornstalks
  • 37,137
  • 18
  • 79
  • 144
  • I would imagine that an average C/C++ compiler code generator would produce a branching instruction (`BLE`, `BLT`, `BGE`, `BGT` or equivalent) when emitting code for `val < 0` expression. I am not sure if this is what these users mean, however. – Sergey Kalinichenko May 03 '13 at 00:46
  • One way to find out is to compile it and look at the generated assembly code. In this rather contrived case Visual Studio just generates a constant of 1. – Retired Ninja May 03 '13 at 00:51
  • 2
    If `val` is a function parameter (to disable evaluation at compile time), and the default compilation options are used, GCC generates a branch for `(0 < val) - (val < 0)` on x86. (In fairness, default compilation options means "no optimisations".) Remember, C++ defines "a < b" in a way that in other languages can only be expressed as "a < b ? 1 : 0", which is pretty clearly at least a *potential* branch. –  May 03 '13 at 00:54
  • 1
    x86 has instructions that can generate boolean results without branching. I suspect the answer will be very processor specific. – Mark Ransom May 03 '13 at 01:18
  • 2
    To be clear, C++ is silent on the issue. Nobody can say for certain if there definitely is or definitely is not a branch. Given a compiler and platform, however, it is. But this should be obvious. As stated, it's simply unanswerable. – GManNickG May 03 '13 at 01:37
  • @GManNickG: I'm asking if there's *any* (sane) platform/compiler for which a branch is likely. MIPS/ARM/x86(_64)/etc. All I'm looking for is one case that demonstrates that this is a realistic possibility. I certainly think that's answerable. (with bonus points to anyone who can give further insights/explanations/examples/etc). – Cornstalks May 03 '13 at 01:42

3 Answers3

5

To simplify matters, consider just one part of the expression: val < 0. Essentially, this means “if val is negative, return 1, otherwise 0”; you could also write it like this:

val < 0 ? 1 : 0

How this is translated into processor instructions depends heavily on the compiler and the target processor. The easiest way to find out is to write a simple test function, like so:

int compute(int val) {
    return val < 0 ? 1 : 0;
}

and review the assembler code that is generated by the compiler (e.g., with gcc -S -o - example.c). For my machine, it does it without branching. However, if I change it to return 5 instead of 1, there are branch instructions:

...
    cmpl    $0, -4(%rbp)
    jns     .L2
    movl    $5, %eax
    jmp     .L3
.L2:
    movl    $0, %eax
.L3:
...

So, “a comparison used in an expression will not generate a branch” is indeed a myth. (But “a comparison used in an expression will always generate a branch” isn’t true either.)


Addition in response to this extension/clarification:

I'm asking if there's any (sane) platform/compiler for which a branch is likely. MIPS/ARM/x86(_64)/etc. All I'm looking for is one case that demonstrates that this is a realistic possibility.

That depends on what you consider a “sane” platform. If the venerable 6502 CPU family is sane, I think there is no way to calculate val > 0 on it without branching. Most modern instruction sets, on the other hand, provide some type of set-on-X instruction.

(val < 0 can actually be computed without branching even on 6502, because it can be implemented as a bit shift.)

chirlu
  • 3,141
  • 2
  • 21
  • 22
  • 2
    I personally think `(val < 0) * 5;` is a more worthwhile comparison than `val < 0 ? 5 : 0;` (yes, `(val < 0)` and `val < 0 ? 1 : 0` is a tautology, but I'd expect the ternary operator to be more likely to branch than the conditional... whether or not that expectation is valid or not, I don't know). For me, g++ generates the exact same assembly for `(val < 0) * 5` and `val < 0 ? 5 : 0`. clang does not branch with either, and uses `cmove` for `val < 0 ? 5 : 0` and a `movzbl` with an `imull` for `(val < 0) * 5`. Interesting! – Cornstalks May 03 '13 at 01:32
  • It's also worth noting that clang and g++ generate the exact same code with `-O3` and do it with a shift and a bitwise and (no branching). – Cornstalks May 03 '13 at 01:35
  • Ideally, an optimizing compiler should treat `(val < 0)` and `val < 0 ? 1 : 0` the same. Of course, we don’t live in an ideal world … – chirlu May 03 '13 at 01:42
  • Added an example of a CPU that requires branching where modern ones don’t. – chirlu May 03 '13 at 02:12
  • 1
    It can be done branch-free on 6502 but it's not worth it because the 6502 is not pipelined, so branches are not expensive like on modern processors. – Raymond Chen May 03 '13 at 02:37
  • @Raymond: How can you do it? The only possibility I can think of is pushing the status word to the stack, then reading it into the accumulator and isolating the bit, finally shifting it to the LSB. That would work, but it is very … cumbersome. – chirlu May 03 '13 at 02:52
  • I'm thinking something like `sec; sbc #1; rol; lda #0; rol`. – Raymond Chen May 03 '13 at 03:43
4

Empiricism for the win:

int sign(int val) {
    return (0 < val) - (val < 0);
}

compiled with optimisations. gcc (4.7.2) produces

sign:
.LFB0:
    .cfi_startproc
    xorl    %eax, %eax
    testl   %edi, %edi
    setg    %al
    shrl    $31, %edi
    subl    %edi, %eax
    ret
    .cfi_endproc

no branch. clang (3.2):

sign:                                   # @sign
    .cfi_startproc
# BB#0:
    movl    %edi, %ecx
    shrl    $31, %ecx
    testl   %edi, %edi
    setg    %al
    movzbl  %al, %eax
    subl    %ecx, %eax
    ret

neither. (on x86_64, Core i5)

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
1

This is actually architecture-dependent. If there exists an instruction to set the value to 0/1 depending on the sign of another value, there will be no branching. If there's no such instruction, branching would be necessary.

dragonroot
  • 5,653
  • 3
  • 38
  • 63