14

Write a branchless function that returns 0, 1, or 2 if the difference between two signed integers is zero, negative, or positive.

Here's a version with branching:

int Compare(int x, int y)
{
    int diff = x - y;
    if (diff == 0)
        return 0;
    else if (diff < 0)
        return 1;
    else
        return 2;
}

Here's a version that may be faster depending on compiler and processor:

int Compare(int x, int y)
{
    int diff = x - y;
    return diff == 0 ? 0 : (diff < 0 ? 1 : 2);
}

Can you come up with a faster one without branches?

SUMMARY

The 10 solutions I benchmarked had similar performance. The actual numbers and winner varied depending on compiler (icc/gcc), compiler options (e.g., -O3, -march=nocona, -fast, -xHost), and machine. Canon's solution performed well in many benchmark runs, but again the performance advantage was slight. I was surprised that in some cases some solutions were slower than the naive solution with branches.

Marc Eaddy
  • 1,762
  • 3
  • 16
  • 23
  • 7
    While it's always hard to tell the difference, this sounds like Computer Hardware/Machine Architecture homework to me. – Mike Burton Oct 23 '09 at 00:52
  • 21
    Using the ternary "?:" operator contains an implicit branch. – Bob Murphy Oct 23 '09 at 01:18
  • 7
    I don't think he implied otherwise. – Tordek Oct 23 '09 at 03:02
  • Have you actually tried the most straightforward way to compare your Point structs: `bool operator<(const Point& a, const Point& b){ return a.x < b.x || (a.x == b.x && a.y < b.y); }`? I can't see how the look-up table is going to beat that because calculating the array indices appears to be **at least as** expensive as just calculating the answer. - With a quick test this sorts 1 million random pairs some 30% faster than the look-up table and Tom's Compare function. – UncleBens Oct 23 '09 at 13:07
  • I agree. I wouldn't use a table for Point. However, it might make sense when comparing two Lines, which requires comparing 4 integers. – Marc Eaddy Oct 23 '09 at 15:46
  • **Related:** [Is there a standard sign function in C?](http://stackoverflow.com/questions/1903954/is-there-a-standard-sign-function-signum-sgn-in-c-c) – legends2k Sep 02 '14 at 13:58

11 Answers11

24

Branchless (at the language level) code that maps negative to -1, zero to 0 and positive to +1 looks as follows

int c = (n > 0) - (n < 0);

if you need a different mapping you can simply use an explicit map to remap it

const int MAP[] = { 1, 0, 2 };
int c = MAP[(n > 0) - (n < 0) + 1];

or, for the requested mapping, use some numerical trick like

int c = 2 * (n > 0) + (n < 0);

(It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0. And the code is quite readable. If 0 is mapped to something else, it becomes more tricky and less readable.)

As an additinal note: comparing two integers by subtracting one from another at C language level is a flawed technique, since it is generally prone to overflow. The beauty of the above methods is that they can immedately be used for "subtractionless" comparisons, like

int c = 2 * (x > y) + (x < y);
AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • I like the sheer conciseness and readability of what you've done here. The only flaw I see is that not all languages or environments will return consistent values (or numeric values) from a comparison (i.e. you can't use this in Java). – jprete Oct 23 '09 at 15:28
  • 2
    Funny. It was my first answer, and I discarded it on the bitwise argument. – Tordek Oct 23 '09 at 21:44
  • 1
    Yes, I saw that you submitted the same thing before me. I upvoted your answer and the one from Tom right away. Now I'm trying to downvote mine, but it doesn't let me :) – AnT stands with Russia Oct 23 '09 at 21:47
  • 3
    Depending on the instruction set, this isn't necessarily branchless. – peterchen Oct 23 '09 at 23:32
  • 1
    Yes, that's what I meant when I said that it is branchless *at the language level*. – AnT stands with Russia Oct 24 '09 at 05:51
  • If you are going for performance, that multiplication by 2 could be written with bit shifts, if x and y are integers – Tom Oct 26 '09 at 03:29
  • Any compiler worth anyone's time will do that for you. – GManNickG Oct 26 '09 at 03:35
  • @Tom: Multiplication by 2 and shift 1 bit left are operations that have exactly the same semantical meaning in C/C++. For this reason, they produce identical machine code. There's absolutely no point to replace one with another. It will absolutely have no effect on performance. The choice must be made from pure readability considerations. In this case I wanted to multiply something by 2, not shift anything anywhere, so that's exactly how I spelled it out in the code: with multiplication operator and with a `2`. – AnT stands with Russia Oct 26 '09 at 05:20
  • @AudreyT: Even though it's "branchless" at the language level, anytime you use comparison operators, you are potentially introducing branches at the machine compilation level. – Adisak Oct 26 '09 at 06:22
  • 2
    @Adisak: Yes, I understand that perfectly well, which is, obviously, the reason I included the "language level" remark in my original reply. So? – AnT stands with Russia Oct 26 '09 at 06:35
  • AudreyT: "So?" Well, the reason it's important is the Branchless code is almost always harder to read than Branching code... even in the simple case you present with `int c = 2 * (x > y) + (x < y);` It's mainly used as part of an advanced programmers toolkit as a micro-optimization in heavily used critical routines or inner loops. Since your "Branchless at the language level" example emits two branches at the machine code level, it's no longer a valid optimization. Your example will run slower than a naive easy-to-read branching version. – Adisak Oct 30 '09 at 02:59
  • Firstly, it doesn't necessarily emit branches. It only *might* emit branches, and with a more-or-less decent compiler there will only be one branch, not two. Or it might not emit any branches at all. I consider the latter as much more likely, BTW. Secondly, I don't see why it should run "slower" instead of running with the same speed as the explictly branched version. – AnT stands with Russia Oct 30 '09 at 03:57
  • Thirdly, code like `cmp = (x > y) - (x < y)` is a well-known and extremely widespread comparison idiom. For even a moderately experienced programmer this version is significantly more readable than the explicitly branched version, since it is much more compact. – AnT stands with Russia Oct 30 '09 at 03:59
  • @BЈовић: It is the number that we need to map. – AnT stands with Russia Oct 29 '14 at 19:46
  • But it is not mentioned anywhere. Not in the question, and not in this answer. How is it calculated, and where are x and y from the question? – BЈовић Oct 30 '14 at 07:28
  • @BЈовић: The quesion is, as stated in the title, about "mappping zero, negative, and positive [value] to 0, 1, 2". I thing it is sufficently clear that in my answer `n` is intended to be that value. Also, once you understand the question, you realize that `x` and `y` are completely irrlelvant. If you want to use a name from the question, that name would be `diff`, not `x` and `y`. I just replaced `diff` with `n` for brevity, since the name of the variable does not really matter. `n` is basically a name for the "formal paramter" of the idiom I pressented. – AnT stands with Russia Oct 30 '14 at 17:40
  • @BЈовић: It is not clear to me why you mention these irrlevant `x` and `y`, yet completely igore the directly relevant `diff`. It is even less clear to me how you could miss a "comparison" example at the end of my answer, which actually uses `x` and `y – AnT stands with Russia Oct 30 '14 at 17:43
14
int Compare(int x, int y) {
     return (x < y) + (y < x) << 1;
}

Edit: Bitwise only? Guess < and > don't count, then?

int Compare(int x, int y) {
    int diff = x - y;
    return (!!diff) | (!!(diff & 0x80000000) << 1);
}

But there's that pesky -.

Edit: Shift the other way around.

Meh, just to try again:

int Compare(int x, int y) {
    int diff = y - x;
    return (!!diff) << ((diff >> 31) & 1);
}

But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...

Bit twiddling is fun!

Hmm, I just learned about setnz.

I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction!:

IN THEORY. MY ASSEMBLER IS RUSTY

subl  %edi, %esi
setnz %eax
sarl  $31, %esi
andl  $1, %esi
sarl  %eax, %esi
mov   %esi, %eax
ret

Rambling is fun.

I need sleep.

Tordek
  • 10,628
  • 3
  • 36
  • 67
  • Doesn't work: Compare(5, 9) -> 3 should be 1 Compare(3, 1) -> 1 should be 2 Compare(4, 4) -> 0 (correct) Here's my version: int Compare(int x, int y) { unsigned int diff = x - y; return (!(diff >> 31) & (diff & 0x7FFFFFFF) != 0) << 1 | (diff >> 31); } Can you do better? :) – Marc Eaddy Oct 23 '09 at 05:31
  • Oops, sorry, I hadn't tested and didn't cross my mind `!!diff` would be 1 for a negative diff, for some reason. – Tordek Oct 23 '09 at 07:31
  • There's an ASM instruction for `!!` if there's an ASM instruction for `!= 0` (which I bet there is). – Chris Lutz Oct 23 '09 at 08:00
  • Why bitwise only? Generating a `bool` value from a comparison is a branchless operation these days. – AnT stands with Russia Oct 23 '09 at 08:12
  • +1 for the `setnz` trick (but I can't add twice...). Wonder whether it's an optimized instruction on recent x86 processors though. – Abel Oct 23 '09 at 11:25
  • @AndreyT: Beats me, I'm only sticking to OP's rules. – Tordek Oct 23 '09 at 16:39
10

Assuming 2s complement, arithmetic right shift, and no overflow in the subtraction:

#define SHIFT (CHARBIT*sizeof(int) - 1)

int Compare(int x, int y)
{
    int diff = x - y;
    return -(diff >> SHIFT) - (((-diff) >> SHIFT) << 1);
}
Stephen Canon
  • 103,815
  • 19
  • 183
  • 269
  • `(2 & (-diff >> 30))` looks promising, but doesn't work. I.e., if `diff=-4`, the outcome would be `0`, though `1` for negative is expected. – Abel Oct 23 '09 at 01:49
  • 3
    Tordek means that `(2 & (-diff >> 30))` is equivalent to `-(((-diff) >> 31) << 1)`, not that it replaces the entire expression. – Stephen Canon Oct 23 '09 at 04:13
  • 1
    You're also assuming sizeof(int) == 4. – Richard Corden Oct 23 '09 at 09:11
  • 1
    `#include ` and replace `31` with `CHAR_BIT * sizeof(int) - 1` – ephemient Oct 23 '09 at 14:42
  • If the explicit extraction of sign bit is the most efifcient way to check the sign of a number (as opposed to `test`-like operations and CPU state flags), then that's the code any self-respecting compiler will generate for language-level comparisons with 0. – AnT stands with Russia Oct 23 '09 at 15:58
9

Two's complement:

#include <limits.h>
#define INT_BITS (CHAR_BITS * sizeof (int))

int Compare(int x, int y) {
    int d = y - x;
    int p = (d + INT_MAX) >> (INT_BITS - 1);
    d = d >> (INT_BITS - 2);
    return (d & 2) + (p & 1);
}

Assuming a sane compiler, this will not invoke the comparison hardware of your system, nor is it using a comparison in the language. To verify: if x == y then d and p will clearly be 0 so the final result will be zero. If (x - y) > 0 then ((x - y) + INT_MAX) will set the high bit of the integer otherwise it will be unset. So p will have its lowest bit set if and only if (x - y) > 0. If (x - y) < 0 then its high bit will be set and d will set its second to lowest bit.

Paul Hsieh
  • 1,466
  • 7
  • 9
6

Unsigned Comparison that returns -1,0,1 (cmpu) is one of the cases that is tested for by the GNU SuperOptimizer.

cmpu: compare (unsigned)
int cmpu(unsigned_word v0, unsigned_word v1)
{
    return ( (v0 > v1) ? 1 : ( (v0 < v1) ? -1 : 0) );
}

A SuperOptimizer exhaustively searches the instruction space for the best possible combination of instructions that will implement a given function. It is suggested that compilers automagically replace the functions above by their superoptimized versions (although not all compilers do this). For example, in the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf), the cmpu function is shown as this in Appendix D pg 204:

cmpu: compare (unsigned)
PowerPC SuperOptimized Version
subf  R5,R4,R3
subfc R6,R3,R4
subfe R7,R4,R3
subfe R8,R7,R5

That's pretty good isn't it... just four subtracts (and with carry and/or extended versions). Not to mention it is genuinely branchfree at the machine opcode level. There is probably a PC / Intel X86 equivalent sequence that is similarly short since the GNU Superoptimizer runs for X86 as well as PowerPC.

Note that Unsigned Comparison (cmpu) can be turned into Signed Comparison (cmps) on a 32-bit compare by adding 0x80000000 to both Signed inputs before passing it to cmpu.

cmps: compare (signed)
int cmps(signed_word v0, signed_word v1)
{
    signed_word offset=0x80000000;
    return ( (unsigned_word) (v0 + signed_word),
        (unsigned_word) (v1 + signed_word) );
}

This is just one option though... the SuperOptimizer may find a cmps that is shorter and does not have to add offsets and call cmpu.

To get the version that you requested that returns your values of {1,0,2} rather than {-1,0,1} use the following code which takes advantage of the SuperOptimized cmps function.

int Compare(int x, int y)
{
    static const int retvals[]={1,0,2};
    return (retvals[cmps(x,y)+1]);
}
Adisak
  • 6,708
  • 38
  • 46
4

I'm siding with Tordek's original answer:

int compare(int x, int y) {
    return (x < y) + 2*(y < x);
}

Compiling with gcc -O3 -march=pentium4 results in branch-free code that uses conditional instructions setg and setl (see this explanation of x86 instructions).

push   %ebp
mov    %esp,%ebp
mov    %eax,%ecx
xor    %eax,%eax
cmp    %edx,%ecx
setg   %al
add    %eax,%eax
cmp    %edx,%ecx
setl   %dl
movzbl %dl,%edx
add    %edx,%eax
pop    %ebp
ret 
Tom
  • 10,689
  • 4
  • 41
  • 50
4

Good god, this has haunted me.

Whatever, I think I squeezed out a last drop of performance:

int compare(int a, int b) {
    return (a != b) << (a > b);
}

Although, compiling with -O3 in GCC will give (bear with me I'm doing it from memory)

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
cmpl  %esi, %edi
setgt %dl
sall  %dl, %eax
ret

But the second comparison seems (according to a tiny bit of testing; I suck at ASM) to be redundant, leaving the small and beautiful

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
setgt %dl
sall  %dl, %eax
ret

(Sall may totally not be an ASM instruction, but I don't remember exactly)

So... if you don't mind running your benchmark once more, I'd love to hear the results (mine gave a 3% improvement, but it may be wrong).

Tordek
  • 10,628
  • 3
  • 36
  • 67
  • Why wouldn't you just add al and dl to get the value of 1 or 2 instead of using SALL (shift arithmetic left long)? Add is simpler and faster than variable shift (which may be microcoded on some CPUs). – Adisak Oct 26 '09 at 07:17
  • It's a fair option; I hadn't though of it. Only benchmarks will tell. – Tordek Oct 26 '09 at 17:04
  • ICC 11 produces (icc -static -g -O3 -fast -xHost): `[mov $0x1,%edx; xor %ecx,%ecx; xor %eax,%eax; cmp %esi,%edi; cmovne %edx,%eax; cmovg %edx,%ecx; shl %cl,%eax; retq] My benchmark indicates it is very fast. – Marc Eaddy Oct 26 '09 at 17:20
0

Combining Stephen Canon and Tordek's answers:

int Compare(int x, int y)
{
    int diff = x - y; 
    return -(diff >> 31) + (2 & (-diff >> 30));
} 

Yields: (g++ -O3)

subl     %esi,%edi 
movl     %edi,%eax
sarl     $31,%edi
negl     %eax
sarl     $30,%eax
andl     $2,%eax
subl     %edi,%eax 
ret 

Tight! However, Paul Hsieh's version has even fewer instructions:

subl     %esi,%edi
leal     0x7fffffff(%rdi),%eax
sarl     $30,%edi
andl     $2,%edi
shrl     $31,%eax
leal     (%rdi,%rax,1),%eax
ret
Marc Eaddy
  • 1,762
  • 3
  • 16
  • 23
  • 4
    Note that fewer instructions != faster. Only profiling the examples being run a few million times will tell. – Chris Lutz Oct 23 '09 at 06:32
  • 2
    I'd say that the `2 * (x >y) + (x < y)` version looks cleaner in C code, produces few instructions and embeds no immediate constants in machine code. Unfortunately, neither GCC nor MSVC yet understand that it is enough to `cmp` x and y only once, even at the highest optimization levels. – AnT stands with Russia Oct 23 '09 at 08:03
0
int Compare(int x, int y)
{
    int diff = x - y;

    int absdiff = 0x7fffffff & diff; // diff with sign bit 0
    int absdiff_not_zero = (int) (0 != udiff);

    return
        (absdiff_not_zero << 1)      // 2 iff abs(diff) > 0
        -
        ((0x80000000 & diff) >> 31); // 1 iff diff < 0
}
Alex
  • 1,249
  • 1
  • 14
  • 14
  • This will not work correctly in cases where the computation of `diff = x-y` overflows. Also, some compilers will convert the `!=` you use to compute `absdiff_not_zero` to a branch. – Adisak Oct 26 '09 at 07:22
0

The basic C answer is :

int v;           // find the absolute value of v
unsigned int r;  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Also :

r = (v ^ mask) - mask;

Value of sizeof(int) is often 4 and CHAR_BIT is often 8.

0

For 32 signed integers (like in Java), try:

return 2 - ((((x >> 30) & 2) + (((x-1) >> 30) & 2))) >> 1;

where (x >> 30) & 2 returns 2 for negative numbers and 0 otherwise.

x would be the difference of the two input integers

fedorqui
  • 275,237
  • 103
  • 548
  • 598
flanglet
  • 564
  • 4
  • 11