
Is < cheaper (faster) than <=, and similarly, is > cheaper (faster) than >=?

Disclaimer: I know I could measure it, but that would be on my machine only, and I am not sure whether the answer might be "implementation specific" or something like that.

Dimitar Slavchev
  • I would guess they both compile to a single instruction in most architectures, but that the answer is: Who cares? – user229044 Aug 01 '12 at 16:17
  • They are almost equivalent in terms of assembly instructions generated, if that's what you ask. – Richard J. Ross III Aug 01 '12 at 16:18
  • I understand the spirit behind your question, but: do you ask this out of academic interest, or because you think this might have an impact on performance of your app? It won't. The difference, if present, will be absolutely **swamped** by other factors in your app. Not by a factor of 2, or 10, but by 1m or more. I'll bet you wouldn't be able to measure it at all. – Michael Petrotta Aug 01 '12 at 16:19
  • Even better than measuring, you could look at the generated assembly and find that it is compiled to instructions that either have no reason to have different timings or, in the case of < and >= or > and <=, are identical. – Pascal Cuoq Aug 01 '12 at 16:20
  • Following Richard's comment, it means that you have to look at the assembly instruction set of the architecture, and check whether such operations can be done with the same clock cycles. – nhahtdh Aug 01 '12 at 16:20
  • I would argue that this question is on-topic and constructive. While it may not be answerable for all architectures, it should be possible to provide some benchmarks to answer the question. – Richard J. Ross III Aug 01 '12 at 16:21
  • I think that if you are programming at a level where this becomes significant, then you will have already worked it out. I have my doubts that you will be able to measure this. – paddy Aug 01 '12 at 16:21
  • Just by thinking about this question you spent more time than the answer can ever save you ;-) – René Kolařík Aug 01 '12 at 16:30
  • On what micro-architecture? Macro fusion makes this a non-issue, even implicitly comparing to zero (ie using the z flag from an arithmetic operation) isn't any faster any more - the fused compare/branch takes just as much time as a branch would (which is essentially none, unless mispredicted). – harold Aug 01 '12 at 17:03
  • On x86 all four tests are performed using the same `cmpl` instruction that subtracts its arguments and are hence equally fast. – Hristo Iliev Aug 01 '12 at 18:40
  • if by `c` you might mean also `c++` then there is a non-performance related reason to prefer `<` operator as many standard algorithms in stl do: for [LessThan Comparable](http://www.sgi.com/tech/stl/LessThanComparable.html) types only `operator<` is fundamental; the other inequality operators are essentially syntactic sugar. – jfs Aug 01 '12 at 18:56
  • Exact duplicate: [Is < faster than <=?](http://stackoverflow.com/questions/12135518/is-faster-than) – Brent Bradburn Oct 14 '12 at 01:43

2 Answers


TL;DR

There appears to be little to no difference between the four operators; they all performed in about the same time for me (results may differ on other systems!). So, when in doubt, just use the operator that makes the most sense for the situation (especially when messing with C++).

So, without further ado, here is the long explanation:

Assuming integer comparison:

As far as the generated assembly goes, the results are platform dependent. On my computer (Apple LLVM Compiler 4.0, x86_64), the generated assembly is as follows:

a < b (uses 'setl'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setl    %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a <= b (uses 'setle'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setle   %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a > b (uses 'setg'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setg    %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a >= b (uses 'setge'): 

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setge   %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret
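For reference, listings like the above can be reproduced by compiling functions of roughly this shape with `cc -S -O0` (this is my reconstruction of the test source, not the answer's exact code):

```c
/* Hypothetical reconstruction of the compared functions: one per
 * operator, so the setl/setle/setg/setge selection is easy to spot
 * in the emitted assembly. */
int test_lt(void) { int a = 10, b = 15; return a < b;  }
int test_le(void) { int a = 10, b = 15; return a <= b; }
int test_gt(void) { int a = 10, b = 15; return a > b;  }
int test_ge(void) { int a = 10, b = 15; return a >= b; }
```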

Which isn't really telling me much. So, we skip to a benchmark:

Ladies and gentlemen, the results are in: I created the following test program (I am aware that clock() isn't the best way to measure results like this, but it will have to do for now).

#include <time.h>
#include <stdio.h>

#define ITERS 100000000

int v = 0;

void testL()
{
    clock_t start = clock();
    
    v = 0;
    
    for (int i = 0; i < ITERS; i++) {
        v = i < v;
    }
    
    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testLE()
{
    clock_t start = clock();
    
    v = 0;
    
    for (int i = 0; i < ITERS; i++)
    {
        v = i <= v;
    }
    
    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testG()
{
    clock_t start = clock();
    
    v = 0;
    
    for (int i = 0; i < ITERS; i++) {
        v = i > v;
    }
    
    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testGE()
{
    clock_t start = clock();
    
    v = 0;
    
    for (int i = 0; i < ITERS; i++) {
        v = i >= v;
    }
    
    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

int main()
{
    testL();
    testLE();
    testG();
    testGE();
}

Which, on my machine (compiled with -O0), gives me this (5 separate runs):

testL: 337848
testLE: 338237
testG: 337888
testGE: 337787

testL: 337768
testLE: 338110
testG: 337406
testGE: 337926

testL: 338958
testLE: 338948
testG: 337705
testGE: 337829

testL: 339805
testLE: 339634
testG: 337413
testGE: 337900

testL: 340490
testLE: 339030
testG: 337298
testGE: 337593

I would argue that the differences between these operators are minor at best, and don't hold much weight in a modern computing world.
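One caveat about benchmarks of this shape: the numbers above were taken at -O0, and at -O2 a compiler is free to fold these loops away entirely. A minimal sketch (my addition, not part of the original test program) that uses a volatile operand to keep the comparison live at any optimization level:

```c
#include <time.h>

#define ITERS 100000000

/* volatile forces a real load/compare/store every iteration, so the
 * comparison cannot be optimized away even at -O2. */
volatile int vv = 0;

unsigned long time_less(void)
{
    clock_t start = clock();
    vv = 0;
    for (int i = 0; i < ITERS; i++)
        vv = i < vv;
    return (unsigned long)(clock() - start);
}
```

Swapping `<` for `<=`, `>`, or `>=` in the loop body gives the other three variants; any remaining difference between them should still be in the noise.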


Richard J. Ross III
  • The assembly code was actually telling a lot. It was telling that all those snippets must take exactly the same time, all circumstances being equal. `setcc` takes 1 cycle (except on P4, where it takes 3) regardless of what the `cc` is. But how is this even relevant? Comparison operators are almost never used this way - comparing the performance of `jcc` (also the same regardless of `cc`) seems more logical. – harold Aug 01 '12 at 17:15
  • I second @harold. The assembly is telling a lot - all comparisons are done using the same `cmpl` instruction that does the heavy lifting of comparing its arguments. Basically it subtracts the second argument from the first one (discarding the result) and the ALU sets the corresponding bits in the flags register. These can then be tested, branched upon or used to set memory/reg values. – Hristo Iliev Aug 01 '12 at 18:17
  • @harold He probably meant "it isn't telling _me_ much" – Gunther Piez Aug 01 '12 at 18:43
  • @drhirsch yes, that's what I meant, updated the answer to reflect that. – Richard J. Ross III Aug 01 '12 at 18:44

It varies. First, start by examining different instruction sets and how compilers use those instruction sets. Take the OpenRISC 32 for example, which is clearly MIPS-inspired but does conditionals differently. For the OR32 there are compare-and-set-flag instructions: compare these two registers and set the flag if less than or equal (unsigned); compare these two registers and set the flag if equal. Then there are two conditional branch instructions: branch on flag set and branch on flag clear. The compiler has to follow one of these paths, but less than, less than or equal, greater than, etc. are all going to use the same number of instructions, the same execution time for a taken conditional branch, and the same execution time for a branch not taken.

Now it is definitely going to be true for most architectures that taking the branch costs more than not taking the branch, because of having to flush and re-fill the pipe. Some do branch prediction, etc., to help with that problem.

Now on some architectures the size of the instruction may vary: compare gpr0 and gpr1 vs. compare gpr0 and the immediate number 1234 may require a larger instruction; you will see this a lot with x86, for example. So although both cases may be a branch-if-less-than, how you encode the comparison, depending on which registers happen to hold which values, can make a performance difference (sure, x86 does a lot of pipelining, lots of caching, etc., to make up for these issues). Another similar example is MIPS and OR32, where r0 is always zero; it is not really a general purpose register: if you write to it, it doesn't change, as it is hardwired to zero. So a compare-if-equal-to-zero might cost you less than a compare against some other number, if an extra instruction or two is required to fill a gpr with that immediate so that the compare can happen; the worst case is having to evict a register to the stack or memory to free up a register for the immediate.

Some architectures, like ARM, have conditional execution: with the full ARM (not Thumb) instruction set, execution can be made conditional on a per-instruction basis, so if you had code

if(i==7) j=5; else j=9;

the pseudo code for arm would be

cmp i,#7
moveq j,#5
movne j,#9

there is no actual branch, so there are no pipeline issues; you flywheel right on through, very fast.

Comparing one architecture to another, if that is an interesting comparison: some, as mentioned (MIPS, OR32), make you specifically perform some sort of compare instruction; on others, like x86, msp430, and the vast majority, each ALU operation changes the flags; ARM and the like change the flags only if you tell them to, as shown above. So a

while(--len)
{
  //do something
}

loop: the subtract of 1 also sets the flags, and if the stuff in the loop is simple enough you can make the whole thing conditional, so you save on separate compare and branch instructions and you save the pipeline penalty. MIPS solves this a little differently: compare and branch are one instruction, and it executes one instruction after the branch (the branch delay slot) to save a little in the pipe.
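As a runnable C version of that countdown idiom (a hypothetical sum function using a post-decrement variant, not code from the answer), the decrement itself supplies the loop condition, so on flag-setting architectures no separate compare instruction is needed before the branch:

```c
/* Hypothetical example: the result of len-- is the loop condition,
 * so on flag-setting architectures the subtraction's flags can drive
 * the branch directly, with no separate compare instruction. */
unsigned sum_countdown(const unsigned *data, unsigned len)
{
    unsigned total = 0;
    while (len--)          /* post-decrement doubles as the loop test */
        total += data[len];
    return total;
}
```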

The general answer is that you will not see a difference; the number of instructions, execution time, etc. are the same for the various conditionals. Special cases like small immediates vs. big immediates may have an effect in corner cases, or the compiler may simply choose to do it all differently depending on what comparison you do. If you try to rewrite your algorithm to give the same answer but use a less-than instead of a greater-than-or-equal, you could be changing the code enough to get a different instruction stream. Likewise, if you perform too simple a performance test, the compiler can/will optimize out the comparison completely and just generate the result, which could vary depending on your test code, causing different execution. The key to all of this is to disassemble the things you want to compare and see how the instructions differ. That will tell you whether you should expect to see any execution differences.

old_timer