Why would < be slower than <=? [C]

Question

Naturally, I've assumed the < and <= operators run at the same speed (per Jonathon Reinhart's logic, here). Recently, I decided to test that assumption, and I was a little surprised by my results.

I know, for most modern hardware, this question is purely academic, so had to write test programs that looped about 1 billion times (to get any minuscule difference to add up to more acceptable levels). The programs were as basic as possible (to cut out all possible sources of interference).

lt.c:

int main() {
    for (int i = 0; i < 1000000001; i++);

    return 0;
}

le.c:

int main() {
    for (int i = 0; i <= 1000000000; i++);

    return 0;
}

They were compiled and run on a Linux VirtualBox 3.19.0-18-generic #18-Ubuntu x86_64 installation, using GCC with the -std=c11 flag set.

The average time for lt.c's binary was:

real    0m2.404s
user    0m2.389s
sys 0m0.000s

The average time for le.c was:

real    0m2.397s
user    0m2.384s
sys 0m0.000s

The difference is small, but I couldn't get it to go away or reverse no matter how many times I ran the binaries.

I made the comparison value in the for-loop of lt.c one larger than le.c (so they'd both loop the same number of times). Was this somehow a mistake?
According the answer in Is < faster than <=?, < compiles to jge and <= compiles to jg. That was dealing with if statements rather than a for-loop, but could this still be the reason? Could the execution of jge take slightly longer than jg? (I think this would be ironic since that would mean moving from C to ASM inverts which one is the more complicated instruction, with lt in C translating to gte in ASM and lte to gt.)
Or, is this just so hardware specific that different x86 lines or individual chips may consistently show the reverse trend, the same trend, or no difference?

You might ask the compiler to give you the assembler code and examine that. — Fred Larson, Jul 15 '15 at 15:27
Both codes compile to the same assembly (both with `jle`), even with -O0 (gcc 4.9.2). The difference is probably uncontrolled artifact. — ElderBug, Jul 15 '15 at 15:29
Using `-O3` the compiler should have optimized away the loops, not using optimization is meaningless. — Shafik Yaghmour, Jul 15 '15 at 15:34
And as a general matter, a difference <3% is already doubtful, and here you are talking about a difference <0.3%. — ElderBug, Jul 15 '15 at 15:34
See: [Loop with a zero execution time](http://stackoverflow.com/q/26771692/1708801) — Shafik Yaghmour, Jul 15 '15 at 15:41
@ElderBug 1) As I said, the difference was roughly the same no matter how many times I ran the tests. 2) Small differences are valid for large enough sample sizes. 3) If you still think that difference is still too small, at 2 billion loops, lt.c takes 4.790s real / 4.772s user, and le.c takes 4.746s real / 4.728s. The trend only grows as I increase the number of loops. — CircleSquared, Jul 15 '15 at 16:00
These should produce code with identical runtimes (assuming the compiler isn't just optimizing the loops away entirely). What *might* make them faster is to reverse the loop count: make it count down to zero. Comparisons with zero are faster on many processors than comparison with a constant. — Lee Daniel Crocker, Jul 15 '15 at 16:08
You should definitely post assembly code. You can of course compare them yourself, too. For me, similarly as for @ElderBug, the generated code is **identical** in both cases, so it must run in the same time. See also at [gcc explorer](http://goo.gl/vqgZKH) - both functions have exactly the same assembly code (except for label names). — Jester, Jul 15 '15 at 16:11
Well, I guess that 'JGE label' needs to test two flags, JG only one? — Martin James, Jul 15 '15 at 16:17
I get identical assembly as well. Don't assume that `<` and `<=` compile to `jge` and `jg` - they don't on my system (both versions used `jbe`). Do note that `jg` tests the overflow, sign, and zero flags (`OF=SF` and `ZF=0`), while `jge` only tests the overflow and sign flags (`OF=SF`). That *may* be a potential source of difference in runtime over many, many repetitions. — John Bode, Jul 15 '15 at 16:17
those are dead loops, you cant measure the difference that way and if you can then counting to 10 is as good as counting to a billion. as you have figured out the instruction set covers the difference. The measurable difference between those instructions is far less than one clock cycle and basically not measurable at all. If you see a difference it is due to measurement inaccuracies or something other than the code differences. — old_timer, Jul 15 '15 at 16:59

score 1 · Accepted Answer · answered Jul 15 '15 at 18:42

There were a few requests in the comments to my question to include the assembly being generated for me by GCC. After getting to compiler to pop out the assembly versions of each file, I checked it.

Result:
It turns out the default optimization setting turned both for-loops into the same assembly. Both files were identical in assembly-form, actually. (diff confirmed this.)

Possible reason for the previously observed time difference:
It seems the order in which I ran the binaries was the cause for the run time difference.

On a given runthrough, the programs generally were executed quicker with each successive execution, before plateauing after about 3 executions.
I alternated back and forth between time ./lt and time ./le, so the one run first would have a bias towards extra time in its average.
I usually ran lt first.
I did several separate runthroughs (increasing the averaged bias).

Code excerpt:

        movl    $0, -4(%rbp)
        jmp     .L2
.L3:
        addl    $1, -4($rbp)
.L2
        cmpl    $1000000000, -4(%rbp)
        jle     .L3
        mol     $0, %eax
        pop     %rbp

... * covers face * ...carry on....

Now you know more about the CPU instruction cache mechanism and what it does :) — Cobusve, Jul 16 '15 at 08:51

score 0 · Answer 2 · answered Jul 15 '15 at 17:50

Let's speak in assembly. (depends on the architecture of course) When comparing you'll use cmp or test instruction and then - when you use < the equal instruction would be jl which checks if SF and OF are not the same (some special flags called sign and overflow) - when you use <= the equal instruction is jle which checks not only SF != OF but also ZF == 1 (zero flag) and so one, more here but honestly it's not even the whole cycle so...I think the difference is unmeasurable under normal circumstances

Why would < be slower than <=? [C]

2 Answers2