<= does work slower than <

Question

I found a few questions on SO regarding performance comparison of < and <= (this one was extremely downvoted) and I always found the same answer that there is no performance difference between the two.

I wrote a program for the comparison (not so working fiddle...copy to your machine to run it) in which I created two loops for (int i = 0; i <= 1000000000; i++ ) and for (int i = 0; i < 1000000001; i++ ) in two different methods.

I ran each method 100 times; took an average of the elapsed time and found that loop with <= operator ran slower than the one with < operator.

I ran the program multiple times and <= always took more time to complete. My results(im ms) were:

3018.73, 2778.22

2816.87, 2760.62

2859.02, 2797.05

My question is: If neither one is faster, why do I see differences in the results? Is there anything wrong with my program?

Are you sure you know how to write a micro-benchmark? Have you compiled in release mode, run without debugger (CTRL+F5), setup your process to have high priority, ... And it is an empty `for` cycle... it would be probably optimized out in Release mode... so you would need to add some code inside. — xanatos, Apr 17 '15 at 14:43
Also, did you try reversing them? So that the first one is LessThan? — Oskar Kjellin, Apr 17 '15 at 14:43
I did change the priority. No, I did not check in release mode. Sure, I'll check with some code inside and get back to you. — Dumbledore, Apr 17 '15 at 14:44
Why does it matter if you have to loop one hundred million times in order to see a performance difference? Are you writing a program that loops a hundred million times and need the fastest possible performance? Otherwise this seems like a waste of time... in my opinion. Don't let me stop you though :). — Heretic Monkey, Apr 17 '15 at 14:45
How are you measuring? Did you account for JIT time? Are you using `Stopwatch` or `DateTime`? is a 2% difference in this operator going to make a big difference in your program _overall_? They give different results - _why does it matter_ if one is faster? — D Stanley, Apr 17 '15 at 14:45
@MikeMcCaughan Science! Even if it won't have any impact whatsoever on your program, how can you stand *not knowing*? — Kevin Gosse, Apr 17 '15 at 14:47
Also, make sure not to make these [benchmark mistakes](http://tech.pro/blog/1293/c-performance-benchmark-mistakes-part-one). — Corak, Apr 17 '15 at 14:47
I think he's just curious to see why he is getting different results; he's not implying that it actually matters — Feign, Apr 17 '15 at 14:47
Related, but not a duplicate, as this one refers to C++. http://stackoverflow.com/questions/12135518/is-faster-than — Cullub, Apr 17 '15 at 14:51
@KoolKiz, I most certainly can stand not knowing. I'm also fine with going to where your curiosity leads you. But I guess my own path would be to dig into learning about the C# compiler, MSIL, etc., rather than asking on Stack Overflow. But again, I'm not saying he shouldn't ask or be curious, just that in the long run, it doesn't really matter. — Heretic Monkey, Apr 17 '15 at 14:51
@Corak: i didn't know that E. Lippert doesn't work for MS anymore, thanks for the link ^^ — Tim Schmelter, Apr 17 '15 at 14:51
@TimSchmelter - since at least [2013-01-14](http://ericlippert.com/2013/01/14/first-day/). ^_^ — Corak, Apr 17 '15 at 14:57
Release Mode, using 6 blocks of code similar to ` watch.Start(); for (int i = 0; i <= loopCount; i++) { } watch.Stop(); Console.WriteLine(watch.ElapsedTicks); watch.Reset();` in the following sequence `<, <=, <, <, <=, <=` Running without the debugger. Resulted in the less than performing 3x-4x faster... I'm still looking for my mistake, Changing the loops to use `Int64` results in the loops performing the same — James, Apr 17 '15 at 15:02
This is one of those, "if that's your biggest performance problem" you should just call it a day and celebrate. The chances of this being the *actual* bottleneck are minimal. — Brian Rasmussen, Apr 17 '15 at 15:47

score 10 · Accepted Answer · answered Apr 17 '15 at 15:40

Bench-marking is a fine art. What you describe is not physically possible, the <= and < operators just generate different processor instructions that execute at the exact same speed. I altered your program slightly, running DoIt ten times and dropping two zeros from the for() loop so I wouldn't have to wait for ever:

x86 jitter:

Less Than Equal To Method Time Elapsed: 0.5
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.46
Less Than Equal To Method Time Elapsed: 0.4
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.35
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.31
Less Than Method Time Elapsed: 0.32

x64 jitter:

Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.4
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.45
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.38
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35

The only real signal you get from this is the slow execution of the first DoIt(), also visible in your test results, that's jitter overhead. And the most important signal, it is noisy. The median value for both loops is about equal, the standard deviation is rather large.

Otherwise the kind of signal you always get when you micro-optimize, execution of code is not very deterministic. Short from .NET runtime overhead that's usually easy to eliminate, your program is not the only one that runs on your machine. It has to share the processor, just the WriteLine() call already has an affect. Executed by the conhost.exe process, runs concurrently with your test while your test code entered the next for() loop. And everything else that happens on your machine, kernel code and interrupt handlers also get their turn.

And codegen can play a role, one thing you should do for example is just swap the two calls. The processor itself is in general executes code very non-deterministically. The state of the processor caches and how much historical data was gathered by the branch prediction logic matters a great deal.

When I benchmark, I consider differences of 15% or less not statistically significant. Hunting down differences less than that is quite difficult, you have to very carefully study the machine code. Silly things like a branch target being mis-aligned or a variable not getting stored in a processor register can cause big effects in execution time. Not something you can ever fix, the jitter does not have enough knobs to tweak.

score 4 · Answer 2 · answered Apr 17 '15 at 15:42

First of all, there are many, many reasons to see variations in benchmarks, even when they're done right. Here are a few that come to mind:

Your computer is running a lot of other processes at the same time, switching things in and out of context, and so on. The operating system is constantly receiving and handling interrupts from various I/O devices, etc. All of these things can cause the computer to pause for periods of time that dwarf the running time for the actual code you're testing.
The JIT process can detect when a function has run a certain number of times, and apply additional optimizations to it based on that information. Things like loop unrolling can drastically reduce the number of jumps that the program has to make, which are significantly more expensive than typical CPU operations. Re-optimizing the instructions takes time when it first happens, and then speeds things up after that point.
Your hardware is trying to make additional optimizations, like branch prediction, to ensure that its pipeline is being used as efficiently as possible. (If it guesses correctly, it can basically pretend that it's going to do the i++ while it waits to see whether the < or <= comparison finishes, and then discard the result if it finds out it was wrong.) The impact of these optimizations depends on a lot of factors, and is not really easy to predict.

Secondly, it's actually really difficult to do benchmarking well. Here'a benchmark template that I've been using for a while now. It's not perfect, but it's pretty good at ensuring that any emerging patterns are unlikely to be based on order of execution or random chance:

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            int i = 0;
        }),
        new TimedAction("<", () =>
        {
           for (int i = 0; i < 1000001; i++)
            {}
        }),
        new TimedAction("<=", () =>
        {
           for (int i = 0; i <= 1000000; i++)
            {}
        }),
        new TimedAction(">", () =>
        {
           for (int i = 1000001; i > 0; i--)
            {}
        }),
        new TimedAction(">=", () =>
        {
           for (int i = 1000000; i >= 0; i--)
            {}
        })
    };
    const int TimesToRun = 10000; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}


#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for(int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult{Message = action.Message};
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for(int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message {get;set;}
    public double DryRun1 {get;set;}
    public double DryRun2 {get;set;}
    public double FullRun1 {get;set;}
    public double FullRun2 {get;set;}
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message {get;private set;}
    public Action Action {get;private set;}
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Here's the result I get when running this in LINQPad:

benchmark result

So you'll notice that there is some variation, particularly early on, but after running everything backwards and forwards enough times, there isn't a clear pattern emerging to show that one way is much faster or slower than another.

<= does work slower than <

2 Answers2