for-loop performance oddity in .NET x64: even-number-iteration affinity?

Question

Running an empty for-loop with large numbers of iterations, I'm getting wildly different numbers in how long it takes to run:

public static class Program
{
    static void Main()
    {
        var sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < 1000000000; ++i)
        {
        }
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

The above will run in around 200ms on my machine, but if I increase it to 1000000001, then it takes 4x as long! Then if I make it 1000000002, then it's down to 200ms again!

This seems to happen for an even number of iterations. If I go for (var i = 1; i < 1000000001, (note starting at 1 instead of 0) then it's 200ms. Or if I do i <= 1000000001 (note less than or equal) then it's 200ms. Or (var i = 0; i < 2000000000; i += 2) as well.

This appears only to be on x64, but on all .NET versions up to (at least) 4.0. Also it appears only when in release mode with debugger detached.

UPDATE I was thinking that this was likely due to some clever bit shifting in the jit, but the following seems to disprove that: if you do something like create an object inside that loop, then that takes about 4x as long too:

public static class Program
{
    static void Main()
    {
        var sw = new Stopwatch();
        sw.Start();
        object o = null;
        for (var i = 0; i < 1000000000; i++)
        {
            o = new object();
        }
        sw.Stop();
        Console.WriteLine(o); // use o so the compiler won't optimize it out
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}

This takes around 1 second on my machine, but then increasing by 1 to 1000000001 it takes 4 seconds. That's an extra 3000ms, so it couldn't really be due to bit shifting, as that would have shown up as a 3000ms difference in the original problem too.

Perhaps it unrolls two iterations of the loop if the limit is even and then realizes that the result of the first half of the iteration is never used and optimizes it out. — CodesInChaos, Aug 10 '13 at 21:36

Esailija · Accepted Answer · 2013-08-10T21:47:14.573

6

Well here are the disassemblies:

00000031  xor         eax,eax 
  for (var i = 0; i < 1000000001; ++i)
00000033  inc         eax           
00000035  cmp         eax,3B9ACA01h 
0000003a  jl          0000000000000033 
0000003c  movzx       eax,byte ptr [rbx+18h] 
00000040  test        eax,eax 
00000042  je          0000000000000073

And

00000031  xor         eax,eax 
     for (var i = 0; i < 1000000000; ++i)
00000033  add         eax,4 
00000036  cmp         eax,3B9ACA00h 
0000003b  jl          0000000000000033 
0000003d  movzx       eax,byte ptr [rbx+18h] 
00000041  test        eax,eax 
00000043  je          0000000000000074

The only difference I see is that in the even loop, the loop index is incremented by 4 at a time (add eax 4) instead of 1 at a time (inc eax) so it finishes the loop 4x faster because of that.

This is just speculation but I believe it is unrolling the loop by a factor of 4. So it places the body 4 times inside the loop and just increments 4 times faster. But because the body is empty, empty body times 4 is still empty, you gain much bigger gain than you would expect from loop unrolling.

edited Aug 10 '13 at 21:47

answered Aug 10 '13 at 21:40

Esailija

138,174
23
272
326

How do you view the disassembly? – lobsterism Aug 10 '13 at 21:46
2

stackoverflow.com/questions/3423547/how-can-i-view-the-disassembly-of-optimised-jitted-net-code – Esailija Aug 10 '13 at 21:47
3

Yes, this is loop unrolling at work. It is more visible in [this answer](http://stackoverflow.com/a/2057228/17034) about that optimization going bad. A better optimizer splits this in two sections, one that's unrolled and another that takes care of the last few iterations. But the jitter optimizer doesn't have enough time to work down the outliers. – Hans Passant Aug 10 '13 at 21:55
@HansPassant If it's smart enough to do loop unrolling, why would it not be smart enough just to eliminate the empty loop completely? – lobsterism Aug 10 '13 at 22:41
1

Well generally it doesn't make sense to write code in the compiler to do optimizations that are not useful to anyone in a real situation. – Esailija Aug 10 '13 at 22:49
1

@lob - yes, that's a big difference between a C or C++ compiler and .NET. The code optimizer in a native compiler just assumes that stupid code emerged from, say, the preprocessor and just whacks a do-nothing loop. Not uncommon. In .NET, the jitter optimizer assumes that the programmer *intended* to waste time and that it should not overrule that intention. It hasn't yet been proven wrong. – Hans Passant Aug 10 '13 at 22:55
Okay makes sense. Since the optimizer would have to check for this during runtime, it'd just make all real apps run a tiny bit slower, for no real benefit. Not to mention who at MS would pay for the developer time to design, develop, test, and maintain this feature when there are millions of other features people are begging for. Unlike loop unrolling, which is a common speedup for *real* applications. – lobsterism Aug 10 '13 at 23:07

for-loop performance oddity in .NET x64: even-number-iteration affinity?

1 Answers1

Linked