10

Code:

 internal class Program
{
    private static void Main(string[] args)
    {
        const int iterCount = 999999999;

        var sum1 = 0;
        var sum2 = 0;

        using (new Dis())
        {
            var sw = DateTime.Now;
            for (var i = 0; i < iterCount; i++)
                sum1 += i;
            Console.WriteLine(sum1);
            Console.WriteLine(DateTime.Now - sw);
        }

        using (new Dis())
        {
            var sw = DateTime.Now;
            for (var i = 0; i < iterCount; i++)
                sum2 += i;
            Console.WriteLine(sum2);
            Console.WriteLine(DateTime.Now - sw);
        }

        Console.ReadLine();
    }

    private class Dis : IDisposable
    {
        public void Dispose(){}
    }
}

Two identical blocks in identical usings.

Output:

2051657985
00:00:00.3690996
2051657985
00:00:02.2640266

Second block takes 2.2 seconds ! But if to get rid of usings, durations became same ( ~0.3s, like first one). I've tried with .net framework 4.5 and .net core 1.1, in release, results are same.

Can anybody explain that behavior ?

IL_Agent
  • 707
  • 3
  • 16
  • 2
    Maybe try it with variables instead of the `Console.WriteLine`'s? That might be the problem. – ispiro Aug 29 '17 at 10:45
  • When I run your code in linqpad it gives me ~2.5 seconds for both code blocks. When run on ideone it gives ~0.3 seconds for both blocks: https://ideone.com/ReOpaH . – Chris Aug 29 '17 at 10:47
  • 2
    Also use a proper stopwatch. You call your datetime variable sw which implies you are wanting to use a stopwatch but then you aren't using one. – Chris Aug 29 '17 at 10:48
  • @Chris thank you, but it's strange. I guess linqpad has run code in debug. Which settings does ideone have ? I don't know. And I've used stopwatch - the result is same. – IL_Agent Aug 29 '17 at 12:04

1 Answers1

12

You have to look at the machine code that the jitter generates to see the underlying reason. Use Tools > Options > Debugging > General > untick the Suppress JIT optimization option. Switch to the Release build. Set a breakpoint on the first and second loops. When it hits use Debug > Windows > Disassembly.

You'll see the machine code for the bodies of the for loop:

                    sum1 += i;
00000035  add         esi,eax 

And:

                    sum2 += i;
000000d9  add         dword ptr [ebp-24h],eax 

Or in other words, the sum1 variable is stored in CPU register esi. But the sum2 variable is stored in memory, on the stack frame of the method. Big, big difference. Registers are very fast, memory is slow. The memory for the stack frame will be in the L1 cache, on modern machines accessing that cache has a latency of 3 cycles. The store buffer will be quickly overwhelmed with the large number of writes and that causes the processor to stall.

Finding a way to keep variables in CPU registers is one of the primary jitter optimization duties. But that has limitations, x86 in particular has few registers available. When they are all used up then the jitter has no option but using memory instead. Note that the using statement has an additional hidden local variable under the hood, that is why it had an effect.

Ideally the jitter optimizer would have made a better choice on how to allocate the registers. Using them for the loop variables (which it did) and the sum variables. An ahead-of-time compiler would get that right, having sufficient time to perform the code analysis. But a just-in-time compiler operates under strict time constraints.

Basic counter-measures are:

  • Break up the code into separate methods so a register like ESI can be re-used.
  • Remove the jitter forcing (Project > Properties > Build tab > untick "Prefer 32-bit"). x64 provides 8 additional registers.

The last bullet is effective for the legacy x64 jitter (target .NET 3.5 to use it), but not for the x64 jitter rewrite (aka RYuJIT) first made available in 4.6. A rewrite that was necessary because the legacy jitter took too much time optimizing code. Disappointing, RyuJIT does have a knack for disappointing, I think its optimizer could do a better job here.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Thank you, looks like the answer ! interestingly, if we move declaration of sum2 (var sum2 = 0;) to place between using blocks, then registers are used in both loops. – IL_Agent Aug 29 '17 at 13:03