I have done my homework and found repeated assurances that it makes no difference in performance whether you declare your variables inside or outside your for loop, and it actually compiles to the very same MSIL. But I have been fiddling with it nevertheless and found that moving the variable declarations inside the loop does actually cause a considerable and consistent performance gain.

I have written a small console test class to measure this effect. I initialise a static double[] array items, and two methods perform loop operations on it, writing the results to a static double[] array buffer. Originally, my methods were the ones in which I first noticed the difference, namely the magnitude calculation of a complex number. Running these 100 times on an items array of length 1000000, I got consistently lower run times for the version in which the variables (6 double variables) were declared inside the loop: e.g. 32.83±0.64 ms v 43.24±0.45 ms on an elderly configuration with an Intel Core 2 Duo @ 2.66 GHz. I tried executing them in a different order, but it did not influence the results.

Then I realised that calculating the magnitude of a complex number is far from a minimum working example and tested two much simpler methods:

    static void Square1()
    {
        double x;

        for (int i = 0; i < buffer.Length; i++) {
            x = items[i];
            buffer[i] = x * x;
        }
    }


    static void Square2()
    {
        for (int i = 0; i < buffer.Length; i++) {
            double x;
            x = items[i];
            buffer[i] = x * x;
        }
    }

With these, the results came out the other way: declaring the variable outside the loop seemed more favourable: 7.07±0.43 ms for Square1() v 12.07±0.51 ms for Square2().
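For anyone wanting to reproduce the comparison, a timing harness along the following lines might do. It is only a minimal sketch, not the code that produced the figures above: the class name LoopTiming, the helpers Time, Mean and StdDev, and the random fill with a fixed seed are all assumptions made for illustration. Each method is called once before timing so that JIT compilation is not included, and the numbers are only meaningful for a Release build run without a debugger attached.

```csharp
using System;
using System.Diagnostics;

public static class LoopTiming
{
    // Sizes taken from the description above; everything else is illustrative.
    const int Length = 1000000;
    const int Runs = 100;

    static readonly double[] items = new double[Length];
    static readonly double[] buffer = new double[Length];

    // Variable declared outside the loop.
    public static void Square1()
    {
        double x;
        for (int i = 0; i < buffer.Length; i++) {
            x = items[i];
            buffer[i] = x * x;
        }
    }

    // Variable declared inside the loop.
    public static void Square2()
    {
        for (int i = 0; i < buffer.Length; i++) {
            double x = items[i];
            buffer[i] = x * x;
        }
    }

    // Runs the method once untimed (warm-up), then returns one time in ms per timed run.
    public static double[] Time(Action method, int runs)
    {
        method(); // warm-up so that JIT compilation is not measured

        var times = new double[runs];
        for (int r = 0; r < runs; r++) {
            var sw = Stopwatch.StartNew();
            method();
            sw.Stop();
            times[r] = sw.Elapsed.TotalMilliseconds;
        }
        return times;
    }

    public static double Mean(double[] values)
    {
        double sum = 0;
        foreach (double v in values) sum += v;
        return sum / values.Length;
    }

    // Sample standard deviation (divides by n - 1).
    public static double StdDev(double[] values)
    {
        double mean = Mean(values);
        double sumSq = 0;
        foreach (double v in values) sumSq += (v - mean) * (v - mean);
        return Math.Sqrt(sumSq / (values.Length - 1));
    }

    public static void Main()
    {
        var rng = new Random(42); // fixed seed, chosen arbitrarily
        for (int i = 0; i < items.Length; i++) items[i] = rng.NextDouble();

        double[] t1 = Time(Square1, Runs);
        double[] t2 = Time(Square2, Runs);
        Console.WriteLine("Square1: {0:F2} ± {1:F2} ms", Mean(t1), StdDev(t1));
        Console.WriteLine("Square2: {0:F2} ± {1:F2} ms", Mean(t2), StdDev(t2));
    }
}
```

Swapping the two Time calls in Main is a cheap way to check whether the order of execution matters on a given machine.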

I am not familiar with ILDASM, but I have disassembled the two methods, and the only difference seems to be the initialisation of the local variables:

      .locals init ([0] float64 x,
       [1] int32 i,
       [2] bool CS$4$0000)

in Square1() v

      .locals init ([0] int32 i,
       [1] float64 x,
       [2] bool CS$4$0000)

in Square2(). Accordingly, what is stloc.1 in one is stloc.0 in the other, and vice versa. In the MSIL of the longer complex magnitude calculation even the code size differed, and I saw stloc.s i in the external-declaration code where there was stloc.0 in the internal-declaration code.

So how can this be? Am I overlooking something or is it a real effect? If it is, it can make a significant difference in the performance of long loops, so I think it deserves some discussion.

Your thoughts are much appreciated.

EDIT: The one thing I overlooked was to test it on several computers before posting. I have run it on an i5 now and the results are nearly identical for the two methods. My apologies for having posted such a misleading observation.

tethered.sun
  • Good investigation, you sure earn an upvote. – NicoRiff Feb 09 '17 at 15:03
  • @NicoRiff: Indeed, it's a very well written question. (Sadly though I think the answer is trivial.) – Bathsheba Feb 09 '17 at 15:04
  • I can't wait for @JonSkeet's answer on this one – NicoRiff Feb 09 '17 at 15:12
  • I'm not able to replicate this behavior with the code given. The generated IL definitely flips around the order that locals are declared, but I don't see any significant performance difference. – Kyle Feb 09 '17 at 17:14
  • @Kyle: I have run the very same code at home using a different computer and here the difference disappeared. My apologies. It still intrigues me whether it is something systematic or just an artefact for a single computer. I shall perform more tests and update the post tomorrow. – tethered.sun Feb 09 '17 at 18:37
  • Can you show the code you used to measure the performance? Have you taken into account JIT compilation the first time the code is run? – Chris Dunaway Feb 09 '17 at 18:58
  • @Chris Dunaway: I have shared the full code here: https://drive.google.com/open?id=0B3OSs_9bqexqY2ZvSUE1WmJjTDQ As I ran 100 cycles in succession and took the mean and SD values, I should conjecture the JIT compilation time does not play a role here. – tethered.sun Feb 09 '17 at 19:32

2 Answers

Any C# compiler worth its salt will perform such micro-optimisations for you. Only leak a variable outside a scope if it's necessary.

So keep double x; internal to the loop if possible.

Personally though, if items[i] is plain-old-data array access then I'd write buffer[i] = items[i] * items[i];. C and C++ would optimise to that, but I don't think C# does (yet); your disassembly implies that it doesn't.
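Spelled out as a method in the style of the question, that version would look roughly like this. It is only a sketch: the class name SquareDirect, the method name Square3 and the three-element arrays are made up for illustration, and nothing here claims it runs faster than the other two.

```csharp
using System;

public static class SquareDirect
{
    // Tiny illustrative arrays; the question uses arrays of length 1000000.
    public static double[] items = { 1.0, 2.0, 3.0 };
    public static double[] buffer = new double[3];

    // No local variable at all: the element is read twice and squared in place.
    public static void Square3()
    {
        for (int i = 0; i < buffer.Length; i++) {
            buffer[i] = items[i] * items[i];
        }
    }

    public static void Main()
    {
        Square3();
        Console.WriteLine(string.Join(", ", buffer));
    }
}
```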

Bathsheba
  • Thank you very much! I used to stick to the obsessive habit of declaring all my variables at the beginning of the method, but from now on I shall think twice. My take-home message is to test both arrangements if I care about performance because it seems that optimisation can work in both directions. – tethered.sun Feb 09 '17 at 15:07
  • The woolly answer is "years of experience tell you that loosely scoped variables end up in a complete mess of a codebase". – Bathsheba Feb 09 '17 at 15:07
  • Your answer seems to say that there should be no performance difference between keeping a variable internal to the loop and outside, but that does not really explain the measured differences the OP experienced. – HugoRune Feb 09 '17 at 15:19
  • Facetiously I blame that solely on the concept of an elastic ruler. – Bathsheba Feb 09 '17 at 15:29
  • The C# compiler will occasionally remove local variables. I'm not sure under what conditions it manages to do it, but I have seen it do it before. – Kyle Feb 10 '17 at 04:55

It would be interesting to profile what the Garbage Collector does for these two variants.

I can imagine that in the first case, the variable x is not collected while the loop is running because it is declared in the outside scope.

In the second case, all handles on x will be removed on each iteration.

Maybe you could run your test again with GC.TryStartNoGCRegion and GC.EndNoGCRegion, which are new in .NET 4.6, to see if the performance impact stems from the GC.

Prevent .NET Garbage collection for short period of time
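A rough sketch of such a test could look like the following. The class name NoGcTiming, the helper TimeWithoutGc and the 1 MB allocation budget are assumptions picked for illustration. The helper also reports how many generation 0 collections happened during the call, which is a direct way to see whether the GC was involved at all.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

public static class NoGcTiming
{
    // Times one call of the method inside a no-GC region (requires .NET 4.6 or later).
    // budgetBytes is the amount of memory the region may allocate without triggering a GC.
    public static double TimeWithoutGc(Action method, long budgetBytes, out int gen0Collections)
    {
        bool started = GC.TryStartNoGCRegion(budgetBytes);
        try {
            int before = GC.CollectionCount(0);
            var sw = Stopwatch.StartNew();
            method();
            sw.Stop();
            gen0Collections = GC.CollectionCount(0) - before;
            return sw.Elapsed.TotalMilliseconds;
        }
        finally {
            // EndNoGCRegion throws if the region has already ended, so check the mode first.
            if (started && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                GC.EndNoGCRegion();
        }
    }

    public static void Main()
    {
        var items = new double[1000000];
        var buffer = new double[1000000];

        int collections;
        double ms = TimeWithoutGc(() => {
            for (int i = 0; i < buffer.Length; i++) {
                double x = items[i];
                buffer[i] = x * x;
            }
        }, 1024 * 1024, out collections); // 1 MB budget, chosen arbitrarily

        Console.WriteLine("{0:F2} ms, gen 0 collections during the run: {1}", ms, collections);
    }
}
```

If the collection count stays at zero even without the no-GC region, the GC can be ruled out as the cause.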

Georg Patscheider
  • Thank you, it is an excellent idea. I wanted to have it tested already, but at the moment I have no access to .NET 4.6. SharpDevelop does not seem to support it. I'll try to upgrade my tools and return to the question. – tethered.sun Feb 09 '17 at 15:13
  • I doubt this has anything to do with the GC. `double` is a value type and, in this case, will be stack allocated. It doesn't generate any garbage to clean up. – Kyle Feb 09 '17 at 17:07
  • Eric Lippert gives some excellent info on this at http://stackoverflow.com/a/14043763/526724 – Bradley Uffner Feb 09 '17 at 18:41