
Intro: I write high-performance code in C#. Yes, I know C++ would give me better optimization, but I still choose to use C#. I do not wish to debate that choice. Rather, I'd like to hear from those who, like me, are trying to write high-performance code on the .NET Framework.

Questions:

  • Why is the operator in the code below slower than the equivalent method call?
  • Why is the method passing two doubles in the code below faster than the equivalent method passing a struct that has two doubles inside? (A: older JITs optimize structs poorly)
  • Is there a way to get the .NET JIT Compiler to treat simple structs as efficiently as the members of the struct? (A: get newer JIT)

What I think I know: The original .NET JIT Compiler would not inline anything that involved a struct. That is bizarre, given that structs should only be used where you need small value types that are optimized like built-ins, but it is true. Fortunately, in .NET 3.5 SP1 and .NET 2.0 SP2, they made some improvements to the JIT Optimizer, including improvements to inlining, particularly for structs. (I am guessing they did that because otherwise the new Complex struct they were introducing would have performed horribly... so the Complex team was probably pounding on the JIT Optimizer team.) So any documentation prior to .NET 3.5 SP1 is probably not very relevant to this issue.

What my testing shows: I have verified that I have the newer JIT Optimizer by checking that the file C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll has a version >= 3053, so it should include those improvements to the JIT optimizer. Even so, both my timings and the disassembly show the following:

The JIT-produced code for passing a struct with two doubles is far less efficient than code that directly passes the two doubles.

The JIT-produced code for a struct method passes in 'this' far more efficiently than if you passed a struct as an argument.

The JIT still inlines better if you pass two doubles rather than a struct containing two doubles, even with the multiplier the inliner applies to call sites that are clearly inside a loop.

The Timings: Looking at the disassembly, I realize that most of the time in the loops is spent just accessing the test data out of the List. The differences between the four ways of making the same call become dramatic once you factor out the overhead of the loop and of accessing the data. I get anywhere from 5x to 20x speedups for PlusEqual(double, double) over PlusEqual(Element), and 10x to 40x for PlusEqual(double, double) over operator +=. Wow. Sad.

Here's one set of timings:

Populating List<Element> took 320ms.
The PlusEqual() method took 105ms.
The 'same' += operator took 131ms.
The 'same' -= operator took 139ms.
The PlusEqual(double, double) method took 68ms.
The do nothing loop took 66ms.
The ratio of operator with constructor to method is 124%.
The ratio of operator without constructor to method is 132%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 64%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 166%.
The ratio of operator without constructor to method is 187%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 5%.
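As a worked check of the "remove the overhead" figures, here is the arithmetic the program performs, assuming (as the code does) that the do-nothing loop time t_0 is pure overhead shared by every loop. With t_PE = 105 ms for PlusEqual(Element), t_DD = 68 ms for PlusEqual(double, double), 131 ms for +=, 139 ms for -=, and t_0 = 66 ms:

```latex
\begin{aligned}
r_{\text{adj}}(\text{PlusEqual(double,double)}) &= \frac{t_{DD} - t_0}{t_{PE} - t_0} = \frac{68 - 66}{105 - 66} = \frac{2}{39} \approx 5.1\% \\
r_{\text{adj}}(\mathrel{+}=) &= \frac{131 - 66}{105 - 66} = \frac{65}{39} \approx 166.7\% \\
r_{\text{adj}}(\mathrel{-}=) &= \frac{139 - 66}{105 - 66} = \frac{73}{39} \approx 187.2\%
\end{aligned}
```

The program uses integer division, so it prints the truncated values 5%, 166% and 187%. Note that the 5% figure rests on a numerator of only 2 ms: an error of 1 ms in t_DD (the resolution of ElapsedMilliseconds) would move it anywhere from 1/39 ≈ 2.6% to 3/39 ≈ 7.7%.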

The Code:

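using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace OperatorVsMethod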
{
  public struct Element
  {
    public double Left;
    public double Right;

    public Element(double left, double right)
    {
      this.Left = left;
      this.Right = right;
    }

    public static Element operator +(Element x, Element y)
    {
      return new Element(x.Left + y.Left, x.Right + y.Right);
    }

    public static Element operator -(Element x, Element y)
    {
      x.Left += y.Left;
      x.Right += y.Right;
      return x;
    }    

    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(Element that)
    {
      this.Left += that.Left;
      this.Right += that.Right;
    }    

    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(double thatLeft, double thatRight)
    {
      this.Left += thatLeft;
      this.Right += thatRight;
    }    
  }    

  [TestClass]
  public class UnitTest1
  {
    [TestMethod]
    public void TestMethod1()
    {
      Stopwatch stopwatch = new Stopwatch();

      // Populate a List of Elements to add together
      int seedSize = 4;
      List<double> doubles = new List<double>(seedSize);
      doubles.Add(2.5d);
      doubles.Add(100000d);
      doubles.Add(-0.5d);
      doubles.Add(-100002d);

      int size = 2500000 * seedSize;
      List<Element> elts = new List<Element>(size);

      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        int di = ii % seedSize;
        double d = doubles[di];
        elts.Add(new Element(d, d));
      }
      stopwatch.Stop();
      long populateMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of += operator (calls ctor)
      Element operatorCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorCtorResult += elts[ii];
      }
      stopwatch.Stop();
      long operatorCtorMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of -= operator (+= without ctor)
      Element operatorNoCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorNoCtorResult -= elts[ii];
      }
      stopwatch.Stop();
      long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of PlusEqual(Element) method
      Element plusEqualResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        plusEqualResult.PlusEqual(elts[ii]);
      }
      stopwatch.Stop();
      long plusEqualMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of PlusEqual(double, double) method
      Element plusEqualDDResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
      }
      stopwatch.Stop();
      long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of doing nothing but accessing the Element
      Element doNothingResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        double left = elt.Left;
        double right = elt.Right;
      }
      stopwatch.Stop();
      long doNothingMS = stopwatch.ElapsedMilliseconds;

      // Report results
      Assert.AreEqual(1d, operatorCtorResult.Left, "The += operator did not compute the right result!");
      Assert.AreEqual(1d, operatorNoCtorResult.Left, "The -= operator did not compute the right result!");
      Assert.AreEqual(1d, plusEqualResult.Left, "PlusEqual(Element) did not compute the right result!");
      Assert.AreEqual(1d, plusEqualDDResult.Left, "PlusEqual(double, double) did not compute the right result!");
      Assert.AreEqual(1d, doNothingResult.Left, "The do-nothing loop should not have changed its result!");

      // Report speeds
      Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
      Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
      Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
      Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
      Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
      Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

      // Compare speeds
      long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

      operatorCtorMS -= doNothingMS;
      operatorNoCtorMS -= doNothingMS;
      plusEqualMS -= doNothingMS;
      plusEqualDDMS -= doNothingMS;
      Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
      percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
    }
  }
}

The disassembly (the x86 machine code the JIT produces for some of the above; note that this is native code, not IL):

public void PlusEqual(Element that)
    {
00000000 push    ebp 
00000001 mov     ebp,esp 
00000003 push    edi 
00000004 push    esi 
00000005 push    ebx 
00000006 sub     esp,30h 
00000009 xor     eax,eax 
0000000b mov     dword ptr [ebp-10h],eax 
0000000e xor     eax,eax 
00000010 mov     dword ptr [ebp-1Ch],eax 
00000013 mov     dword ptr [ebp-3Ch],ecx 
00000016 cmp     dword ptr ds:[04C87B7Ch],0 
0000001d je     00000024 
0000001f call    753081B1 
00000024 nop       
      this.Left += that.Left;
00000025 mov     eax,dword ptr [ebp-3Ch] 
00000028 fld     qword ptr [ebp+8] 
0000002b fadd    qword ptr [eax] 
0000002d fstp    qword ptr [eax] 
      this.Right += that.Right;
0000002f mov     eax,dword ptr [ebp-3Ch] 
00000032 fld     qword ptr [ebp+10h] 
00000035 fadd    qword ptr [eax+8] 
00000038 fstp    qword ptr [eax+8] 
    }
0000003b nop       
0000003c lea     esp,[ebp-0Ch] 
0000003f pop     ebx 
00000040 pop     esi 
00000041 pop     edi 
00000042 pop     ebp 
00000043 ret     10h 
 public void PlusEqual(double thatLeft, double thatRight)
    {
00000000 push    ebp 
00000001 mov     ebp,esp 
00000003 push    edi 
00000004 push    esi 
00000005 push    ebx 
00000006 sub     esp,30h 
00000009 xor     eax,eax 
0000000b mov     dword ptr [ebp-10h],eax 
0000000e xor     eax,eax 
00000010 mov     dword ptr [ebp-1Ch],eax 
00000013 mov     dword ptr [ebp-3Ch],ecx 
00000016 cmp     dword ptr ds:[04C87B7Ch],0 
0000001d je     00000024 
0000001f call    75308159 
00000024 nop       
      this.Left += thatLeft;
00000025 mov     eax,dword ptr [ebp-3Ch] 
00000028 fld     qword ptr [ebp+10h] 
0000002b fadd    qword ptr [eax] 
0000002d fstp    qword ptr [eax] 
      this.Right += thatRight;
0000002f mov     eax,dword ptr [ebp-3Ch] 
00000032 fld     qword ptr [ebp+8] 
00000035 fadd    qword ptr [eax+8] 
00000038 fstp    qword ptr [eax+8] 
    }
0000003b nop       
0000003c lea     esp,[ebp-0Ch] 
0000003f pop     ebx 
00000040 pop     esi 
00000041 pop     edi 
00000042 pop     ebp 
00000043 ret     10h 
abatishchev
Brian Kennedy
  • Wow, this should be referenced as an example of what a good question on Stack Overflow looks like! Only the auto-generated comments could be omitted. Unfortunately I know too little to actually dive into the problem, but I really like the question! – Dennis Traub Sep 30 '11 at 20:57
  • Indeed, fantastic question. I hope we get some expert insight on this! – Adam Maras Sep 30 '11 at 21:05
  • I don't think a Unit Test is a good place to run a benchmark. – H H Sep 30 '11 at 21:10
  • Why should the struct be faster than two doubles? In .NET a struct is NEVER equal to the sum of the sizes of its members. So by definition it's bigger, and so by definition it has to be slower to push on the stack than just 2 double values. If the compiler inlines the struct parameter as 2 consecutive doubles in memory, what if inside the method you want to access that struct with reflection? Where will the runtime information linked to that struct object be? Or am I missing something? – Tigran Sep 30 '11 at 21:12
  • I agree with @HenkHolterman. Units tests are often run with the debugger attached, in order to help debug failures. But that affects JIT optimization. – Ben Voigt Sep 30 '11 at 21:14
  • @Tigran: You need sources for those claims. I think you're wrong. Only when a value type gets boxed, does metadata need to be stored with the value. In a variable with static struct type, there's no overhead. – Ben Voigt Sep 30 '11 at 21:15
  • @Tigran, I think you are missing something. In the situations you are thinking about (e.g. accessing the object with reflection), the struct would have to be boxed, which adds overhead. You can't reflect on an unboxed struct any more than you can reflect on an unboxed int. – Corey Kosak Sep 30 '11 at 21:19
  • I was thinking that the only thing missing was the assembly. And now you've added that (please note, that is x86 assembler and NOT MSIL). – Ben Voigt Sep 30 '11 at 21:31
  • @Tigran First: I haven't used .NET reflection, so I'm working from my Java knowledge. If you can use reflection only on objects, this avoids the problem. If we followed your argument, every int primitive would also have to contain a pointer to its class, which is obviously not the case. This only works assuming that structs cannot inherit, but a quick search on the net shows this to be true, so everything's fine (i.e. the compiler always knows the type of the struct at runtime). – Voo Sep 30 '11 at 21:32
  • @Voo: my point was that .NET treats structs differently from the other value-type first-class citizens of the framework. That's all. – Tigran Sep 30 '11 at 21:53
  • On Henk and Ben's comment on running it as a unit test... in response to Corey's answer below, I ran it as a Console app, just as he did... I still see results like I got as a Unit Test (which you can run or you can debug... when timing, I run them). So, I do not think that's the issue. But perhaps I need an even newer JIT Optimizer. – Brian Kennedy Sep 30 '11 at 21:55
  • @Tigran And my point is that the struct doesn't need an additional pointer to its class (and actually another variable for other stuff), because the compiler always knows what type the struct is and can autobox it if necessary; the additional information would be redundant. So a struct containing two doubles shouldn't be any larger than two single doubles (setting aside alignment issues that can arise). – Voo Sep 30 '11 at 22:28
  • It's interesting to note that the "same" code is slightly different. That is, `operator +` has to return a value, whereas `PlusEqual(Element)` modifies the structure directly. What is the cost of returning a value? – Jim Mischel Sep 30 '11 at 22:41
  • @Jim, it SHOULD be no cost since it's a struct AND it should be inlined... so, in both cases it should be writing those newly computed doubles directly into the destination struct's doubles... even the simplest of optimizers should handle that. And it seems from Corey's results with the latest JIT Optimizer that it does. – Brian Kennedy Sep 30 '11 at 22:51
  • By the way, this doesn't seem to be a 100% fair comparison. .NET has no += operator overload (it has to rewrite x+=y as x=x+y) so surely for the sake of fairness your PlusEqual method should return a value rather than mutating one of its arguments and returning void. – Corey Kosak Sep 30 '11 at 23:16
  • @Corey, while it may not be fair, life is not fair. ;^) The justification for not providing += operator is that the JIT optimization should make it equivalent in the end... and in many cases that is valid... and based on your results with the latest JIT Optimizer, it seems to be so. – Brian Kennedy Sep 30 '11 at 23:44
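Following up on the comments about running benchmarks inside a unit test, here is a minimal sketch of a standalone console harness. BenchHarness, BestOf and sink are hypothetical names, not from the question. It assumes .NET 4.0 or later (for Stopwatch.Restart), does one untimed warm-up call so JIT compilation is excluded, reports the fastest of several runs, and warns when a debugger is attached, since launching under a debugger can suppress JIT optimization.

```csharp
using System;
using System.Diagnostics;

static class BenchHarness
{
    // Receives each measured result so the JIT cannot discard the loop as dead code.
    static double sink;

    // Hypothetical helper: calls `body` once untimed (forcing JIT compilation),
    // then `runs` more times, and returns the fastest run in milliseconds.
    public static double BestOf(int runs, Action body)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");
        body(); // warm-up run, not timed
        double best = double.MaxValue;
        var sw = new Stopwatch();
        for (int r = 0; r < runs; r++)
        {
            sw.Restart();
            body();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; timings may not reflect optimized code.");

        var data = new double[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i % 4;

        // The whole loop lives inside the delegate, so the single delegate
        // call per run is negligible compared with the work being timed.
        double ms = BestOf(5, () =>
        {
            double sum = 0;
            for (int i = 0; i < data.Length; i++) sum += data[i];
            sink = sum;
        });
        Console.WriteLine("Best of 5 runs: {0:F1} ms (sum = {1})", ms, sink);
    }
}
```

Build it in Release mode and run it from the command line, as several answers below do.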

8 Answers

I'm getting very different, much less dramatic results. But I didn't use the test runner; I pasted the code into a console-mode app. The 5% result is ~87% in 32-bit mode and ~100% in 64-bit mode when I try it.

Alignment is critical for doubles, and the .NET runtime can only promise an alignment of 4 on a 32-bit machine. It looks to me like the test runner is starting the test methods with a stack address that's aligned to 4 instead of 8. The misalignment penalty gets very large when a double crosses a cache line boundary.
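To make the cache-line point concrete, here is a small sketch that checks whether an 8-byte double starting at a given address straddles two cache lines. The 64-byte line size is an assumption (typical for x86 CPUs, not stated in the answer), and AlignmentCheck and CrossesCacheLine are hypothetical names. It shows that an 8-byte-aligned double can never split a line, while a double that is only 4-byte aligned can.

```csharp
using System;

static class AlignmentCheck
{
    // Assumption: 64-byte cache lines.
    const int CacheLineSize = 64;

    // True if an object of `size` bytes starting at `address` spans two cache lines.
    public static bool CrossesCacheLine(long address, int size)
    {
        long offsetInLine = address % CacheLineSize;
        return offsetInLine + size > CacheLineSize;
    }

    public static void Main()
    {
        // A 4-byte-aligned double can start 4 bytes before a line boundary...
        Console.WriteLine(CrossesCacheLine(60, sizeof(double))); // True
        // ...but an 8-byte-aligned one never can: 56 + 8 fits exactly in the line.
        Console.WriteLine(CrossesCacheLine(56, sizeof(double))); // False

        // Count the 4-byte-aligned start offsets within one line that cause a split.
        int split = 0;
        for (int a = 0; a < CacheLineSize; a += 4)
            if (CrossesCacheLine(a, sizeof(double))) split++;
        Console.WriteLine("{0} of {1} 4-byte-aligned positions split a line", split, CacheLineSize / 4);
    }
}
```

Under these assumptions only 1 of the 16 possible 4-byte-aligned positions splits a line, which is consistent with the penalty showing up in some runs and not others.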

Hans Passant
  • Why can .NET basically only guarantee an alignment of 4 for doubles? Alignment is done in 4-byte chunks on a 32-bit machine. What is the problem there? – Tigran Oct 01 '11 at 08:12
  • Why does the runtime only align to 4 bytes on x86? I think it *could* align to 64 bit if it takes additional care on when unmanaged code calls managed code. While the spec has only weak alignment guarantees, the implementations should be able to align more strictly. (Spec: "8-byte data is properly aligned when it is stored on the same boundary required by the underlying hardware for atomic access to a native int") – CodesInChaos Oct 01 '11 at 08:32
  • @Code - Well, it could, C code generators do this by doing math on the stack pointer in the function prologue. The x86 jitter just doesn't. It is *much* more important for native languages since allocating arrays on the stack is much more common and they have a heap allocator that aligns to 8 so would never want to make stack allocations less efficient than heap allocations. We're stuck with an alignment of 4 from the 32-bit gc heap. – Hans Passant Oct 01 '11 at 10:14

I'm having some difficulty replicating your results.

I took your code:

  • made it a standalone console application
  • built an optimized (release) build
  • increased the "size" factor from 2.5M to 10M
  • ran it from the command line (outside the IDE)

When I did so, I got the following timings which are far different from yours. For the avoidance of doubt, I'll post exactly the code I used.

Here are my timings:

Populating List<Element> took 527ms.
The PlusEqual() method took 450ms.
The 'same' += operator took 386ms.
The 'same' -= operator took 446ms.
The PlusEqual(double, double) method took 413ms.
The do nothing loop took 229ms.
The ratio of operator with constructor to method is 85%.
The ratio of operator without constructor to method is 99%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 91%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 71%.
The ratio of operator without constructor to method is 98%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 83%.

And these are my edits to your code:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace OperatorVsMethod
{
  public struct Element
  {
    public double Left;
    public double Right;

    public Element(double left, double right)
    {
      this.Left = left;
      this.Right = right;
    }    

    public static Element operator +(Element x, Element y)
    {
      return new Element(x.Left + y.Left, x.Right + y.Right);
    }

    public static Element operator -(Element x, Element y)
    {
      x.Left += y.Left;
      x.Right += y.Right;
      return x;
    }    

    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(Element that)
    {
      this.Left += that.Left;
      this.Right += that.Right;
    }    

    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(double thatLeft, double thatRight)
    {
      this.Left += thatLeft;
      this.Right += thatRight;
    }    
  }    

  public class UnitTest1
  {
    public static void Main()
    {
      Stopwatch stopwatch = new Stopwatch();

      // Populate a List of Elements to add together
      int seedSize = 4;
      List<double> doubles = new List<double>(seedSize);
      doubles.Add(2.5d);
      doubles.Add(100000d);
      doubles.Add(-0.5d);
      doubles.Add(-100002d);

      int size = 10000000 * seedSize;
      List<Element> elts = new List<Element>(size);

      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        int di = ii % seedSize;
        double d = doubles[di];
        elts.Add(new Element(d, d));
      }
      stopwatch.Stop();
      long populateMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of += operator (calls ctor)
      Element operatorCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorCtorResult += elts[ii];
      }
      stopwatch.Stop();
      long operatorCtorMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of -= operator (+= without ctor)
      Element operatorNoCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorNoCtorResult -= elts[ii];
      }
      stopwatch.Stop();
      long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of PlusEqual(Element) method
      Element plusEqualResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        plusEqualResult.PlusEqual(elts[ii]);
      }
      stopwatch.Stop();
      long plusEqualMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of PlusEqual(double, double) method
      Element plusEqualDDResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
      }
      stopwatch.Stop();
      long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

      // Measure speed of doing nothing but accessing the Element
      Element doNothingResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        double left = elt.Left;
        double right = elt.Right;
      }
      stopwatch.Stop();
      long doNothingMS = stopwatch.ElapsedMilliseconds;

      // Report speeds
      Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
      Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
      Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
      Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
      Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
      Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

      // Compare speeds
      long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

      operatorCtorMS -= doNothingMS;
      operatorNoCtorMS -= doNothingMS;
      plusEqualMS -= doNothingMS;
      plusEqualDDMS -= doNothingMS;
      Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
      percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
    }
  }
}
abatishchev
Corey Kosak
  • I just did the same; my results are more like yours. Please state platform and CPU type. – H H Sep 30 '11 at 21:20
  • Very interesting! I've had others verify my results... you're the first to get different ones. First question for you: what is the version number of the file I mention in my post... C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll ... that's the one the Microsoft documents said indicated the version of JIT Optimizer you have. (If I can just tell my users to upgrade their .NET to see big speedups, I'll be a happy camper. But I'm guessing it's not gonna be that simple.) – Brian Kennedy Sep 30 '11 at 21:25
  • I was running inside Visual Studio... running on Windows XP SP3... in a VMware virtual machine... on a 2.7GHz Intel Core i7. But it's not the absolute times that interest me... it is the ratios... I would expect those three methods to all perform similarly, which they did for Corey, but do NOT for me. – Brian Kennedy Sep 30 '11 at 21:27
  • My project properties say: Configuration: Release; Platform: Active (x86); Platform target: x86 – Corey Kosak Sep 30 '11 at 21:30
  • Regarding your request to get the version of mscorwks... Sorry, did you want me to run this thing against .NET 2.0? My tests were on .NET 4.0 – Corey Kosak Sep 30 '11 at 21:38
  • I just took Corey's code and ran it just as he did and verified that I still get results like mine rather than like his. (227ms for PlusEqual, 340ms for +=, 401ms for -=, and 185ms for PlusEqual(double, double).) – Brian Kennedy Sep 30 '11 at 21:43
  • If you have .NET 4.0, then you also have .NET 2.0... .NET 4.0 contains .NET 3.5, 3.0, and 2.0 as subsets of itself... .NET 4.0 doesn't replace .NET 2.0, just adds to it. The JIT and CLR obviously existed with 2.0, and thus are still in the 2.0 directories... but are updated to the latest versions (given you have 4.0). (So, no, you don't need to run it against .NET 2.0... I'm curious the version of that file on your system.) – Brian Kennedy Sep 30 '11 at 21:48
  • "Properties" from Windows Explorer says that file's Product version is 2.0.50727.5446 – Corey Kosak Sep 30 '11 at 21:52
  • Interesting... mine is .3620 ... so, perhaps the Microsoft documentation is wrong about when they changed the JIT Optimizations (supposedly > 3053 should be optimizing structs right)... I need to do some updates and re-testing to see if that's the issue. THANKS! – Brian Kennedy Sep 30 '11 at 22:04
  • Corey, your results are the only ones, so far, where the operators have performed comparably to the methods. – Brian Kennedy Sep 30 '11 at 22:19
  • I'd be happy to run other tests, or compile a specific VS solution if you want to remove all doubt about some project setting I may have set differently or whatever. – Corey Kosak Sep 30 '11 at 22:25
  • E8400, 4 GB RAM, Win7 x64 Professional, Visual Studio 2010 Ultimate, running a release build of this code in x86 mode with the same product version as Corey: I get basically the same results as the ones he posted (though with quite some deviation between runs, about ±10% I'd say, so it shouldn't matter). So yes, it seems that the old JIT just doesn't optimize as well. – Voo Sep 30 '11 at 22:33

Running .NET 4.0 here. I compiled with "Any CPU", targeting .NET 4.0 in release mode. Execution was from the command line. It ran in 64-bit mode. My timings are a bit different.

Populating List<Element> took 442ms.
The PlusEqual() method took 115ms.
The 'same' += operator took 201ms.
The 'same' -= operator took 200ms.
The PlusEqual(double, double) method took 129ms.
The do nothing loop took 93ms.
The ratio of operator with constructor to method is 174%.
The ratio of operator without constructor to method is 173%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 112%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 490%.
The ratio of operator without constructor to method is 486%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 163%.

In particular, PlusEqual(Element) is slightly faster than PlusEqual(double, double).

Whatever the problem is in .NET 3.5, it doesn't appear to exist in .NET 4.0.

Jim Mischel
  • Yes, the answer on Structs appears to be "get the newer JIT". But as I asked on Henk's answer, why are methods so much faster than Operators? Both your methods are 5x faster than either of your operators... which are doing exactly the same thing. It is great that I can use structs again... but sad that I still have to avoid operators. – Brian Kennedy Sep 30 '11 at 22:14
  • Jim, I'd be very interested to know the version of the file C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll on your system... if newer than mine (.3620), but older than Corey's (.5446), then that might explain why your operators are still slow like mine, but Corey's aren't. – Brian Kennedy Sep 30 '11 at 22:22
  • @Brian: File version 2.0.50727.4214. – Jim Mischel Sep 30 '11 at 22:30
  • THANKS! So, I need to make sure my users have 4214 or later to get struct optimizations and 5446 or later to get operator optimization. I need to add some code to check that at startup and give some warnings. Thanks again. – Brian Kennedy Sep 30 '11 at 22:44
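A startup check along the lines Brian describes might look like the sketch below. JitVersionCheck, Classify and the status strings are hypothetical; FileVersionInfo.GetVersionInfo and FilePrivatePart (the last component, e.g. 4214 in 2.0.50727.4214) are standard .NET APIs. The thresholds 4214 and 5446 are only the revisions reported in this thread, not documented Microsoft guarantees; Daniel Pryden's 64-bit run on 5446 below still showed slow operators, so treat them as tentative. Also note that an application targeting .NET 4.0 runs on the v4 runtime (clr.dll), so this file only describes the 2.0 to 3.5 runtime.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class JitVersionCheck
{
    // Revisions observed in this thread (assumption: not official cut-offs).
    const int StructFixRevision = 4214;
    const int OperatorFixRevision = 5446;

    // Maps the last version component of mscorwks.dll to a rough status.
    public static string Classify(int revision)
    {
        if (revision >= OperatorFixRevision) return "ok";
        if (revision >= StructFixRevision) return "operators-slow";
        return "structs-slow";
    }

    public static void Main()
    {
        string path = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Windows),
            @"Microsoft.NET\Framework\v2.0.50727\mscorwks.dll");
        if (!File.Exists(path))
        {
            Console.WriteLine("mscorwks.dll not found at {0}", path);
            return;
        }
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(path);
        Console.WriteLine("{0}: {1}", info.FileVersion, Classify(info.FilePrivatePart));
    }
}
```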

In addition to the JIT compiler differences mentioned in other answers, another difference between a struct method call and a struct operator is that a struct method call passes this as a ref parameter (and may be written to accept other parameters as ref parameters as well), while a struct operator passes all operands by value. The cost to pass a structure of any size as a ref parameter is fixed, no matter how large the structure is, while the cost to pass a structure by value is proportional to its size. There is nothing wrong with using large structures (even hundreds of bytes) if one can avoid copying them unnecessarily; while unnecessary copies can often be prevented when using methods, they cannot be prevented when using operators.
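A minimal sketch of the distinction, reusing the question's Element struct: the operator receives both operands and returns its result by value (at the IL level these are copies, unless the JIT inlines and optimizes them away), while the static Add method, a hypothetical addition not present in the question, takes both operands by ref, so only two references are passed whatever the struct's size, and the destination is updated in place.

```csharp
using System;

public struct Element
{
    public double Left;
    public double Right;

    public Element(double left, double right) { Left = left; Right = right; }

    // By value: x, y and the returned Element are all copies.
    public static Element operator +(Element x, Element y)
    {
        return new Element(x.Left + y.Left, x.Right + y.Right);
    }

    // Hypothetical by-ref alternative: passes two references and mutates dest in place.
    public static void Add(ref Element dest, ref Element src)
    {
        dest.Left += src.Left;
        dest.Right += src.Right;
    }
}

static class RefDemo
{
    public static void Main()
    {
        var a = new Element(1, 2);
        var b = new Element(10, 20);

        Element viaOperator = a + b; // a and b are passed by value
        Element.Add(ref a, ref b);   // a is updated through a reference
        Console.WriteLine("{0},{1} vs {2},{3}", viaOperator.Left, viaOperator.Right, a.Left, a.Right);
        // prints "11,22 vs 11,22"
    }
}
```

For a 16-byte struct like this one the copying is cheap; the ref form matters more as the struct grows.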

supercat
  • Hmmm... well, that could explain a lot! So, if the operator is short enough that it will be inlined, I assume it won't make unnecessary copies. But if not, and your struct is more than one word, you may not want to implement it as an operator if speed is critical. Thanks for that insight. – Brian Kennedy Feb 21 '15 at 20:23
  • BTW, one thing that annoys me slightly when questions about speed are answered with "benchmark it!" is that such a response ignores the fact that in many cases what matters is not whether an operation usually takes 10us or 20us, but whether a slight change of circumstances could cause it to take 1ms or 10ms. What matters is not how fast something runs on a developer's machine, but rather whether the operation will ever be *slow enough to matter*; if method X runs twice as fast as method Y on most machines, but on some machines it will be 100 times as slow, method Y may be the better choice. – supercat Feb 21 '15 at 20:46
  • Of course, here we're talking about just 2 doubles... not large structs. Passing two doubles on the stack where they can be quickly accessed isn't necessarily slower than passing 'this' on the stack and then having to dereference that to pull them into registers to operate on them... but it could cause differences. However, in this case, it should be inlined, so the JIT Optimizer should end up with exactly the same code. – Brian Kennedy Feb 21 '15 at 20:46

Like @Corey Kosak, I just ran this code in VS 2010 Express as a simple Console App in Release mode. I get very different numbers. But I also have Fx4.5, so these might not be the results for a clean Fx4.0.

Populating List<Element> took 435ms.
The PlusEqual() method took 109ms.
The 'same' += operator took 217ms.
The 'same' -= operator took 157ms.
The PlusEqual(double, double) method took 118ms.
The do nothing loop took 79ms.
The ratio of operator with constructor to method is 199%.
The ratio of operator without constructor to method is 144%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 108%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 460%.
The ratio of operator without constructor to method is 260%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 130%.

Edit: and now run from the cmd line. That does make a difference, and there is less variation in the numbers.

H H
  • Yes, it appears the later JIT has fixed the struct issue, but my question on why methods are so much faster than operators remains. Look how much faster both PlusEqual methods are than the equivalent += operator. And it's also interesting how much faster -= is than +=... your timings are the first where I have seen that. – Brian Kennedy Sep 30 '11 at 22:10
  • Henk, I'd be very interested to know the version of the file C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll on your system... if newer than mine (.3620), but older than Corey's (.5446), then that might explain why your operators are still slow like mine, but Corey's aren't. – Brian Kennedy Sep 30 '11 at 22:23
  • I can only find the .50727 version, but I'm not sure if that's relevant for Fx40/Fx45? – H H Sep 30 '11 at 22:37
  • You have to go into Properties and click on the Version tab to see the rest of the version number. – Brian Kennedy Sep 30 '11 at 22:46

Not sure if this is relevant, but here's the numbers for .NET 4.0 64-bit on Windows 7 64-bit. My mscorwks.dll version is 2.0.50727.5446. I just pasted the code into LINQPad and ran it from there. Here's the result:

Populating List<Element> took 496ms.
The PlusEqual() method took 189ms.
The 'same' += operator took 295ms.
The 'same' -= operator took 358ms.
The PlusEqual(double, double) method took 148ms.
The do nothing loop took 103ms.
The ratio of operator with constructor to method is 156%.
The ratio of operator without constructor to method is 189%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 78%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 223%.
The ratio of operator without constructor to method is 296%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 52%.
Daniel Pryden
  • Interesting... it would appear that the optimizations that were added to the 32b JIT Optimizer have not yet made it to the 64b JIT Optimizer... your ratios are still very similar to mine. Disappointing... but good to know. – Brian Kennedy Oct 01 '11 at 01:35

Maybe instead of a List you should use a double[] with "well-known" offsets and index increments?
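One way to read that suggestion, as a sketch: store each element's Left and Right interleaved in a single double[] (Left at index 2*i, Right at 2*i + 1) and walk it with a stride of 2, so no Element struct is copied out of a List at all. The layout, InterleavedSum and Sum are assumptions for illustration; the seed values mirror the question's test data.

```csharp
using System;

static class InterleavedSum
{
    // Assumed layout: element i stores Left at data[2*i] and Right at data[2*i + 1].
    public static void Sum(double[] data, out double left, out double right)
    {
        if (data.Length % 2 != 0) throw new ArgumentException("length must be even");
        double l = 0, r = 0;
        for (int i = 0; i < data.Length; i += 2)
        {
            l += data[i];
            r += data[i + 1];
        }
        left = l;
        right = r;
    }

    public static void Main()
    {
        double[] seeds = { 2.5, 100000, -0.5, -100002 };
        int count = 4000000; // number of (Left, Right) pairs
        var data = new double[2 * count];
        for (int i = 0; i < count; i++)
        {
            double d = seeds[i % seeds.Length];
            data[2 * i] = d;
            data[2 * i + 1] = d;
        }

        double left, right;
        Sum(data, out left, out right);
        // Starting from 1, as the question's accumulators do.
        Console.WriteLine("{0} {1}", 1 + left, 1 + right); // prints "1 1"
    }
}
```

Whether this is actually faster than List<Element> on a given JIT is something to measure rather than assume.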

Konstantin Isaev

I would imagine that when you are accessing members of the struct, it is in fact doing an extra operation to access the member: the this pointer + offset.

Matthew
  • Well, with a class object, you would absolutely be right... because the method would just be passed the 'this' pointer. However, with structs, that shouldn't be so. The struct should be passed into the methods on the stack. So, the first double should be sitting where the 'this' pointer would be and the second double in the position right after it... both possibly being registers in the CPU. So, the JIT should just be using an offset at most. – Brian Kennedy Sep 30 '11 at 21:09