
I have an immutable struct I'm using for a 3D vector. I am aware that basic getter properties are (supposed to be) inlined and should therefore perform identically to fields, assuming you build in release configuration and run outside of VS without any influence from debugging or JIT suppression. However, that's not the behavior I'm seeing, and I'm trying to figure out why. (By the way, I've read every other post about this on SO that I can find, to no avail.)

Setup: VS2019 v16.8.4, using .NET 5.0 & C#9. Release configuration, with Optimization enabled in all projects.

First, the relevant piece of my vector class using public fields (for comparison):

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Vector3
{
  public readonly double x;
  public readonly double y;
  public readonly double z;

  ...
}

and the console app code that times reading a field from 1 million vectors. (I know the benchmark could be improved, for example by raising thread priority or warming up the function, but the key point here is the difference between fields and properties, which is consistent across runs):

static void Main(string[] args)
{
  const int sampleSize = 1000000;
  var vectors = new Vector3[sampleSize];

  // Fill with random values.
  var rand = new Random();
  for (var i = 0; i < sampleSize; i++)
  {
    vectors[i] = new Vector3(rand.NextDouble(),
                             rand.NextDouble(),
                             rand.NextDouble());
  }

  double val;
  var sw = Stopwatch.StartNew();
  
  for (var i = 0; i < sampleSize; i++)
    val = vectors[i].x; //Access the field as a test.
  
  sw.Stop();
  Console.WriteLine($"Accessing the fields 1M times took {sw.ElapsedTicks} ticks.");
}

When I build this code in release configuration, start the command prompt, and run the executable, it takes around 3100 to 3400 ticks to access the x field a million times.

If I change this code to use a simple getter instead:

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Vector3
{
  public double X { get; }
  public double Y { get; }
  public double Z { get; }
  ...
}

and the only change to Main() is this:

val = vectors[i].X; //Now access via a property instead of the field.

(that is, accessing via the auto-prop instead of the field) then when I build this in release config and run the executable several times in the command prompt, I see times in the 15,000 tick range, or about 5 times slower!

Again, I have confirmed for both tests that I am compiling in release configuration, with optimization checked in all of my projects, and I am running the executable in the command prompt outside of Studio (so the debugger can't possibly be attached).

What am I missing? Why is the property access taking 5 times longer?
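
Note that `Stopwatch.ElapsedTicks` is counted in units of `1 / Stopwatch.Frequency` seconds, which is not necessarily the 100 ns tick used by `TimeSpan`. A minimal sketch for turning the tick counts above into nanoseconds per access follows. The helper name `NanosecondsPerOp` is hypothetical, and 3,250 is simply the midpoint of the 3,100 to 3,400 range reported above; the printed values depend on the machine's timer frequency.

```csharp
using System;
using System.Diagnostics;

public static class TickConversion
{
    // Hypothetical helper: converts Stopwatch ticks to nanoseconds per operation.
    // 'frequency' is the number of Stopwatch ticks per second.
    public static double NanosecondsPerOp(long elapsedTicks, long operations, long frequency)
        => elapsedTicks * 1_000_000_000.0 / frequency / operations;

    public static void Main()
    {
        long freq = Stopwatch.Frequency; // machine dependent
        Console.WriteLine($"Stopwatch.Frequency = {freq} ticks/s");

        // Illustrative inputs taken from the tick counts quoted in the question.
        Console.WriteLine($"Field:    {NanosecondsPerOp(3_250, 1_000_000, freq):F3} ns/access");
        Console.WriteLine($"Property: {NanosecondsPerOp(15_000, 1_000_000, freq):F3} ns/access");
    }
}
```

Expressing both runs as time per access makes it easier to compare them with results reported in milliseconds by a benchmarking tool.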

Todd Burch
  • So i am interested in where you got this nugget of information "*I am aware that basic getter properties are (supposed to be)*" – TheGeneral Jan 27 '21 at 02:19
  • Jon Skeet (https://stackoverflow.com/questions/646779/does-c-sharp-inline-properties) mentions that simple properties are typically inlined. Although interestingly he does mention that with doubles sometimes the properties are slower. (This post is over 10 years old though, so not sure if it would still apply). – Todd Burch Jan 27 '21 at 02:25
  • To add, all the search results I find indicate that simple properties typically get inlined. A 5x difference is significant! – Todd Burch Jan 27 '21 at 02:28
  • Just found that the MSDN docs also say this (see https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/using-properties): "For example, when you are returning the private variable from the get accessor and optimizations are enabled, the call to the get accessor method is inlined by the compiler so there is no method-call overhead." – Todd Burch Jan 27 '21 at 02:29
  • I am not seeing your results with benchmarkDotNet – TheGeneral Jan 27 '21 at 02:32
  • @00110001 that's what I'm so confused about. I am not seeing what's causing this behavior. – Todd Burch Jan 27 '21 at 02:36
  • Run the benchmark code I have posted – TheGeneral Jan 27 '21 at 02:39

2 Answers


I am not seeing your results

The results are all within a typical margin of error given my CPU and environment.

Note that you should always use a benchmarking tool for this sort of test; there are a lot of things that can go wrong otherwise. In this case I am using BenchmarkDotNet.

Config

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.746 (2004/?/20H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.102
  [Host]        : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Job=.NET Core 5.0  Runtime=.NET Core 5.0

Results

|     Method |     Mean |     Error |    StdDev |
|----------- |---------:|----------:|----------:|
|      Field | 1.535 ms | 0.0307 ms | 0.0605 ms |
|       Prop | 1.512 ms | 0.0204 ms | 0.0171 ms |
| PropInline | 1.567 ms | 0.0295 ms | 0.0404 ms |

Given

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Struct1
{
   public readonly double x;
   public readonly double y;
   public readonly double z;

   public Struct1(double x, double y, double z)
   {
      this.x = x;
      this.y = y;
      this.z = z;
   }
}

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Struct2
{
   public double x { get; }
   public double y { get; }
   public double z { get; }

   public Struct2(double x, double y, double z)
   {
      this.x = x;
      this.y = y;
      this.z = z;
   }
}

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Struct3
{
   public double x
   {
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      get;
   }

   public double y
   {
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      get;
   }

   public double z
   {
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      get;
   }

   public Struct3(double x, double y, double z)
   {
      this.x = x;
      this.y = y;
      this.z = z;
   }
}

Test Code

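// Requires "using BenchmarkDotNet.Attributes;" and "using BenchmarkDotNet.Jobs;" at the top of the file.
[SimpleJob(RuntimeMoniker.NetCoreApp50)]
public class Test
{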
   private Struct1[] _vector1s;
   private Struct2[] _vector2s;
   private Struct3[] _vector3s;

   [GlobalSetup]
   public void Setup()
   {
      const int sampleSize = 1000000;
      _vector1s = new Struct1[sampleSize];
      _vector2s = new Struct2[sampleSize];
      _vector3s = new Struct3[sampleSize];
      // Fill with random values.
      var rand = new Random();
      for (var i = 0; i < sampleSize; i++)
      {
         var x = rand.NextDouble();
         var y = rand.NextDouble();
         var z = rand.NextDouble();

         _vector1s[i] = new Struct1(x,y,z);
         _vector2s[i] = new Struct2(x,y,z);
         _vector3s[i] = new Struct3(x,y,z);

      }
   }

   [Benchmark]
   public double Field()
   {
      double val = 0;

      unchecked
      {
         for (var i = 0; i < 1000000; i++)
            val += _vector1s[i].x; 
      }

      return val;
   }
   [Benchmark]
   public double Prop()
   {
      double val = 0;

      unchecked
      {
         for (var i = 0; i < 1000000; i++)
            val += _vector2s[i].x; 
      }

      return val;
   }
   [Benchmark]
   public double PropInline()
   {
      double val = 0;

      unchecked
      {
         for (var i = 0; i < 1000000; i++)
            val += _vector3s[i].x; 
      }

      return val;
   }

}
TheGeneral
  • Thanks so much for taking the time to test and helping to confirm that I'm losing my mind :). Now I just need to find out why I don't see this. I'll try what you've posted (in a new solution). – Todd Burch Jan 27 '21 at 02:40
  • His example uses `+=`; yours does not. When accessing a field, perhaps the optimiser has noticed you are discarding the result. But it's more difficult to prove that a property getter has no side effects. – Jeremy Lakeman Jan 27 '21 at 02:45
  • @JeremyLakeman i am using `+=`, but yes that was a conscious decision. Just benchmarking with `=` to compare – TheGeneral Jan 27 '21 at 02:47
  • @JeremyLakeman just benchmarked with the Ops actual code, no change really. I guess there might be differences in framework or CPU architecture, though i don't have access to an AMD – TheGeneral Jan 27 '21 at 02:52
  • Also if everything is in one method, you may be stuck executing first pass JIT code rather than more heavily re-optimised code. In other words, I don't believe .net 5 can do “On Stack Replacement” yet. – Jeremy Lakeman Jan 27 '21 at 02:54
  • @JeremyLakeman By default `BenchmarkDotNet` warms up, pre-JITs, forces GCs, and runs multiple passes. I am not saying this benchmark is flawless, but it would seem to allay those concerns – TheGeneral Jan 27 '21 at 02:59
  • @00110001 I ran the benchmark and all 3 versions of the classes you wrote above came in around the same average time on my box (1.7 ms)...so I guess that indicates they all do in fact have the same performance. Now I need to figure out why my test using a compiled release build showed such a big difference. Not only is it consistent, but even if I add some warmup code, the prop accessor is still ~5x slower in my compiled exe. – Todd Burch Jan 27 '21 at 03:10
  • If I switch my original code to use += to accumulate the values, it doesn't affect the result. – Todd Burch Jan 27 '21 at 03:13
  • @JeremyLakeman that's an interesting theory that "On Stack Replacement" isn't possible. I made the array a static so it would be allocated on the heap, then tried running the loop both in Main() and in a method called from Main(). No difference. Also tried the AggressiveInlining attribute on my getters. Didn't make a difference. Worthy of note I'm on a 64-bit Intel i7, so same as 00110001. – Todd Burch Jan 27 '21 at 03:26
  • @ToddBurch Running the code above on a Ryzen 3800X (all-core OC to 4.4) I get expected results. Mean for both Prop runs is 1.075 ms, and Field is 1.052 ms. Well within expected variance. Just trying to help eliminate CPU architecture. Config is the same, minus the core difference. – Zer0 Jan 27 '21 at 04:16
  • @Zer0 great thanks, yeah I wouldn't have expected a difference, but you never know your luck in the big city – TheGeneral Jan 27 '21 at 04:17
  • "On Stack Replacement" is about leaving the stack alone, while hot-swapping the code that is executing, eg from early JIT to "more optimised". You could think about it as patching the compiled loop body, so it jumps to a trampoline. This might load values from the stack into registers (like `i`, `val`, `&_vector2s[i]`...), then jump directly into the more optimised loop body. – Jeremy Lakeman Jan 27 '21 at 04:50
  • On-stack replacement seems to be the most likely cause for this. Basically, when Main() runs it’s instantly a hot method and the Jitter doesn’t get the chance to swap the inlined version in for the original. I will try CrossGen later today and see if that resolves the difference. – Todd Burch Jan 27 '21 at 11:01
  • Well, I tried finding CrossGen (no luck) and then tried ReadyToRun (https://learn.microsoft.com/en-us/dotnet/core/deploying/ready-to-run) with a publish, in an attempt to AOT compile the code and see whether on-stack replacement was the cause. The result was that the property accessor was even slower than when just running the JIT'd release exe. Perhaps I'm just not doing R2R/crossgen right (I'm coming from .NET Framework, so this is new to me). – Todd Burch Jan 28 '21 at 01:57

The property access might be taking longer because the getter call is not being inlined. You can play around with your code at SharpLab.io, where you can see that no inlining happens.

Also, make sure you do not measure the first pass through the code. On first execution there is no jitted code yet, hence no inlined code. Do you see the same numbers on each run?

Sure, the JIT compiler may inline your properties. But in your code base, in some specific context, it may decide not to.
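
One way to probe both points (warm-up and inlining) without extra tooling is sketched below. It is only a rough hand-rolled harness under stated assumptions, not a substitute for BenchmarkDotNet: all type and method names (`FieldVec`, `PropVec`, `NoInlineVec`, `InliningProbe`, `BestTicks`) are hypothetical, and the `NoInlining` getter is included purely as a reference point for what a getter that is definitely not inlined costs on the same machine.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public readonly struct FieldVec
{
    public readonly double x;
    public FieldVec(double x) { this.x = x; }
}

public readonly struct PropVec
{
    public double X { get; }
    public PropVec(double x) { X = x; }
}

public readonly struct NoInlineVec
{
    private readonly double _x;
    public NoInlineVec(double x) { _x = x; }

    // Deliberately prevents inlining so this case serves as a "not inlined" baseline.
    public double X
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        get { return _x; }
    }
}

public static class InliningProbe
{
    private static double _sink; // keeps results alive so the loops are not dead code

    public static double SumField(FieldVec[] v)
    {
        double s = 0;
        for (int i = 0; i < v.Length; i++) s += v[i].x;
        return s;
    }

    public static double SumProp(PropVec[] v)
    {
        double s = 0;
        for (int i = 0; i < v.Length; i++) s += v[i].X;
        return s;
    }

    public static double SumNoInline(NoInlineVec[] v)
    {
        double s = 0;
        for (int i = 0; i < v.Length; i++) s += v[i].X;
        return s;
    }

    // Runs the loop once untimed (so it is already jitted), then returns the best of 'runs' timings.
    public static long BestTicks<T>(Func<T[], double> loop, T[] data, int runs)
    {
        _sink += loop(data);
        long best = long.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            _sink += loop(data);
            sw.Stop();
            best = Math.Min(best, sw.ElapsedTicks);
        }
        return best;
    }

    public static void Main()
    {
        const int n = 1_000_000;
        var rand = new Random(42);
        var fields = new FieldVec[n];
        var props = new PropVec[n];
        var noInline = new NoInlineVec[n];
        for (int i = 0; i < n; i++)
        {
            double x = rand.NextDouble();
            fields[i] = new FieldVec(x);
            props[i] = new PropVec(x);
            noInline[i] = new NoInlineVec(x);
        }

        Console.WriteLine($"Field:          {BestTicks<FieldVec>(SumField, fields, 20)} ticks");
        Console.WriteLine($"Auto-property:  {BestTicks<PropVec>(SumProp, props, 20)} ticks");
        Console.WriteLine($"NoInlining get: {BestTicks<NoInlineVec>(SumNoInline, noInline, 20)} ticks");
    }
}
```

If the auto-property timing tracks the field timing rather than the `NoInlining` one, that suggests the getter is being inlined in that context; if it tracks the `NoInlining` timing, the inlining explanation becomes more plausible.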

l33t