My questions: I'd like to know three things. First, in .NET Core, are "foreach" loops optimized to the same code as "for" loops and is that specific to .NET Core? Looking at the compiled code with ILSpy suggests "foreach" is treated like a for loop.
Secondly I'd like some to know if my tests, using BenchmarkDotNet, are flawed or if it's simply hard to get consistent results using .NET on a desktop. The results had so much variation that I'm not sure that they are all that useful.
Finally I'd like to see if someone could confirm that the best method for iterating over value type arrays, especially arrays of large structs, is using a foreach with a ref iterator variable. Like this:
var data = this.Data.AsSpan();
foreach (ref var d in data)
{
if (d.Equals(Check))
++count;
}
Some more detail: I'm optimizing some code that needs to iterate over an array of structs. I recalled that I've seen guidance before that a "for" loop is somewhat faster than "foreach". But that guidance is pretty old as seen here. That SO answer is still the first one that comes up but it's 11 years old.
I also noticed this SO answer which is more recent and looks at the new features of .NET core - span and ref variables. But the results where surprising in that "for loops" seemed to be slower.
Given that the results from the 2nd answer seemed to contradict the old guidance I decided to test for myself.
What I thought would be the fastest way to iterate over an array of value types - using pointers - turned out to be the fastest most of the time but not always. And using foreach with a by ref iterator variable turned out to be identical to the pointers in most cases.
But the results where not very consistent. I used 10,000 elements for the runs I'm showing. I also used 500,000. The results where still inconsistent with the larger data sizes.
The Code: I skipped the custom enumerator used in the 2nd answer and instead added a while loop using pointers. That looks like this:
[Benchmark]
public int RefInc()
{
var data = this.Data.AsSpan();
ref T current = ref MemoryMarshal.GetReference(data);
int length = data.Length;
int count = 0;
while (length > 0)
{
if (current.Equals(Check))
++count;
--length;
current = ref Unsafe.Add(ref current, 1);
}
return count;
}
I thought in theory that this would be the fastest possible way to iterate over an array of value types. Please note I'm in no way suggesting using this type of loop as for and foreach are much more clear. I just wanted a reference point to compare the others against again thinking this would be the fastest I could get.
I'll also include the simple "foreach" benchmark as well as what that method looked like using ILSpy.
[Benchmark]
public int ForEach()
{
int count = 0;
foreach (var d in this.Data)
{
if (d.Equals(Check))
++count;
}
return count;
}
With ILSpy:
[Benchmark]
public int ForEach()
{
int count = 0;
T[] data = Data;
for (int i = 0; i < data.Length; i++)
{
T d = data[i];
if (d.Equals(Check))
{
count++;
}
}
return count;
}
The entire set of benchmarks where contained in this class:
public class ArrayEnumerationBenchMark<T> where T : IEquatable<T>
{
readonly T Check = default(T);
T[] Data;
[GlobalSetup]
public void Setup()
{
this.Data = new T[DataSize];
}
[Params(10000)]
public int DataSize;
[Benchmark]
public int RefInc()
{
var data = this.Data.AsSpan();
ref T current = ref MemoryMarshal.GetReference(data);
int length = data.Length;
int count = 0;
while (length > 0)
{
if (current.Equals(Check))
++count;
--length;
current = ref Unsafe.Add(ref current, 1);
}
return count;
}
[Benchmark]
public int ForEach()
{
int count = 0;
foreach (var d in this.Data)
{
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int ForEachSpan()
{
int count = 0;
var data = this.Data.AsSpan();
foreach (var d in data)
{
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int ForEachSpanRef()
{
int count = 0;
var data = this.Data.AsSpan();
foreach (ref var d in data)
{
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int For()
{
int count = 0;
T[] data = this.Data;
for (int i = 0; i < data.Length; ++i)
{
T d = data[i];
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int ForSpan()
{
int count = 0;
var data = this.Data.AsSpan();
for (int i = 0; i < data.Length; ++i)
{
T d = data[i];
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int ForRefSpan()
{
var data = this.Data.AsSpan();
int count = 0;
for (int i = 0; i < data.Length; ++i)
{
ref readonly T d = ref data[i];
if (d.Equals(Check))
++count;
}
return count;
}
[Benchmark]
public int ForRef()
{
int count = 0;
for (int i = 0; i < this.Data.Length; ++i)
{
ref readonly T d = ref this.Data[i];
if (d.Equals(Check))
++count;
}
return count;
}
}
The Benchmark class is generic and takes any IEquatable struct. I used the Color struct from the 2nd question and added some larger structs as well to see if size of the struct changed the results. The Color struct and DoubleColor are included below with the LargeStruct. I'll skip the TripleColor and QuadColor variations since this is already getting so long.
public struct Color : IEquatable<Color>
{
public float R;
public float G;
public float B;
public float A;
public bool Equals([AllowNull] Color other) => this.R == other.R;
}
public struct DoubleColor : IEquatable<DoubleColor>
{
public Color First;
public Color Second;
public bool Equals([AllowNull] DoubleColor other)
{
return this.First.R == other.First.R;
}
}
public struct LargeStruct : IEquatable<LargeStruct>
{
public Color First;
public Guid G0;
public Guid G1;
public Guid G2;
public Guid G3;
public Guid G4;
public Guid G5;
public Guid G6;
public Guid G7;
public Guid G8;
public Guid G9;
public bool Equals([AllowNull] LargeStruct other)
{
return this.First.R == other.First.R;
}
}
I then executed this using the BenchmarkDotNet runner like this:
static void Main(string[] args)
{
var config = ManualConfig.Create(DefaultConfig.Instance)
.WithOption(ConfigOptions.KeepBenchmarkFiles, true);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<byte>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<short>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<double>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<decimal>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<Color>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<DoubleColor>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<TripleColor>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<QuadColor>>(config);
BenchmarkRunner.Run<ArrayEnumerationBenchMark<LargeStruct>>(config);
}
The Results: I'll just show some of the results the demonstrate my concerns about the results. I'll add the type above each block of results. Note BenchmarkDotNet dropped the Median from the summary at times and I don't know why that happened. So ignore that column.
Byte
| Method | DataSize | Mean | Error | StdDev |
|--------------- |--------- |---------:|----------:|----------:|
| RefInc | 10000 | 4.587 μs | 0.0573 μs | 0.0479 μs |
| ForEach | 10000 | 9.765 μs | 0.1775 μs | 0.2308 μs |
| ForEachSpan | 10000 | 4.879 μs | 0.0971 μs | 0.2027 μs |
| ForEachSpanRef | 10000 | 4.767 μs | 0.0950 μs | 0.0889 μs |
| For | 10000 | 9.871 μs | 0.1937 μs | 0.2306 μs |
| ForSpan | 10000 | 4.756 μs | 0.0680 μs | 0.0568 μs |
| ForRefSpan | 10000 | 4.798 μs | 0.0719 μs | 0.0600 μs |
| ForRef | 10000 | 7.303 μs | 0.0795 μs | 0.0620 μs |
double
| Method | DataSize | Mean | Error | StdDev |
|--------------- |--------- |----------:|----------:|----------:|
| RefInc | 10000 | 11.649 μs | 0.2311 μs | 0.5168 μs |
| ForEach | 10000 | 7.441 μs | 0.1470 μs | 0.2867 μs |
| ForEachSpan | 10000 | 6.865 μs | 0.1354 μs | 0.2705 μs |
| ForEachSpanRef | 10000 | 7.872 μs | 0.1564 μs | 0.2570 μs |
| For | 10000 | 7.347 μs | 0.1178 μs | 0.0920 μs |
| ForSpan | 10000 | 6.895 μs | 0.1376 μs | 0.2143 μs |
| ForRefSpan | 10000 | 6.890 μs | 0.1371 μs | 0.3540 μs |
| ForRef | 10000 | 8.861 μs | 0.1755 μs | 0.3208 μs |
Color
| Method | DataSize | Mean | Error | StdDev | Median |
|--------------- |--------- |----------:|----------:|----------:|----------:|
| RefInc | 10000 | 8.570 μs | 0.0864 μs | 0.0674 μs | 8.588 μs |
| ForEach | 10000 | 9.824 μs | 0.1097 μs | 0.0857 μs | 9.832 μs |
| ForEachSpan | 10000 | 9.939 μs | 0.1979 μs | 0.1851 μs | 9.899 μs |
| ForEachSpanRef | 10000 | 14.416 μs | 0.2858 μs | 0.6214 μs | 14.096 μs |
| For | 10000 | 9.846 μs | 0.1415 μs | 0.1323 μs | 9.797 μs |
| ForSpan | 10000 | 11.860 μs | 0.4646 μs | 1.3552 μs | 12.143 μs |
| ForRefSpan | 10000 | 13.926 μs | 0.0906 μs | 0.0708 μs | 13.920 μs |
| ForRef | 10000 | 17.225 μs | 0.3361 μs | 0.6061 μs | 16.950 μs |
LargeStruct
| Method | DataSize | Mean | Error | StdDev | Median |
|--------------- |--------- |---------:|--------:|---------:|---------:|
| RefInc | 10000 | 127.2 μs | 1.86 μs | 1.55 μs | 127.3 μs |
| ForEach | 10000 | 252.4 μs | 4.76 μs | 4.89 μs | 251.9 μs |
| ForEachSpan | 10000 | 253.5 μs | 4.97 μs | 8.31 μs | 250.4 μs |
| ForEachSpanRef | 10000 | 129.0 μs | 2.54 μs | 4.89 μs | 127.1 μs |
| For | 10000 | 248.5 μs | 4.68 μs | 4.15 μs | 247.6 μs |
| ForSpan | 10000 | 251.3 μs | 4.48 μs | 3.97 μs | 250.5 μs |
| ForRefSpan | 10000 | 252.0 μs | 3.48 μs | 2.90 μs | 251.2 μs |
| ForRef | 10000 | 258.9 μs | 5.10 μs | 12.70 μs | 253.5 μs |