In .NET Core is for and foreach now the same when iterating over an array of valuetypes?

Question

My questions: I'd like to know three things. First, in .NET Core, are "foreach" loops optimized to the same code as "for" loops and is that specific to .NET Core? Looking at the compiled code with ILSpy suggests "foreach" is treated like a for loop.

Secondly I'd like some to know if my tests, using BenchmarkDotNet, are flawed or if it's simply hard to get consistent results using .NET on a desktop. The results had so much variation that I'm not sure that they are all that useful.

Finally I'd like to see if someone could confirm that the best method for iterating over value type arrays, especially arrays of large structs, is using a foreach with a ref iterator variable. Like this:

        var data = this.Data.AsSpan();
        foreach (ref var d in data)
        {
            if (d.Equals(Check))
                ++count;
        }

Some more detail: I'm optimizing some code that needs to iterate over an array of structs. I recalled that I've seen guidance before that a "for" loop is somewhat faster than "foreach". But that guidance is pretty old as seen here. That SO answer is still the first one that comes up but it's 11 years old.

I also noticed this SO answer which is more recent and looks at the new features of .NET core - span and ref variables. But the results where surprising in that "for loops" seemed to be slower.

Given that the results from the 2nd answer seemed to contradict the old guidance I decided to test for myself.

What I thought would be the fastest way to iterate over an array of value types - using pointers - turned out to be the fastest most of the time but not always. And using foreach with a by ref iterator variable turned out to be identical to the pointers in most cases.

But the results where not very consistent. I used 10,000 elements for the runs I'm showing. I also used 500,000. The results where still inconsistent with the larger data sizes.

The Code: I skipped the custom enumerator used in the 2nd answer and instead added a while loop using pointers. That looks like this:

    [Benchmark]
    public int RefInc()
    {
        var data = this.Data.AsSpan();
        ref T current = ref MemoryMarshal.GetReference(data);
        int length = data.Length;
        int count = 0;

        while (length > 0)
        {
            if (current.Equals(Check))
                ++count;

            --length;
            current = ref Unsafe.Add(ref current, 1);
        }

        return count;
    }

I thought in theory that this would be the fastest possible way to iterate over an array of value types. Please note I'm in no way suggesting using this type of loop as for and foreach are much more clear. I just wanted a reference point to compare the others against again thinking this would be the fastest I could get.

I'll also include the simple "foreach" benchmark as well as what that method looked like using ILSpy.

    [Benchmark]
    public int ForEach()
    {
        int count = 0;
        foreach (var d in this.Data)
        {
            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

With ILSpy:

[Benchmark]
public int ForEach()
{
    int count = 0;
    T[] data = Data;
    for (int i = 0; i < data.Length; i++)
    {
        T d = data[i];
        if (d.Equals(Check))
        {
            count++;
        }
    }
    return count;
}

The entire set of benchmarks where contained in this class:

public class ArrayEnumerationBenchMark<T> where T : IEquatable<T>
{
    readonly T Check = default(T);
    T[] Data;

    [GlobalSetup]
    public void Setup()
    {
        this.Data = new T[DataSize];
    }

    [Params(10000)]
    public int DataSize;


    [Benchmark]
    public int RefInc()
    {
        var data = this.Data.AsSpan();
        ref T current = ref MemoryMarshal.GetReference(data);
        int length = data.Length;
        int count = 0;

        while (length > 0)
        {
            if (current.Equals(Check))
                ++count;

            --length;
            current = ref Unsafe.Add(ref current, 1);
        }

        return count;
    }

    [Benchmark]
    public int ForEach()
    {
        int count = 0;
        foreach (var d in this.Data)
        {
            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

    [Benchmark]
    public int ForEachSpan()
    {
        int count = 0;
        var data = this.Data.AsSpan();
        foreach (var d in data)
        {
            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

    [Benchmark]
    public int ForEachSpanRef()
    {
        int count = 0;
        var data = this.Data.AsSpan();
        foreach (ref var d in data)
        {
            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

    [Benchmark]
    public int For()
    {
        int count = 0;
        T[] data = this.Data;
        for (int i = 0; i < data.Length; ++i)
        {
            T d = data[i];

            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

    [Benchmark]
    public int ForSpan()
    {
        int count = 0;
        var data = this.Data.AsSpan();
        for (int i = 0; i < data.Length; ++i)
        {
            T d = data[i];

            if (d.Equals(Check))
                ++count;
        }
        return count;
    }

    [Benchmark]
    public int ForRefSpan()
    {
        var data = this.Data.AsSpan();
        int count = 0;
        for (int i = 0; i < data.Length; ++i)
        {
            ref readonly T d = ref data[i];

            if (d.Equals(Check))
                ++count;
        }

        return count;
    }

    [Benchmark]
    public int ForRef()
    {
        int count = 0;
        for (int i = 0; i < this.Data.Length; ++i)
        {
            ref readonly T d = ref this.Data[i];

            if (d.Equals(Check))
                ++count;
        }

        return count;
    }
}

The Benchmark class is generic and takes any IEquatable struct. I used the Color struct from the 2nd question and added some larger structs as well to see if size of the struct changed the results. The Color struct and DoubleColor are included below with the LargeStruct. I'll skip the TripleColor and QuadColor variations since this is already getting so long.

public struct Color : IEquatable<Color>
{
    public float R;
    public float G;
    public float B;
    public float A;

    public bool Equals([AllowNull] Color other) => this.R == other.R;
}

public struct DoubleColor : IEquatable<DoubleColor>
{
    public Color First;
    public Color Second;

    public bool Equals([AllowNull] DoubleColor other)
    {
        return this.First.R == other.First.R;
    }
}
public struct LargeStruct : IEquatable<LargeStruct>
{
    public Color First;
    public Guid G0;
    public Guid G1;
    public Guid G2;
    public Guid G3;
    public Guid G4;
    public Guid G5;
    public Guid G6;
    public Guid G7;
    public Guid G8;
    public Guid G9;


    public bool Equals([AllowNull] LargeStruct other)
    {
        return this.First.R == other.First.R;
    }
}

I then executed this using the BenchmarkDotNet runner like this:

    static void Main(string[] args)
    {
        var config = ManualConfig.Create(DefaultConfig.Instance)
                                    .WithOption(ConfigOptions.KeepBenchmarkFiles, true);

        BenchmarkRunner.Run<ArrayEnumerationBenchMark<byte>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<short>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<double>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<decimal>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<Color>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<DoubleColor>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<TripleColor>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<QuadColor>>(config);
        BenchmarkRunner.Run<ArrayEnumerationBenchMark<LargeStruct>>(config);
    }

The Results: I'll just show some of the results the demonstrate my concerns about the results. I'll add the type above each block of results. Note BenchmarkDotNet dropped the Median from the summary at times and I don't know why that happened. So ignore that column.

Byte

|         Method | DataSize |     Mean |     Error |    StdDev |
|--------------- |--------- |---------:|----------:|----------:|
|         RefInc |    10000 | 4.587 μs | 0.0573 μs | 0.0479 μs |
|        ForEach |    10000 | 9.765 μs | 0.1775 μs | 0.2308 μs |
|    ForEachSpan |    10000 | 4.879 μs | 0.0971 μs | 0.2027 μs |
| ForEachSpanRef |    10000 | 4.767 μs | 0.0950 μs | 0.0889 μs |
|            For |    10000 | 9.871 μs | 0.1937 μs | 0.2306 μs |
|        ForSpan |    10000 | 4.756 μs | 0.0680 μs | 0.0568 μs |
|     ForRefSpan |    10000 | 4.798 μs | 0.0719 μs | 0.0600 μs |
|         ForRef |    10000 | 7.303 μs | 0.0795 μs | 0.0620 μs |

double

|         Method | DataSize |      Mean |     Error |    StdDev |
|--------------- |--------- |----------:|----------:|----------:|
|         RefInc |    10000 | 11.649 μs | 0.2311 μs | 0.5168 μs |
|        ForEach |    10000 |  7.441 μs | 0.1470 μs | 0.2867 μs |
|    ForEachSpan |    10000 |  6.865 μs | 0.1354 μs | 0.2705 μs |
| ForEachSpanRef |    10000 |  7.872 μs | 0.1564 μs | 0.2570 μs |
|            For |    10000 |  7.347 μs | 0.1178 μs | 0.0920 μs |
|        ForSpan |    10000 |  6.895 μs | 0.1376 μs | 0.2143 μs |
|     ForRefSpan |    10000 |  6.890 μs | 0.1371 μs | 0.3540 μs |
|         ForRef |    10000 |  8.861 μs | 0.1755 μs | 0.3208 μs |

Color

|         Method | DataSize |      Mean |     Error |    StdDev |    Median |
|--------------- |--------- |----------:|----------:|----------:|----------:|
|         RefInc |    10000 |  8.570 μs | 0.0864 μs | 0.0674 μs |  8.588 μs |
|        ForEach |    10000 |  9.824 μs | 0.1097 μs | 0.0857 μs |  9.832 μs |
|    ForEachSpan |    10000 |  9.939 μs | 0.1979 μs | 0.1851 μs |  9.899 μs |
| ForEachSpanRef |    10000 | 14.416 μs | 0.2858 μs | 0.6214 μs | 14.096 μs |
|            For |    10000 |  9.846 μs | 0.1415 μs | 0.1323 μs |  9.797 μs |
|        ForSpan |    10000 | 11.860 μs | 0.4646 μs | 1.3552 μs | 12.143 μs |
|     ForRefSpan |    10000 | 13.926 μs | 0.0906 μs | 0.0708 μs | 13.920 μs |
|         ForRef |    10000 | 17.225 μs | 0.3361 μs | 0.6061 μs | 16.950 μs |

LargeStruct

|         Method | DataSize |     Mean |   Error |   StdDev |   Median |
|--------------- |--------- |---------:|--------:|---------:|---------:|
|         RefInc |    10000 | 127.2 μs | 1.86 μs |  1.55 μs | 127.3 μs |
|        ForEach |    10000 | 252.4 μs | 4.76 μs |  4.89 μs | 251.9 μs |
|    ForEachSpan |    10000 | 253.5 μs | 4.97 μs |  8.31 μs | 250.4 μs |
| ForEachSpanRef |    10000 | 129.0 μs | 2.54 μs |  4.89 μs | 127.1 μs |
|            For |    10000 | 248.5 μs | 4.68 μs |  4.15 μs | 247.6 μs |
|        ForSpan |    10000 | 251.3 μs | 4.48 μs |  3.97 μs | 250.5 μs |
|     ForRefSpan |    10000 | 252.0 μs | 3.48 μs |  2.90 μs | 251.2 μs |
|         ForRef |    10000 | 258.9 μs | 5.10 μs | 12.70 μs | 253.5 μs |

I seems like the C# compiler has a special case for enumerating an array (https://medium.com/@zsalloum/performance-of-the-loops-in-c-b2961c6d60d8) — Jeremy Lakeman, Sep 29 '20 at 03:51
Of course the runtime JIT compiler may make further optimisations. — Jeremy Lakeman, Sep 29 '20 at 03:57
Your benchmarks are a pretty good indication of the facts here, and a great way of showing how you can use theory and empirical results to make decisions. I am not sure we can add much more, though I will note some of the results are surprising... In particular ref span foreach on color, though we are missing a lot of details in general — TheGeneral, Sep 29 '20 at 03:58
@JeremyLakeman interesting article, seems to indicate this optimization pre-dates .net core. And likely has nothing to do with value types. — MikeJ, Sep 29 '20 at 12:44
@MichaelRandall I was a bit surprised by how the results varied over the various tests. With that large result for ref span foreach on Color as a prime example. — MikeJ, Sep 29 '20 at 12:49
When you get down to the point where you're interested in performance of iterating through things it becomes worthwhile to take a step back and see if you can gain more by restructuring the data (i.e. exploding structs into arrays) and/or processing it completely differently (SIMD approaches like `Vector`), as these can give order of magnitude improvements. — Jeroen Mostert, Sep 29 '20 at 15:57
do you consider multi-threads programing with `foreach`? I believe `foreach` is designed for extra purposes, it's sure to provide better performance if you use it properly. — Dongdong, Sep 29 '20 at 16:03
@Dongdong if by foreach you mean the idea of enumerators then I believe it's a general purpose paradigm for iterating over any sequences. But I don't believe that has anything to do with multi-threading. Maybe you're thinking of Parallel.ForEach — MikeJ, Sep 29 '20 at 16:07
@JeroenMostert Good point. I came to this question a bit sideways looking into making the sequence of structs I was iterating over read only. Following some MS guidance: https://learn.microsoft.com/en-us/dotnet/csharp/write-safe-efficient-code While looking at that I stumbled on the two questions I referenced. To your point, I'll look more broadly for performance improvements and not focus purely on the mechanics of how I'm iterating the data. — MikeJ, Sep 29 '20 at 16:12
@JeroenMostert I guess the other thing that was interesting to me was that the old - 11 year old answer - was still being served up when looking at foreach/for performance. But I guess that's a broader SO issue. — MikeJ, Sep 29 '20 at 16:15

In .NET Core is for and foreach now the same when iterating over an array of valuetypes?

0 Answers0

Linked