Why is in C# a nested foreach loop more performant than a SelectMany combined with a single foreach loop or a GroupBy?

Question

Consider having these Models:

public class BenchMarks{
    public List<Parent> Parents { get; set; }
}

public class Parent {
    public List<Child> Children { get; set; }
}

public class Child {
    public string Name { get; set; }
    public int Value { get; set; }
}

and the class BenchMarks also has these methods:

public void GetWithTwoForeach(){
    var dict = new Dictionary<string,int>();
    foreach(var parent in Parents){
        foreach(var child in parent.Children){
            if (dict.ContainsKey(child.Name))
                dict[child.Name] += child.Value;
            else
                dict.Add(child.Name, child.Value);
        }
    }
}

public void GetWithOneForeach(){
    var dict = new Dictionary<string,int>();
    foreach(var child in Parents.SelectMany(p=>p.Children)){
        if (dict.ContainsKey(child.Name))
            dict[child.Name] += child.Value;
        else
            dict.Add(child.Name, child.Value);
    }
}

public void GetWithGroupSum(){
    var childGroups = Parents.SelectMany(p=>p.Children).GroupBy(c => c.Name);   
    var dict = childGroups.ToDictionary(cg => cg.Key, cg => cg.Sum(child => child.Value));
}

Than why is it that the nested foreach loop outperforms the other two methods by more than 100%?

Benchmark fiddle

Edit My benchmark was indeed affected by the ordered nature of my test data.

When I changed it to this (I was not able to get this to work in the .NET fiddle)

public void Setup()
{
    Parents = new List<Parent>();
    
    var names = new List<string>{"Tigo","Edo","Kumal","Manu","Guido","Anne","Steff","Mimi","Suzan","Jeff"};
                    
    var random = new Random();
    
    for (var i = 0; i < 200; i++)
    {
        int index = random.Next(names.Count);
        Parents.Add(new Parent
        {
            Children = new List<Child>
            {
                new Child
                {
                    Name = names[index],
                    Value = i
                }
            }
        });
    }
}

The benchmark came out a bit less dramatic, but still the same conclusion that LINQ adds overhead as mentioned in comments

It gives a direction, thanks. I tried using sharplab.io to figure out what is going on, but I do not yet understand what it is that LINQ does, which makes it slower — Daniël Tulp, Feb 01 '23 at 09:12
See the LINQ source code [here](https://source.dot.net/#System.Linq/System/Linq/SelectMany.cs,115), and see what it lowers to [here](https://sharplab.io/#v2:CYLg1APgAgTAjAWAFBQAwAIpwCwG5lQDMmM6AwugN7Lq120AOATgJYBuAhgC4CmmcANn6EAPABUASjwDOAVwA2XAHzoAyj3k8AxlwCyHAHYBPAJK8m3APZNxqy7KZaeAGnRiyl+Zp0tLB15IyCsoAFFiiYnYOTirS9o4umACstvFOruHiHl7aXL4GSipant55fuql1hkpkWmJ7iW5+QFScooqTEGKFbnWAJQ09HTUSENj6ABm1jwcWgAW6CG10XwaPAC2PAZc6CwG6HErA6Pj9COnp1OdswtL2aX5B7IARgCimpvbu/vFOT7lax01hCa0+XD6xwu43OUIuWDgmAA7OhOm0uD0gUwQR8tlxXHI3jjtn18CdYXQAL6DKFUslDWkUoA) — Sweeper, Feb 01 '23 at 09:14
@DaniëlTulp LINQ is a kind of functional programming behind the scene, following functional programming principles, when you iterate a list, it will evaluate elements, which can cost more time. So using LINQ is not always faster. But in many cases it is smarter, reducing the line of code. If you have a performance important algorithm I will always stick with the classical way of programming in c#. — Maytham Fahmi, Feb 01 '23 at 09:40
The "Allocated" column tells the tale. GC is quite fast, but has non-zero cost. In fact, you got reason to worry that the benchmark is not representative for actual performance. The GetWithTwo method uses so very little memory that you know there's something fishy about child.Name. They are all the same for the children of a parent, so the dictionary reuses the same element repeatedly. Tends to happen for auto-generated data, not for real data. Shows that you can easily beat Linq when you know enough about the data. Or that real data makes the difference disappear. — Hans Passant, Feb 01 '23 at 10:10
As an aside, if you want to more than double the speed of `GetWithTwoForeach()`, you can update the values like so in the inner loop: `ref int value = ref CollectionsMarshal.GetValueRefOrAddDefault(dict, child.Name, out _); value += child.Value;` — Matthew Watson, Feb 01 '23 at 10:12

Why is in C# a nested foreach loop more performant than a SelectMany combined with a single foreach loop or a GroupBy?

0 Answers0