0

I was reading this comment by casablanca in https://softwareengineering.stackexchange.com/a/411324/109967:

There's another caveat here: value types are boxed onto the heap if accessed via an interface, so you'd still incur a heap allocation if enumerating via IList or IEnumerable. You'd have to be holding onto a concrete List instance to avoid the allocation.

I wanted to test this theory using this .NET 5 console app (I've tried swapping between the List<T> and IList<T>):

using System;
using System.Collections.Generic;

namespace ConsoleApp10
{    
    class Program
    {
        static void Main(string[] args)
        {
            IList<int> numbers = new List<int> { 0, 1, 2, 3, 4, 5 };
            //List<int> numbers = new List<int> { 0, 1, 2, 3, 4, 5 };

            for (int i = 0; i < 2000; i++)
            {
                foreach (var number in numbers)
                {

                }
            }
            GC.Collect();
            Console.WriteLine("GC gen 0 collection count = " + GC.CollectionCount(0)); //Always 1
            Console.WriteLine("GC gen 1 collection count = " + GC.CollectionCount(1)); //Always 1
            Console.WriteLine("GC gen 2 collection count = " + GC.CollectionCount(2)); //Always 1
                
        }
    }
}

It appears as though all generations get full and are GCed.

If List<T> uses a struct Enumerator, then there should not be any heap allocations happening up until GC.Collect() is called (afterwards string objects are created in the calls to Console.WriteLine() but that happens after the GC runs).

Question 1: Why are heap allocations happening when using List<T>?

Question 2: I understand that there would be heap allocations due to the reference type Enumerator used when iterating a List<T> via the IList<T> interface (assuming the comment in the linked question is correct), but why does a collection happen in all 3 generations?

2000 Enumerator objects is a lot of objects, but once the foreach loop completes, it's ready to be GCed because nothing is referring to the object after the foreach completes. Why are the objects making it through to Gen 1 and Gen 2?

David Klempfner
  • 8,700
  • 20
  • 73
  • 153
  • I don’t think the comment from casablanca is correct. It’s true that when you access value types through an interface they get put on a heap. But this refers to stuff like `IFormattable formattable = 1`. It does not refer to if the value type is inside another object and that object gets referenced via an interface. Therefore foreach-ing over a List and IList should behave the same. – ckuri Jun 05 '21 at 12:14
  • 2
    @ckuri Incorrect. `List` has an enumerator [which is a struct](https://source.dot.net/#System.Private.CoreLib/List.cs,597), but if you access it as an `IEnumerable`, [it gets boxed](https://source.dot.net/#System.Private.CoreLib/List.cs,600) – canton7 Jun 05 '21 at 12:42
  • 1
    @David You're misinterpreting what `GC.CollectionCount` is showing you. It shows you how many times that generation has been collected. `GC.Collect()` forces a collection of all generations. So all you're seeing is the fact that a collection is happened, because you called `GC.Collection()`. You're seeing nothing about whether any objects were allocated. Use BenchmarkDotNet with `MemoryDiagnoser` – canton7 Jun 05 '21 at 12:44
  • 1
    Is the `Console.WriteLine(number);` line an essential part of the problem, or it's just a needless complication and could be removed? – Theodor Zoulias Jun 05 '21 at 14:16
  • True: that will also be causing a boxing allocation and string allocation each time it's called. – canton7 Jun 05 '21 at 14:45
  • 1
    @TheodorZoulias I thought without it the loops would be optimised away. But now I realise that only happens in release build. I'll remove it. – David Klempfner Jun 06 '21 at 02:00

1 Answers1

4

You're confused by what GC.CollectionCount() is showing you, I think.

Every time you call GC.Collect(), you force a full collection of all generations. You then call GC.CollectionCount(), which is showing you that all generations have just been collected. You're just observing the effects of calling GC.Collect()!

It's true that the IList version is allocating enumerators, but those are dying quickly in gen0.

The proper tool to exmaine this sort of stuff is BenchmarkDotNet with the MemoryDiagnoser.

I put together a simple benchmark:

public static class Program
{
    public static void Main()
    {
        BenchmarkRunner.Run<Benchmarks>();
    }
}

[MemoryDiagnoser]
public class Benchmarks
{
    [Benchmark]
    public int IList()
    {
        int sum = 0;
        IList<int> numbers = new List<int> { 0, 1, 2, 3, 4, 5 };

        for (int i = 0; i < 2000; i++)
        {
            foreach (var number in numbers)
            {
                sum += number;
            }
        }
        return sum;
    }

    [Benchmark]
    public int List()
    {
        int sum = 0;
        List<int> numbers = new List<int> { 0, 1, 2, 3, 4, 5 };

        for (int i = 0; i < 2000; i++)
        {
            foreach (var number in numbers)
            {
                sum += number;
            }
        }
        return sum;
    }
}

This produced the results:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
Intel Core i5-6300U CPU 2.40GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
.NET SDK=5.0.300-preview.21180.15
  [Host]     : .NET 5.0.5 (5.0.521.16609), X64 RyuJIT
  DefaultJob : .NET 5.0.5 (5.0.521.16609), X64 RyuJIT
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
IList 149.42 μs 2.902 μs 3.563 μs 51.0254 - - 80,128 B
List 48.98 μs 0.961 μs 1.524 μs - - - 128 B

The 128B allocation that's common to both will be the List<T> instance itself. Other than this see how the IList version is allocating about 80KB more, causing 51 new gen0 collections per 1000 operations?

canton7
  • 37,633
  • 3
  • 64
  • 77
  • I wasn't confused about `GC.CollectionCount()`. I was confused about what `GC.Collect()` did. I thought it would only perform a collection if there were objects to be collected. Thanks for your answer. It explains it well. – David Klempfner Jun 06 '21 at 02:03
  • When you say "per 1000 operations", do you mean 2000? – David Klempfner Jun 06 '21 at 05:04
  • No. That column is the number of collections per 1000 operations, according to the BenchmarkDotNet docs. It's unrelated to your loop. – canton7 Jun 06 '21 at 08:15
  • Why does it show per 1000 operations? Why not just show how many collections there were while the app was running? – David Klempfner Jun 06 '21 at 09:50
  • 1
    BenchmarkDotNet invokes those `List` and `IList` methods many many times, but it decides how many times to do it based on a number of different factors. The number is the number of GC collections per 1,000 invocations, presumably because "51" is easier to read than "0.0051". [See here](https://stackoverflow.com/a/52255580/1086121) – canton7 Jun 06 '21 at 10:02
  • 1
    From this can you infer that the GC's probably set a gen0 budget limit of around 1.57MB (that's how much needs to be allocated in gen0 before a GC is triggered), but that's entirely an implementation detail. – canton7 Jun 06 '21 at 10:10
  • Some feedback I got from someone about your code snippet: You shouldn't create an "iteration" loop in the code yourself. It is a parameter of the benchmark job itself. – David Klempfner Jun 15 '21 at 09:39
  • 1
    @DavidKlempfner I just copy/pasted OP's code, to show how simple the translation can be. Agreed, it should be set up so that the creation of the lists isn't part of the benchmark either. For simple cases like this though, it's likely that you will need an inner loop, or BenchmarkDotNet reports the benchmark as being too cheap to actually measure reliably. – canton7 Jun 15 '21 at 09:43