Why do BCL Collections use struct enumerators, not classes?

Question

We all know mutable structs are evil in general. I'm also pretty sure that because IEnumerable<T>.GetEnumerator() returns type IEnumerator<T>, the structs are immediately boxed into a reference type, costing more than if they were simply reference types to begin with.

So why, in the BCL generic collections, are all the enumerators mutable structs? Surely there had to have been a good reason. The only thing that occurs to me is that structs can be copied easily, thus preserving the enumerator state at an arbitrary point. But adding a Copy() method to the IEnumerator interface would have been less troublesome, so I don't see this as being a logical justification on its own.

Even if I don't agree with a design decision, I would like to be able to understand the reasoning behind it.

related pages for others running across this: http://stackoverflow.com/questions/384511/enumerator-implementation-use-struct-or-class http://www.eggheadcafe.com/software/aspnet/31702392/c-compiler-challenge--s.aspx — James Manning, Jul 02 '10 at 18:59

score 97 · Accepted Answer · edited May 09 '21 at 06:09

Indeed, it is for performance reasons. The BCL team did a lot of research on this point before deciding to go with what you rightly call out as a suspicious and dangerous practice: the use of a mutable value type.

You ask why this doesn't cause boxing. It's because the C# compiler does not generate code to box stuff to IEnumerable or IEnumerator in a foreach loop if it can avoid it!

When we see

foreach(X x in c)

the first thing we do is check to see if c has a method called GetEnumerator. If it does, then we check to see whether the type it returns has method MoveNext and property current. If it does, then the foreach loop is generated entirely using direct calls to those methods and properties. Only if "the pattern" cannot be matched do we fall back to looking for the interfaces.

This has two desirable effects.

First, if the collection is, say, a collection of ints, but was written before generic types were invented, then it does not take the boxing penalty of boxing the value of Current to object and then unboxing it to int. If Current is a property that returns an int, we just use it.

Second, if the enumerator is a value type then it does not box the enumerator to IEnumerator.

Like I said, the BCL team did a lot of research on this and discovered that the vast majority of the time, the penalty of allocating and deallocating the enumerator was large enough that it was worth making it a value type, even though doing so can cause some crazy bugs.

For example, consider this:

struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h = somethingElse;
}

You would quite rightly expect the attempt to mutate h to fail, and indeed it does. The compiler detects that you are trying to change the value of something that has a pending disposal, and that doing so might cause the object that needs to be disposed to actually not be disposed.

Now suppose you had:

struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
    h.Mutate();
}

What happens here? You might reasonably expect that the compiler would do what it does if h were a readonly field: make a copy, and mutate the copy in order to ensure that the method does not throw away stuff in the value that needs to be disposed.

However, that conflicts with our intuition about what ought to happen here:

using (Enumerator enumtor = whatever)
{
    ...
    enumtor.MoveNext();
    ...
}

We expect that doing a MoveNext inside a using block will move the enumerator to the next one regardless of whether it is a struct or a ref type.

Unfortunately, the C# compiler today has a bug. If you are in this situation we choose which strategy to follow inconsistently. The behaviour today is:

if the value-typed variable being mutated via a method is a normal local then it is mutated normally
but if it is a hoisted local (because it's a closed-over variable of an anonymous function or in an iterator block) then the local is actually generated as a read-only field, and the gear that ensures that mutations happen on a copy takes over.

Unfortunately the spec provides little guidance on this matter. Clearly something is broken because we're doing it inconsistently, but what the right thing to do is not at all clear.

+1 This means that there are (minimal) performance penalties of passing around an `IEnumerable` as opposed to the original generic collection -- in a quick release-mode test enumerating over a `List` of ten million entries both directly and when cast to an `IEnumerable`, I see a consistent timing difference of 2:1 (in this case, ~100ms vs ~50ms). — Ben M, Jul 02 '10 at 19:09
Good answer, I didn't know about that optimization - but it makes perfect sense. I do find it somewhat ironic that I linked your blog to backup my statement that mutable structs are evil - and you answer my question :) — Eloff, Jul 02 '10 at 23:14
Btw, that is an ugly corner case, yet another problem with mutable structs. — Eloff, Jul 02 '10 at 23:32
I wondered why this answer was so well informed; then I saw who wrote it. — Pharap, Apr 08 '15 at 04:50
Also wondering if any changes were introduced regarding this? Thanks for the rather insightful answer Eric. — Vedran Mandić, Nov 19 '18 at 20:43
"then the foreach loop is generated entirely using direct calls to those methods and properties" - If your collection is an array then the generated code in CIL is actually a for loop. — David Klempfner, Sep 25 '19 at 14:41
@Backwards_Dave: That's correct. The compiler is not *required* to call `GetEnumerator`; it is required to generate code that enumerates the collection. If it can do so without calling `GetEnumerator` because arrays are a very special type whose behaviour is 100% known, then it can choose to do so. — Eric Lippert, Sep 25 '19 at 15:46

score 6 · Answer 2 · answered Jul 02 '10 at 18:46

6

Struct methods are inlined when type of struct is known at compile time, and calling method via interface is slow, so answer is: because of performance reason.

answered Jul 02 '10 at 18:46

STO

10,390
8
32
32

But these are internal structs, so the type is never known at compile time; and all end-user code accesses them via interfaces. – Stephen Cleary Jul 02 '10 at 18:53
If you look at for instance List.GetEnunmerator method ( http://msdn.microsoft.com/en-us/library/b0yss765.aspx ) you can see it returns List::Enumerator struct. foreach loop in C# does not use IEnumerable interface directly, it is enought for it if class has GetEnumerator method. So type of enumerator is known at compile time. – STO Jul 02 '10 at 19:05
1

+1, this is accurate. The JIT compiler can generate much more efficient code. – Hans Passant Jul 02 '10 at 19:37

Why do BCL Collections use struct enumerators, not classes?

2 Answers2

Linked

Related