3

We recently had a bug due to the behaviour of the Enumerator of a SortedDictionary.ValueCollection behaving differently to other enumerators. I've managed to narrow the problem down to the following (nonsensical) example:

public void Example()
{
    var sorted = new SortedDictionary<string, int>
    {
        {"1", 1 },
        {"3", 3 },
        {"0", 0 },
        {"2", 2 },
        {"4", 4 }
    };

    var fromValues = sorted.Values.GetEnumerator();
    fromValues.MoveNext();

    var fromLinq = sorted.Select(x => x.Value).GetEnumerator();
    fromLinq.MoveNext();

    var fromDictionary = new Dictionary<string, int>(sorted).Values.GetEnumerator();
    fromDictionary.MoveNext();

    for (var i = 0; i < 3; i++)
    {
        Console.WriteLine($"Printing for {i}");
        Print("  From Values:     ", fromValues, i);
        Console.WriteLine("  ------------");
        Print("  From Linq:       ", fromLinq, i);
        Console.WriteLine("  ------------");
        Print("  From Dictionary: ", fromDictionary, i);
        Console.WriteLine();
    }
}

private void Print(string prefix, IEnumerator<int> enumerator, int value)
{
    do
    {
        Console.WriteLine(prefix + "Value in loop:  " + enumerator.Current);
        if (enumerator.Current == value)
        {
            Console.WriteLine(prefix + "Selected Value: " + enumerator.Current);
            break;
        }
    } while (enumerator.MoveNext());
}

This will produce the following output:

Printing for 0
  From Values:     Value in loop:  0
  From Values:     Selected Value: 0
  ------------
  From Linq:       Value in loop:  0
  From Linq:       Selected Value: 0
  ------------
  From Dictionary: Value in loop:  0
  From Dictionary: Selected Value: 0

Printing for 1
  From Values:     Value in loop:  0
  From Values:     Value in loop:  1
  From Values:     Selected Value: 1
  ------------
  From Linq:       Value in loop:  0
  From Linq:       Value in loop:  1
  From Linq:       Selected Value: 1
  ------------
  From Dictionary: Value in loop:  0
  From Dictionary: Value in loop:  1
  From Dictionary: Selected Value: 1

Printing for 2
  From Values:     Value in loop:  0
  From Values:     Value in loop:  2
  From Values:     Selected Value: 2
  ------------
  From Linq:       Value in loop:  1
  From Linq:       Value in loop:  2
  From Linq:       Selected Value: 2
  ------------
  From Dictionary: Value in loop:  0
  From Dictionary: Value in loop:  1
  From Dictionary: Value in loop:  2
  From Dictionary: Selected Value: 2

As you can see, the three Iterators behave differently:

  • SortedDictionary: Current is always 0 when passed to a function, but MoveNext pushes to the correct value in the loop.
  • Linq: Behaves as I expected an Iterator to behave.
  • Dictionary: Resets every time the Enumerator is passed into a function.

I suspect the fact that SortedDictionary.ValueCollection.Enumerator is a struct and the one produced by linq is a reference type has something to do with it. But that doesn't explain why it doesn't behave as the Enumerator from Dictionary.

Lodewijk
  • 2,380
  • 4
  • 25
  • 40
  • This is indeed weird - it also happens like this for .Net Core – Matthew Watson Jan 31 '20 at 14:37
  • For reference, [here's an alternative simplified repro on .Net Fiddle](https://dotnetfiddle.net/GjCESD) – Matthew Watson Jan 31 '20 at 14:38
  • @MatthewWatson the behavior of simplified repro is clear. `Enumerator` is a `struct` and at every call of `PrintNextValue` method you pass its copy, where `Current` is set to `0`, first element in collection. But it doesn't answer the OP question – Pavel Anikhouski Jan 31 '20 at 15:00
  • In the mean time I found my way to [Jon Skeet](https://codeblog.jonskeet.uk/2010/07/27/iterate-damn-you/) and [Eric Lippert](https://stackoverflow.com/questions/3168311/why-do-bcl-collections-use-struct-enumerators-not-classes/3168435#3168435). But that doesn't explain the SortedDictionary... – Lodewijk Jan 31 '20 at 15:02

1 Answers1

1

I believe that the answer to this is that the SortedDictionary value enumerator is a struct (of type System.Collections.Generic.SortedDictionary<string,int>.ValueCollection.Enumerator).

What's happening is that the struct is being copied each time it is passed to the Print() method, thus the method is always working with a copy of the original enumerator - and this causes things to work strangely.

The following program demonstrates this:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        Example();
    }

    public static void Example()
    {
        var sorted = new SortedDictionary<string, int>
        {
            {"1", 1 },
            {"3", 3 },
            {"0", 0 },
            {"2", 2 },
            {"4", 4 }
        };

        var fromValues1 = sorted.Values.GetEnumerator();
        // fromValue1 type is struct System.Collections.Generic.SortedDictionary<string,int>.ValueCollection.Enumerator
        fromValues1.MoveNext();

        while (printNextValue(fromValues1)) // Prints 0 once for each value in the dictionary.
            ;

        Console.WriteLine("-----------------");

        IEnumerator<int> fromValues2 = sorted.Values.GetEnumerator();
        // fromValues2 type is boxed struct System.Collections.Generic.SortedDictionary<string,int>.ValueCollection.Enumerator
        fromValues2.MoveNext();

        while (printNextValue(fromValues2)) // Prints each value in the dictionary.
            ;
    }

    static bool printNextValue(IEnumerator<int> enumerator)
    {
        Console.WriteLine(enumerator.Current);
        return enumerator.MoveNext();
    }
}

The first loop outputs all zeros, while the second outputs the correct values.

The only difference between the two loops in this example is that the first iterator is declared as:

var fromValues1 = sorted.Values.GetEnumerator();

And the second one is declared as:

IEnumerator<int> fromValues2 = sorted.Values.GetEnumerator();.

The first declaration will result in fromValues1 being a struct, while the second is a boxed struct.

Because the struct is boxed, it means that it is NOT copied when it is passed to printNextValue(), which means that printNextValue() will be working with the original enumerator rather than a copy of it.

However, this does NOT explain why the loop terminates! If the original enumerator position was being copied each time printNextValue() was called, then the loop would never terminate since the position of the original enumerator would never be updated.

This leads me to believe that some complexity in the implementation of SortedDictionary.Enumerator means that when the struct is copied, some of its data is copied, but some of it isn't.

(Looking at the source code, I suspect this is due to the enumerator implementation's Current being a value type that gets copied, but the MoveNext() seems to manipulate a stack which - being a reference type - is shared between all the copies of the enumerator. However, the code is a bit too complex for me to analyse in my currently limited time...)

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • 1
    I guess, that it happens because `SortedDictionary.ValueCollection.Enumerator` uses `SortedDictionary.Enumerator dictEnum;`, which uses `TreeSet>.Enumerator treeEnum` (it's too much:)). `TreeSet` [enumerator](https://referencesource.microsoft.com/#System/compmod/system/collections/generic/sortedset.cs,d1840be2805451bd) maintains `Stack` of tree nodes. [`MoveNext`](https://referencesource.microsoft.com/#System/compmod/system/collections/generic/sortedset.cs,9d3de2e4d23886e9,references) has a complicated logic – Pavel Anikhouski Jan 31 '20 at 21:14
  • 1
    When enumerator declared on this way `var fromValues1 = sorted.Values.GetEnumerator();`, at the beginning of `printNextValue` method the `Current` always points to first item in enumeration, which is `0`. When `return `enumerator.MoveNext();` is called, the `Current` is set to "real" and correct position. Therefore loop breaks when expected. I guess, the internal implementation of `Enumerator` and using a stack of items are reasons of that – Pavel Anikhouski Jan 31 '20 at 21:30
  • So basically: the Enumerator is a value type, but it's just not very good at it :) – Lodewijk Feb 03 '20 at 08:06