2

I know that in .net collection types (or at least some collection types) do not allow modify the collection when you are iterating on it.

For example in the List class exist a code like this:

if (this.version != this.list._version)
 ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);

But obviously this is a decision of the developers that design the iterator class, because I can provide some implementations of IEnumerable that at least do not throw any Exception when the under-laying collection is modified.

Then I have some questions:

  • Why I should not modify a collection when I am iterating on it ?

  • It is possible to create a collection that support to be modified when iterating on it, without to have any other problems ? (NOTE: that the first answer can answer this one too)

  • When the C# compiler generate the Enumerator interface implementation takes into account this such of things?

  • 1
    ConcurrentDictionary does allow modifications while enumerating. – ken2k Jan 29 '13 at 17:08
  • 1
    @ken2k `ConcurrentDictionary`, as well as the other collections in that namespace, don't return iterators that iterate the collection. They copy all of their elements into a temporary namespace and return an iterator of that collection, this providing an iterator of a collection that is never modified. That is one way to provide an iterator for a sequence when you expect the underlying data to change during iteration. – Servy Jan 29 '13 at 17:49
  • @Servy: The documentation for `ConcurrentDictionary` specifically states that there is no guarantee that the combination of items it returns represents a snapshot of a state that actually ever existed. For example, if one thread adds "Moe" and "Curly", in that order, while another thread is trying to enumerate the dictionary, there's no guarantee that the second thread can't see "Curly" without seeing "Moe". I believe that items not added or removed during enumeration are guaranteed to be returned exactly once, and items added or removed during enumeration will be returned at most once. – supercat Jan 30 '13 at 16:57

5 Answers5

5

Why I should not modify a collection when I am iterating on it?

Some collections can be modified when iterating, so it's not globally bad. In most cases it's very difficult to write an effective iterator that will work correctly even when the underlying collection is modified. In many cases the exception is the iterator writer punting and saying that they just don't want to deal with it.

In certain cases it's not clear what the iterator should do when the underlying collection is changed. Some cases are clear-cut, but for others different people will expect different behavior. Whenever you're in that situation it's a sign that there is a deeper problem (that you shouldn't mutating a sequence that you're iterating)

It is possible to create a collection that support to be modified when iterating on it, without to have any other problems ? (NOTE: that the first answer can answer this one too)

Sure.

Consider this iterator for a list:

public static IEnumerable<T> IterateWhileMutating<T>(this IList<T> list)
{
    for (int i = 0; i < list.Count; i++)
    {
        yield return list[i];
    }
}

If you remove an item at or before the current index from the underlying list then an item will be skipped while iterating. If you add an item at or before the current index the an item will be duplicated. But if you add/remove items past the current index during iteration then there won't be a problem. We could try to be fancy and make an attempt to see if an item was removed/added from the list and adjust the index accordingly, but it couldn't always work, so we wouldn't be able to handle all cases. If we had something like an ObservableCollection then we could be notified of additions/removals and their indexes and adjust the index accordingly, thus allowing the iterator to handle mutatings of the underlying collection (as long as it's not in another thread).

Since an iterator for an ObservableCollection can know both when any items are added/removed, and where they are, it can adjust it's position accordingly. I am unsure if the built in iterator properly handles mutation, but here is one that will handle any mutation of the underlying collection:

public static IEnumerable<T> IterateWhileMutating<T>(
    this ObservableCollection<T> list)
{
    int i = 0;
    NotifyCollectionChangedEventHandler handler = (_, args) =>
    {
        switch (args.Action)
        {
            case NotifyCollectionChangedAction.Add:
                if (args.NewStartingIndex <= i)
                    i++;
                break;
            case NotifyCollectionChangedAction.Move:
                if (args.NewStartingIndex <= i)
                    i++;
                if (args.OldStartingIndex <= i) //note *not* else if
                    i--;
                break;
            case NotifyCollectionChangedAction.Remove:
                if (args.OldStartingIndex <= i)
                    i--;
                break;
            case NotifyCollectionChangedAction.Reset:
                i = int.MaxValue;//end the sequence
                break;
            default:
                //do nothing
                break;
        }
    };
    try
    {
        list.CollectionChanged += handler;
        for (i = 0; i < list.Count; i++)
        {
            yield return list[i];
        }
    }
    finally
    {
        list.CollectionChanged -= handler;
    }
}
  • If an item is removed from "earlier" in the sequence, we continue normally without skipping an item.

  • If an item is added "earlier" in the sequence we won't show it, but we also won't show some other item twice.

  • If an item is moved from before the current position to after it will be shown twice, but no other item will be skipped or repeated. If an item is moved from after the current position to before the current position it won't be shown, but that's all. If an item is moved from either later in the collection to another spot later, there is no problem, and the move will be seen in the result, if it is moved from an earlier spot to another earlier spot, everything is fine and the move won't be "seen" by the iterator.

  • Replacing an item isn't a problem; it will only be seen if it's "after" the current position though.

  • Resetting the collection results in the sequence ending gracefully at the current position.

Note that this iterator won't handle situations with multiple threads. If another thread mutates the collection while another is iterating, bad things could happen (items being skipped or repeated, or even exceptions, such as index out of bounds exceptions). What this does allow is mutations during iteration in which there is either only one thread, or in which only one thread is ever executing code that moves the iterator or mutates the collection.

When the C# compiler generate the Enumerator interface implementation takes into account this such of things?

The compiler doesn't generate the interface implementation; a person does.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • When we implement a method that return an Enumerable using the yield keyword the compiler create a class that implements the IEnumerator interface for us. What you mean with: "The compiler doesn't generate the interface implementation; a person does" – Eric Javier Hernandez Saura Jan 29 '13 at 18:15
  • @EricJavierHernandezSaura A person is creating the iterator block, and based on the definition of that iterator block the iterator may or may not support mutation during iteration. The compiler can't just "do magic" to make it work, a person needs to. The iterator doesn't come out of thin air, the compiler is simply taking code that you are providing and transforming it into something else. It's not puling something from nothing. – Servy Jan 29 '13 at 18:17
  • @Sevy Ok, It is very logic. I just going to accept your answer. Thanks – Eric Javier Hernandez Saura Jan 29 '13 at 18:20
4

One big reason for not allowing modification of a collection while iterating on it is that, if elements in the collection are deleted or if new elements are inserted, it will throw the iteration off. (An element was inserted or deleted right where the iteration is working in the collection; what's the next element now? What's the new stopping condition?)

RobH
  • 1,607
  • 2
  • 24
  • 44
  • +1 - if explanation of "iterator.Next" takes more than one line of text (to cover all cases like "current element is deleted from collection") it will confuse 99% of developers. – Alexei Levenkov Jan 29 '13 at 17:13
1

One reason is thread safety. There is no way to guarantee that the iterator is reading from the backing array of a List<T> in a correct manner if another thread is adding to the list, which may potentially cause a reallocation to a new array.

It's worth noting that even enumerating a List<T> using a for loop exhibits this lack of thread safety.

From this blog post by JaredPar in which he creates a ThreadSafeList<T> class:

The collection no longer implements IEnumerable. IEnumerable is only valid when a collection is not changing under the hood. There is no way to easily make this guarantee with a collection built this way and hence it was removed.

It's worth mentioning that not all implementations of IEnumerable disallow modification during enumeration. The concurrent collections do, since they make thread-safe guarantees.

Drew Noakes
  • 300,895
  • 165
  • 679
  • 742
  • +1 for link to good discussion on behavior of IEnumerable... Note that while thread safety is interesting I think the question is other way around - why such behavior is desired. – Alexei Levenkov Jan 29 '13 at 17:44
  • 1
    This doesn't explain why it's an issue in a single threaded app. For example, why prevent people from writing: `foreach(var item in collection) collection.Remove(item);` (Assume there are no other threads.) – Servy Jan 29 '13 at 17:50
  • hmm if mutating `List` is not thread safe can doing something (enumerating) make it more not thread safe? – Conrad Frix Jan 29 '13 at 21:29
0

use the yield statement to load-up the elements you want to modify and do so after-the-fact

if you must modify a collection while iterating (if it can be indexed), use a for loop and disassociate the object with the loop declaration...but you want to make sure you use the lock statement around the loop to make sure you are the only one manipulating the object...and that you keep in mind your own manipulations on the next pass of the loop...

andrew
  • 91
  • 3
0

Maybe you could do that, but it would be unexpected behavior, outside the intention of the IEnumerable and IEnumerator interfaces.

IEnumerable.GetEnumerator

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.

This avoids problems with collections like LinkedList for example. Imagine you have a linked list with 4 nodes, and you iterate to the second node. Then the linked list is changed, where the second node is moved to the head of the linked list, and the third node is moved to the tail. What would it even mean to do next with your enumerator at that point? The possible behavior would be ambiguous and not easily guessed. When you are dealing with an object through its interface, you should not have to consider what the underlying class is, and whether that class and its enumerator are or are not tolerant of modification. The interface says modification invalidates the enumerator, so that should be how things behave.