
I created the ThreadSafeCachedEnumerable<T> class intending to increase performance where long-running queries were being reused. The idea was to get an enumerator from an IEnumerable<T> and add items to a cache on each call to MoveNext(). The following is my current implementation:

/// <summary>
/// Wraps an IEnumerable&lt;T&gt; and provides a thread-safe means of caching the values.
/// </summary>
/// <typeparam name="T">The type of the elements in the wrapped sequence.</typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // An enumerator from the original IEnumerable<T>
    private IEnumerator<T> enumerator;

    // The items we have already cached (from this.enumerator)
    private IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index into the sequence
        int currentIndex = 0;

        // We will break with yield break 
        while (true)
        {
            // The currentIndex will never be decremented,
            // so we can check without locking first
            if (currentIndex < this.cachedItems.Count)
            {
                var current = this.cachedItems[currentIndex];
                currentIndex += 1;
                yield return current;
            }
            else
            {
                // If !(currentIndex < this.cachedItems.Count),
                // we need to synchronize access to this.enumerator
                lock (this.enumerator)
                {
                    // See if we have more cached items ...
                    if (currentIndex < this.cachedItems.Count)
                    {
                        var current = this.cachedItems[currentIndex];
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
                        if (this.enumerator.MoveNext())
                        {
                            // capture the current item and cache it, then increment the currentIndex
                            var current = this.enumerator.Current;
                            this.cachedItems.Add(current);
                            currentIndex += 1;
                            yield return current;
                        }
                        else
                        {
                            // We reached the end of the enumerator - we're done
                            yield break;
                        }
                    }
                }
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

I simply lock (this.enumerator) when no more items appear to be in the cache, just in case another thread is about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).

The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?


Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll() extension method.

Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).

Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):

// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(
    1,                                 // initial value
    currentNum => (currentNum <= 10),  // condition: stop after 10
    currentNum => currentNum,          // result selector
    previousNum => { Thread.Sleep(250); return previousNum + 1; }); // iterate: sleep, then advance

// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();

// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
    Console.WriteLine(num);
}
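
For reference, the first pass above takes roughly 2.5 seconds (ten items at 250 ms apiece), while the second pass completes almost instantly, because every value is served from the cache.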
Charles
    I don't see either `MemoizeAll` or `MemoizeAllEnumerable` in [Rx codebase](https://github.com/Reactive-Extensions/Rx.NET). That being said, your `ThreadSafeCachedEnumerable` forgot to call dispose() on the source enumerator, which is an IDisposable. This usually is the responsibility of `foreach` or Linq methods, but since now you obtain one, you should be responsible for disposing it. – KFL Apr 02 '15 at 06:21
  • Related: [System.Interactive: Difference between Memoize() and MemoizeAll()](https://stackoverflow.com/questions/2931582/system-interactive-difference-between-memoize-and-memoizeall). The source code of the `Memoize` operator is [here](https://github.com/dotnet/reactive/blob/main/Ix.NET/Source/System.Interactive/System/Linq/Operators/Memoize.cs). – Theodor Zoulias Aug 05 '21 at 06:41
  • Related: [Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?](https://stackoverflow.com/questions/12427097/is-there-an-ienumerable-implementation-that-only-iterates-over-its-source-e-g) – Theodor Zoulias Aug 16 '21 at 02:19
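
Regarding KFL's comment about disposal: below is a minimal sketch of how the class could release the source enumerator, either when the source is exhausted or earlier via IDisposable if enumeration is abandoned. The class name DisposingCachedEnumerable<T> and the done flag are inventions for this sketch, and it deliberately keeps the original's unsynchronized fast-path read (see Bradley Grainger's answer below for why that is itself a problem).

using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical variant of ThreadSafeCachedEnumerable&lt;T&gt; that disposes
/// the source enumerator: once the source is exhausted, or earlier via
/// Dispose() if enumeration is abandoned.
/// </summary>
class DisposingCachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerator<T> enumerator;
    private readonly IList<T> cachedItems = new List<T>();
    private bool done;

    public DisposingCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    public void Dispose()
    {
        lock (this.enumerator)
        {
            if (!this.done)
            {
                this.enumerator.Dispose();
                this.done = true;
            }
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        int currentIndex = 0;
        while (true)
        {
            // Unsynchronized fast path, as in the original class.
            if (currentIndex < this.cachedItems.Count)
            {
                yield return this.cachedItems[currentIndex++];
            }
            else
            {
                lock (this.enumerator)
                {
                    if (currentIndex < this.cachedItems.Count)
                    {
                        yield return this.cachedItems[currentIndex++];
                    }
                    else if (!this.done && this.enumerator.MoveNext())
                    {
                        var current = this.enumerator.Current;
                        this.cachedItems.Add(current);
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // Exhausted (or already disposed): release the source once.
                        // Monitor locks are reentrant, so this nested call is safe.
                        Dispose();
                        yield break;
                    }
                }
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}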

2 Answers


A couple of recommendations:

  1. It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller (see the sketch after this list).
  2. Your caching depends on the enumerator always returning the items in the same order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed in between calls, invalidating the cache.
  3. It is generally not recommended to establish locks on publicly accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
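
To make point #1 concrete, here is a small sketch of caller-managed locking (the gate and cache names, and the whole program, are invented for the example): the container does no locking of its own, and the caller serializes mutation and enumeration with one shared lock.

using System;
using System.Collections.Generic;
using System.Threading;

class CallerLockingExample
{
    static void Main()
    {
        var gate = new object();     // shared lock object, owned by the caller
        var cache = new List<int>(); // plain container with no internal locking

        // Writer: takes the caller's lock for each mutation.
        var writer = new Thread(() =>
        {
            for (int i = 0; i < 5; i++)
            {
                lock (gate) { cache.Add(i); }
                Thread.Sleep(100);
            }
        });
        writer.Start();

        // Reader: holds the same lock for the entire enumeration, so no
        // item can be added while the foreach is in progress.
        lock (gate)
        {
            foreach (var item in cache)
            {
                Console.WriteLine(item);
            }
        }

        writer.Join();
    }
}

Because the caller owns the lock, it can choose the granularity: per item, per enumeration, or no locking at all in single-threaded code.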
Michiel Buddingh
  • +1 that "It is now generally accepted practice not to make container classes responsible for locking". I wish everyone got that memo. – Frank Schwieterman Jul 06 '09 at 16:07
  • Those are nice recommendations - here are my thoughts: 1) That would add too much complexity for my purposes. I just want to cache the results of expensive LINQ projections, where I may only need a few of the results. A lazy loaded list, essentially, only without being able to access an element by index. I'd agree with you for regular containers, though. 2) For my purposes, that should be understood by the caller as a caveat. 3) I can't imagine an IEnumerable holding a reference to one of its IEnumerators, but I suppose it's a possibility. Thanks for the suggestions. – Charles Jul 06 '09 at 18:08
  • Generally accepted practice, eh? Does that explain the new `System.Collections.Concurrent` namespace in .NET 4.0? – Dan Tao Nov 20 '09 at 14:42
  • Haven't studied 4.0 in any detail, but on the face of it, that new namespace seems like a nice indication that 'thread-safe' containers are the exception, not the rule. But you're probably right that 'generally accepted practice' is a bold expression. – Michiel Buddingh Nov 20 '09 at 20:50

Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?

Additionally, the code as it stands is not thread-safe. You cannot safely read this.cachedItems[currentIndex] on one thread (in the branch guarded by if (currentIndex < this.cachedItems.Count)) while another thread is calling this.cachedItems.Add(current). From the List<T> documentation: "A List<T> can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).
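
For illustration, a minimal sketch of that fully locked variant, assuming a dedicated syncRoot field (a name invented here) guarding both cachedItems and the enumerator; note that the yield happens outside the lock, so the lock is never held while consumer code runs:

// Hypothetical fully synchronized GetEnumerator(); assumes the class adds:
//     private readonly object syncRoot = new object();
public IEnumerator<T> GetEnumerator()
{
    int currentIndex = 0;
    while (true)
    {
        T current;
        lock (this.syncRoot)
        {
            if (currentIndex < this.cachedItems.Count)
            {
                // The read is under the lock too, so a concurrent Add()
                // can no longer race with the indexer.
                current = this.cachedItems[currentIndex];
            }
            else if (this.enumerator.MoveNext())
            {
                current = this.enumerator.Current;
                this.cachedItems.Add(current);
            }
            else
            {
                yield break;
            }
        }
        currentIndex += 1;
        yield return current;
    }
}

The trade-off is one lock acquisition per element, even for fully cached reads, which is exactly the cost the profiling question above is trying to pin down.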

Bradley Grainger
  • That's a valid point, but do you know if an exception could be thrown? The idea behind not locking was that I could count on the list only getting larger and the index only increasing, so I could bypass the need for locking when (currentIndex < this.cachedItems.Count) - else, I could lock and double-check it. I played around with profiling it using a Stopwatch - iterating through 20 million items the first time adds another 2 seconds (it's negligible, I suppose). – Charles Jul 06 '09 at 17:53
  • The main thing I would be concerned about is another thread resizing the List's internal array (in Add()) while the reader thread is using the indexer to retrieve an item. It seems possible that it could return default(T) or throw an ArgumentOutOfRangeException. Of course, this all depends on the exact implementation of List, so the best I could say is that the behaviour is "undefined". (Even if Reflector shows you that it would be safe, who knows if it could change in .NET 4.0, introducing a subtle and hard-to-find bug?) – Bradley Grainger Jul 07 '09 at 14:14