11

I ran into a weird issue and I'm wondering what I should do about it.

I have this class that return a IEnumerable<MyClass> and it is a deferred execution. Right now, there are two possible consumers. One of them sorts the result.

See the following example :

public class SomeClass
{
    public IEnumerable<MyClass> GetMyStuff(Param givenParam)
    {
        double culmulativeSum = 0;
        return myStuff.Where(...)
                      .OrderBy(...)
                      .TakeWhile( o => 
                      {
                          bool returnValue = culmulativeSum  < givenParam.Maximum;
                          culmulativeSum += o.SomeNumericValue;
                          return returnValue; 
                      };
    }
}

Consumers call the deferred execution only once, but if they were to call it more than that, the result would be wrong as the culmulativeSum wouldn't be reset. I've found the issue by inadvertence with unit testing.

The easiest way for me to fix the issue would be to just add .ToArray() and get rid of the deferred execution at the cost of a little bit of overhead.

I could also add unit test in consumers class to ensure they call it only once, but that wouldn't prevent any new consumer coded in the future from this potential issue.

Another thing that came to my mind was to make subsequent execution throw. Something like

return myStuff.Where(...)
       .OrderBy(...)
       .TakeWhile(...)
       .ThrowIfExecutedMoreThan(1);

Obviously this doesn't exist. Would it be a good idea to implement such thing and how would you do it?

Otherwise, if there is a big pink elephant that I don't see, pointing it out will be appreciated. (I feel there is one because this question is about a very basic scenario :| )

EDIT :

Here is a bad consumer usage example :

public class ConsumerClass
{
    public void WhatEverMethod()
    {
        SomeClass some = new SomeClass();
        var stuffs = some.GetMyStuff(param);
        var nb = stuffs.Count(); //first deferred execution
        var firstOne = stuff.First(); //second deferred execution with the culmulativeSum not reset
    }
}
Community
  • 1
  • 1
AXMIM
  • 2,424
  • 1
  • 20
  • 38
  • Are you talking about a situation where multiple threads call this function at the same time? Otherwise I don't see how this could give you a bad result. culmulativeSum is being reset to zero at the top and it appears to be local to the function. – JuanR Nov 10 '16 at 21:56
  • 2
    That's basically why it's a bad idea to let LINQ methods have side effects. Although the code looks simple, what you are doing is not really a "basic scenario", and certainly not a recommended one. – vgru Nov 10 '16 at 22:02
  • 1
    @Juan: the `cumulativeSum` is set to `0` every time you call `GetMyStuff`, but NOT every time you enumerate the result. Because each time you enumerate you are only evaluating the LINQ part after the `return`. As a result each future time you enumerate you will get nothing back because `cumulativeSum` is already larger than the maximum. Demo: http://ideone.com/VgLbTe. – mellamokb Nov 10 '16 at 22:15

3 Answers3

11

You can solve the incorrect result issue by simply turning your method into iterator:

double culmulativeSum = 0;
var query = myStuff.Where(...)
       .OrderBy(...)
       .TakeWhile(...);
foreach (var item in query) yield return item;

It can be encapsulated in a simple extension method:

public static class Iterators
{
    public static IEnumerable<T> Lazy<T>(Func<IEnumerable<T>> source)
    {
        foreach (var item in source())
            yield return item;
    }
}

Then all you need to do in such scenarios is to surround the original method body with Iterators.Lazy call, e.g.:

return Iterators.Lazy(() =>
{
    double culmulativeSum = 0;
    return myStuff.Where(...)
           .OrderBy(...)
           .TakeWhile(...);
});
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • Impressive!, the whole 'GetMyStuff' become deferred. – AXMIM Nov 10 '16 at 22:15
  • In the exact scenario outlined in the question, this is a very fitting answer. But just keep in mind that the enumerable is still being evaluated multiple times. In many cases this would not be ideal. My answer below allows both lazy and cached evaluation. – Andrew Hanlon Nov 11 '16 at 04:11
  • 1
    @AndrewHanlon Lazy and cached evaluation are different concepts. Of course caching could be useful, but that's should be the consumer decision, not the implementor. If the query had no side effects, it would be evaluated multiple times anyway. Which I'm trying to keep and use lazy evaluation only to avoid the side effects of the implementation, which the consumer does not know. The consumer always can do `ToList`, `ToArray` or call method like yours if needed. The implementation itself does not require caching, hence should not do that. – Ivan Stoev Nov 11 '16 at 06:05
  • @IvanStoev Completely agree with your comment and solution. My point was simply that this does not prevent multiple enumeration (which was part of the question) and may be required in other implementations. Also, while obviously lazy and caching are different concepts, the .net `Lazy` type is lazy and a single evaluation, so the word is often understood to mean 'single evaluation' in .net. – Andrew Hanlon Nov 11 '16 at 06:14
  • 1
    @AndrewHanlon You are right for `Lazy`, but that's the defined and expected behavior, while for LINQ queries the standard expectations are deferred execution and reevaluation (although just seeing `IEnumerable>T>` one can't say if it's a collection or query, but anyway:). What about preventing multiple enumeration, neither mine nor your solution does that, I mean, the *returned* enumerable *can* be enumerated multiple times. Which is ok because the actual issue is the incorrect result, not the multiple enumeration. – Ivan Stoev Nov 11 '16 at 06:43
  • 1
    @AndrewHanlon I got your point, also the solution, my point was only that the two things must be separate, because your solution does unnecessary caching in case the consumer iterates the result just once. Anyway, good discussion, thanks for sharing your thoughts! – Ivan Stoev Nov 11 '16 at 06:45
  • @IvanStoev Cheers. I fleshed out my answer to show an example and use the proper 'Memoize' terminology instead of 'cache'. Interesting topic. – Andrew Hanlon Nov 11 '16 at 14:32
6

You can use the following class:

public class JustOnceOrElseEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> decorated;

    public JustOnceOrElseEnumerable(IEnumerable<T> decorated)
    {
        this.decorated = decorated;
    }

    private bool CalledAlready;

    public IEnumerator<T> GetEnumerator()
    {
        if (CalledAlready)
            throw new Exception("Enumerated already");

        CalledAlready = true;

        return decorated.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        if (CalledAlready)
            throw new Exception("Enumerated already");

        CalledAlready = true;

        return decorated.GetEnumerator();
    }
}

to decorate an enumerable so that it can only be enumerated once. After that it would throw an exception.

You can use this class like this:

return new JustOnceOrElseEnumerable(
   myStuff.Where(...)
   ...
   );

Please note that I do not recommend this approach because it violates the contract of the IEnumerable interface and thus the Liskov Substitution Principle. It is legal for consumers of this contract to assume that they can enumerate the enumerable as many times as they like.

Instead, you can use a cached enumerable that caches the result of enumeration. This ensures that the enumerable is only enumerated once and that all subsequent enumeration attempts would read from the cache. See this answer here for more information.

Community
  • 1
  • 1
Yacoub Massad
  • 27,509
  • 2
  • 36
  • 62
  • Good stuff and good recommendation. It's actually overkill for my scenario, but I might use this somewhere else along the way. – AXMIM Nov 10 '16 at 22:21
5

Ivan's answer is very fitting for the underlying issue in OP's example - but for the general case, I have approached this in the past using an extension method similar to the one below. This ensures that the Enumerable has a single evaluation but is also deferred:

public static IMemoizedEnumerable<T> Memoize<T>(this IEnumerable<T> source)
{
    return new MemoizedEnumerable<T>(source);
}

private class MemoizedEnumerable<T> : IMemoizedEnumerable<T>, IDisposable
{
    private readonly IEnumerator<T> _sourceEnumerator;
    private readonly List<T> _cache = new List<T>();

    public MemoizedEnumerable(IEnumerable<T> source)
    {
        _sourceEnumerator = source.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return IsMaterialized ? _cache.GetEnumerator() : Enumerate();
    }

    private IEnumerator<T> Enumerate()
    {
        foreach (var value in _cache)
        {
            yield return value;
        }

        while (_sourceEnumerator.MoveNext())
        {
            _cache.Add(_sourceEnumerator.Current);
            yield return _sourceEnumerator.Current;
        }

        _sourceEnumerator.Dispose();
        IsMaterialized = true;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public List<T> Materialize()
    {
        if (IsMaterialized)
            return _cache;

        while (_sourceEnumerator.MoveNext())
        {
            _cache.Add(_sourceEnumerator.Current);
        }

        _sourceEnumerator.Dispose();
        IsMaterialized = true;

        return _cache;
    }

    public bool IsMaterialized { get; private set; }

    void IDisposable.Dispose()
    {
        if(!IsMaterialized)
            _sourceEnumerator.Dispose();
    }
}

public interface IMemoizedEnumerable<T> : IEnumerable<T>
{
    List<T> Materialize();

    bool IsMaterialized { get; }
}

Example Usage:

void Consumer()
{
    //var results = GetValuesComplex();
    //var results = GetValuesComplex().ToList();
    var results = GetValuesComplex().Memoize();

    if(results.Any(i => i == 3)) 
    {
        Console.WriteLine("\nFirst Iteration");
        //return; //Potential for early exit.
    }

    var last = results.Last(); // Causes multiple enumeration in naive case.        

    Console.WriteLine("\nSecond Iteration");
}

IEnumerable<int> GetValuesComplex()
{
    for (int i = 0; i < 5; i++)
    {
        //... complex operations ...        
        Console.Write(i + ", ");
        yield return i;
    }
}
  • Naive: ✔ Deferred, ✘ Single enumeration.
  • ToList: ✘ Deferred, ✔ Single enumeration.
  • Memoize: ✔ Deferred, ✔ Single enumeration.

.

Edited to use the proper terminology and flesh out the implementation.

Andrew Hanlon
  • 7,271
  • 4
  • 33
  • 53