36

I always assumed that if I was using Select(x=> ...) in the context of LINQ to objects, then the new collection would be immediately created and remain static. I'm not quite sure WHY I assumed this, and its a very bad assumption but I did. I often use .ToList() elsewhere, but often not in this case.

This code demonstrates that even a simple 'Select' is subject to deferred execution :

var random = new Random();
var animals = new[] { "cat", "dog", "mouse" };
var randomNumberOfAnimals = animals.Select(x => Math.Floor(random.NextDouble() * 100) + " " + x + "s");

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("There are " + i);
}

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("And now, there are " + i);
}

This outputs the following (the random function is called every time the collection is iterated through):

There are 75 cats
There are 28 dogs
There are 62 mouses
And now, there are 78 cats
And now, there are 69 dogs
And now, there are 43 mouses

I have many places where I have an IEnumerable<T> as a member of a class. Often the results of a LINQ query are assigned to such an IEnumerable<T>. Normally for me, this does not cause issues, but I have recently found a few places in my code where it poses more than just a performance issue.

In trying to check for places where I had made this mistake I thought I could check to see if a particular IEnumerable<T> was of type IQueryable. This I thought would tell me if the collection was 'deferred' or not. It turns out that the enumerator created by the Select operator above is of type System.Linq.Enumerable+WhereSelectArrayIterator``[System.String,System.String] and not IQueryable.

I used Reflector to see what this interface inherited from, and it turns out not to inherit from anything that indicates it is 'LINQ' at all - so there is no way to test based upon the collection type.

I'm quite happy now putting .ToArray() everywhere now, but I'd like to have a mechanism to make sure this problem doesn't happen in future. Visual Studio seems to know how to do it because it gives a message about 'expanding the results view will evaluate the collection.'

The best I have come up with is :

bool deferred = !object.ReferenceEquals(randomNumberOfAnimals.First(),
                                        randomNumberOfAnimals.First());

Edit: This only works if a new object is created with 'Select' and it not a generic solution. I'm not recommended it in any case though! It was a little tongue in the cheek of a solution.

Simon_Weaver
  • 140,023
  • 84
  • 646
  • 689
  • 14
    Remember, a query expression gives you an object which represents *the query itself*. The object does not represent the RESULTS of the query, the object represents a query. Think of it as a SQL query string, only smarter. You ask the query for its results, and the query executes. You ask it again, the query executes again; no guarantees that the results are the same the second time you ask; the world might have changed since then. – Eric Lippert Jul 23 '09 at 05:28
  • The best I have come across is [here](http://stackoverflow.com/questions/4621561/is-there-a-way-to-memorize-or-materialize-an-ienumerable), not foolproof though.. ! – nawfal Sep 29 '13 at 06:49
  • Why do you care about deferred execution? You should pay no attention to that, and consider it to be a private implementation detail of your `IEnumerable`. – John Saunders Jul 23 '09 at 00:23
  • because I get different objects back each time and i was not expecting that. i was modifying a quantity on them and then that change was lost – Simon_Weaver Jul 23 '09 at 01:23
  • What's that have to do with deferred execution? I think you have mistaken the source of your problem. – John Saunders Jul 23 '09 at 01:46

6 Answers6

24

Deferred execution of LINQ has trapped a lot of people, you're not alone.

The approach I've taken to avoiding this problem is as follows:

Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.

Local variables - usually at the point where I create the LINQ, so I'll know whether lazy evaluation is possible.

Class members - never use IEnumerable<T>, always use List<T>. And always make them private.

Properties - use IEnumerable<T>, and convert for storage in the setter.

public IEnumerable<Person> People 
{
    get { return people; }
    set { people = value.ToList(); }
}
private List<People> people;

While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiasticly using the LINQ extension methods since late Beta.

BTW: I'm curious why you use ToArray(); instead of ToList(); - to me, lists have a much nicer API, and there's (almost) no performance cost.

Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."

Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.

Bevan
  • 43,618
  • 10
  • 81
  • 133
  • Bevan: I agree with most of your point - but the last sentence. There are performance costs to lists, they just are very, very minimal. I agree, in nearly all cases, it's negligible, but List IS (microscopically) slower than an array, so if you're doing high perf. code, there is a reason to use ToArray() sometimes. – Reed Copsey Jul 23 '09 at 00:33
  • @Bevan: One tiny case where arrays are neat is the baseline JIT can get similar performance from them as the optimizing JIT can get from `List`. There are direct IL instructions for working with the arrays - no separate method inlining required. – Sam Harwell Jul 23 '09 at 00:37
  • any specific reason not to use ICollection for class members, or IList. I'm leaning towards ICollection for class members – Simon_Weaver Jul 23 '09 at 01:35
  • @Reed Copsey - Perhaps I over simplified. If a system has such a pervasive need for performance that every microsecond counts, then using an array can be advantageous, assuming that the array is accessed a lot. However ... in most of the code I've seen, arrays are used when performance is not that great an issue and when the nicer List API would avoid the need to jump through hoops. – Bevan Jul 23 '09 at 04:42
  • 2
    @Simon - the List class has a rich and expressive API. I've found that having access through that API, rather than artificially limiting myself to IList or ICollection has been beneficial, especially when I can easily ensure that I always have a List to work with. – Bevan Jul 23 '09 at 04:44
  • 10
    In my opinion, performance is not the best reason to do something unless performance has been proven to be an issue. Using the right structure for the job is much more important than saving a few ms, especially for the majority of applications. The creator of stack overflow has blogged about this, search for coding horror and micro-optimization. Also, this plays into the concept of not prematurely optimizing. You may make assumptions regarding performance that restrict you. It's even conceivable that the optimizer undoes whatever you've done (as many people encounter in benchmarking). – Sprague Dec 14 '10 at 17:40
  • The biggest difference between `List` and `T[]` is that arrays are immutable and lists are not. When you use `List` as your member variable and I read your code, I'm going to assume that you meant to create a mutable collection which could mean items may be added, or removed by other methods of the class. Be mindful of the "body language" of your code, of what it expresses. – joshperry Dec 18 '12 at 20:13
  • @joshperry Arrays are not immutable, their contents can be changed. But I agree with what you are saying, surely this is what `ReadOnlyCollection` is for? – Lukazoid Mar 05 '13 at 11:01
  • @Lukazoid Yes, of course the items themselves aren't immutable which in a technical sense means the array itself isn't immutable. What I meant was that a `List` communicates something like "hey, this is somewhere that you can expect to add and remove items" where if you use an array or `IEnumerable` (or in 4.5 `IReadOnlyCollection`) says "yo, this is a collection of items that you're welcome to look at but it's not where you should be adding or removing items". Unfortunately even `ReadOnlyCollection` is only a facade over mutable data since C# doesn't support immutability. – joshperry Mar 05 '13 at 18:15
8

In general, I'd say you should try to avoid worrying about whether it's deferred.

There are advantages to the streaming execution nature of IEnumerable<T>. It is true - there are times that it's disadvantageous, but I'd recommend just always handling those (rare) times specifically - either go ToList() or ToArray() to convert it to a list or array as appropriate.

The rest of the time, it's better to just let it be deferred. Needing to frequently check this seems like a bigger design problem...

nawfal
  • 70,104
  • 56
  • 326
  • 368
Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • agreed. the question was more sparked by curiosity to know how to do this, and also for finding places where this issue occured. i definitely want to pretty much always do 'ToArray()'. i had just been getting away without it becasue a) it is slightly ugly to look at and b) i thought that a 'Select' projection was immediately realized. – Simon_Weaver Jul 23 '09 at 01:25
3

My five cents. Quite often you have to deal with an enumerable that you have no idea what's inside of it.

Your options are:

  • turn it to a list before using it but chances are it's endless you are in trouble
  • use it as is and you are likely to face all kinds of deferred execution funny things and you are in trouble again

Here is an example:

[TestClass]
public class BadExample
{
    public class Item
    {
        public String Value { get; set; }
    }
    public IEnumerable<Item> SomebodysElseMethodWeHaveNoControlOver()
    {
        var values = "at the end everything must be in upper".Split(' ');
        return values.Select(x => new Item { Value = x });
    }
    [TestMethod]
    public void Test()
    {
        var items = this.SomebodysElseMethodWeHaveNoControlOver();
        foreach (var item in items)
        {
            item.Value = item.Value.ToUpper();
        }
        var mustBeInUpper = String.Join(" ", items.Select(x => x.Value).ToArray());
        Trace.WriteLine(mustBeInUpper); // output is in lower: at the end everything must be in upper
        Assert.AreEqual("AT THE END EVERYTHING MUST BE IN UPPER", mustBeInUpper); // <== fails here
    }
}

So there is no way to get away with it but the one: iterate it exactly one time on as-you-go basis.

It was clearly a bad design choice to use the same IEnumerable interface for immediate and deferred execution scenarios. There must be a clear distinction between these two, so that it's clear from the name or by checking a property whether or not the enumerable is deferred.

A hint: In your code consider using IReadOnlyCollection<T> instead of the plain IEnumerable<T>, because in addition to that you get the Count property. This way you know for sure it's not endless and you can turn it to a list no problem.

Trident D'Gao
  • 18,973
  • 19
  • 95
  • 159
2

The message about expanding the results view will evaluate the collection is a standard message presented for all IEnumerable objects. I'm not sure that there is any foolproof means of checking if an IEnumerable is deferred, mainly because even a yield is deferred. The only means of absolutely ensuring that it isn't deferred is to accept an ICollection or IList<T>.

Adam Robinson
  • 182,639
  • 35
  • 285
  • 343
  • 1
    Even an `IList` could be deferred if it has a virtual `Item` getter. – Sam Harwell Jul 23 '09 at 00:25
  • Yeah, what the confused Chevy/Nissan crossbreed said. If you want to be sure, only accept a concrete array. – Jeffrey Hantin Jul 23 '09 at 00:27
  • ICollection seems to work best. while i was aware of it - i tend to forget that one. i didnt want a List or IList becasue of the overhead, but ICollection tells me if I forget to use ToArray() [ or someone using my software does ] – Simon_Weaver Jul 23 '09 at 01:14
1

It's absolutely possible to manually implement a lazy IEnumerator<T>, so there's no "perfectly general" way of doing it. What I keep in mind is this: if I'm changing things in a list while enumerating something related to it, always call ToArray() before the foreach.

Sam Harwell
  • 97,721
  • 20
  • 209
  • 280
1

This is an interesting reaction to deferred execution - most people view it as a positive in that it allows you to transform streams of data without needing to buffer everything up.

Your suggested test won't work, because there's no reason why an iterator method can't yield the same reference object instance as its first object on two successive tries.

IEnumerable<string> Names()
{
    yield return "Fred";
}

That will return the same static string object every time, as the only item in the sequence.

As you can't reliably detect the compiler-generated class that is returned from an iterator method, you'll have to do the opposite: check for a few well-known containers:

public static IEnumerable<T> ToNonDeferred(this IEnumerable<T> source)
{
    if (source is List<T> || source is T[]) // and any others you encounter
        return source;

    return source.ToArray();
}

By returning IEnumerable<T>, we keep the collection readonly, which is important because we may get back a copy or an original.

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
  • so is this how visual studio does it? – Simon_Weaver Jul 23 '09 at 01:21
  • 1
    Visual Studio shows the "expanding the results view will evaluate the collection" for _any_ enumerable that isn't `ICollection`. – Pavel Minaev Jul 23 '09 at 01:33
  • 1
    and i LOVE deferred execution. i'd just made a stupid assumption (and really not even a conscious one) that Select(x => x + "foo") would not be deferred. i absolutely understand why it is, but it had never resulted [surprisingly] in a bug in my code until today – Simon_Weaver Jul 23 '09 at 01:36
  • 1
    @Simon_Weaver deferred execution is cool, but it can be dangerous. what if you've placed a try block around a declaration expecting it to be executed, but it isn't executed until the following block? that's an unhandled exception. I've been caught by this issue in the past. It's a powerful feature, but to the uninitiated, it can cause issues. If you're using IEnumerable<> and var in your local blocks, you may not know what's actually being placed into them. Proper unit testing should identify these cases, but resolving them may be confusing. – Sprague Dec 14 '10 at 17:45
  • Better check against `ICollection` than `List` or `T[]`. See [this](http://stackoverflow.com/a/19067974/661933) – nawfal Sep 28 '13 at 14:53