
I understand deferred execution in LINQ as explained in "What are the benefits of a Deferred Execution in LINQ?"

However, deferred execution can lead to bugs, especially in multithreaded scenarios, e.g., when evaluation is performed on a shared collection outside of a lock. When a method that returns a deferred query is nested several method layers deep, it can get really hard (at least for me) to keep track of where evaluation actually happens, so that I lock appropriately.
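A minimal single-threaded sketch of the pitfall described above. It assumes a shared `List<int>` guarded by a lock object; `gate`, `shared`, `GetEvensDeferred` and `GetEvensSnapshot` are hypothetical names used only for illustration. Returning a query from inside a `lock` only protects *building* the query, not enumerating it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shared state guarded by a lock object.
var gate = new object();
var shared = new List<int> { 1, 2, 3, 4 };

// Looks safe, but only building the query happens under the lock.
// Enumeration happens later, in the caller, after the lock is released.
IEnumerable<int> GetEvensDeferred()
{
    lock (gate)
    {
        return shared.Where(n => n % 2 == 0);
    }
}

// Materializes the result while the lock is held, so the caller gets a snapshot.
List<int> GetEvensSnapshot()
{
    lock (gate)
    {
        return shared.Where(n => n % 2 == 0).ToList();
    }
}

var deferred = GetEvensDeferred();
var snapshot = GetEvensSnapshot();

// Simulate another thread mutating the collection after both calls returned.
lock (gate)
{
    shared.Add(6);
}

// The deferred query reads 'shared' now, outside any lock, and sees the new element.
Console.WriteLine(string.Join(", ", deferred)); // 2, 4, 6
Console.WriteLine(string.Join(", ", snapshot)); // 2, 4
```

The mutation runs on the same thread here so the output is deterministic; with a real second thread, enumerating `deferred` could race with `shared.Add` and throw an `InvalidOperationException` or observe an inconsistent state.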

Setting aside special cases like infinite sequences (e.g., Fibonacci), and assuming the filtering of the collection is deemed complete (i.e., it is not likely that the consumer would filter the results further), what would be considered the "best approach" when returning an IEnumerable<T> from a method: should it already be evaluated, or deferred?

Note: "best approach" can be defined in terms of efficiency, code safety, or some other measure; just specify which in your response. I would like to know how the community does this.

Follow-on question: does it make sense to state explicitly in the method name whether the result is evaluated or deferred?

JohnC

1 Answer


Most of the code you'll write is not multi-threaded. There are only three reasons I can think of for eagerly evaluating an enumerable:

  1. You need it in a multi-threaded environment. You should make it a list or array instead.
  2. You want random access to the enumerable. You should make it a list or array instead.
  3. You want to control when the evaluation takes place (for example if it is expensive).
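To make reason 3 concrete, here is a small sketch in which `IsExpensiveMatch` is a hypothetical counting predicate standing in for expensive work. A deferred query re-runs its predicate every time it is enumerated, while `ToList()` pays the cost once, at a point you choose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int calls = 0;

// Hypothetical stand-in for an expensive predicate; it counts how often it runs.
bool IsExpensiveMatch(int n)
{
    calls++;
    return n > 2;
}

var source = new[] { 1, 2, 3, 4, 5 };

// Deferred: every enumeration re-runs the predicate over the whole source.
IEnumerable<int> deferred = source.Where(IsExpensiveMatch);
int deferredCount = deferred.Count(); // first pass over 5 elements
int deferredSum = deferred.Sum();     // second pass over 5 elements
int deferredCalls = calls;            // 10

calls = 0;

// Eager: ToList() runs the predicate exactly once per element, right here.
List<int> eager = source.Where(IsExpensiveMatch).ToList();
int eagerCount = eager.Count;  // no predicate calls: List<T> already holds the results
int eagerSum = eager.Sum();    // no predicate calls
int eagerCalls = calls;        // 5

Console.WriteLine($"deferred: {deferredCalls} predicate calls, eager: {eagerCalls}");
```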

At other times you should just let it use deferred execution. This postpones evaluation until the result is actually needed, and it might be faster depending on the filters you apply. For example, bigquery.First() might be faster than bigquery.ToArray().First(), because First() stops after the first element while ToArray() has to evaluate all of them. And can you ever be sure that the user is done filtering?
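A sketch of the `First()` versus `ToArray().First()` comparison, assuming `bigquery` is a projection over a large range that counts how many elements it produces (the counter exists only for illustration):

```csharp
using System;
using System.Linq;

int evaluated = 0;

// Hypothetical "big query": a projection over a large range that counts
// how many elements actually get produced.
var bigquery = Enumerable.Range(1, 1_000_000).Select(n =>
{
    evaluated++;
    return n * 2;
});

int first = bigquery.First();
int lazyEvaluated = evaluated;  // 1: First() stops after the first element

evaluated = 0;
int firstFromArray = bigquery.ToArray().First();
int eagerEvaluated = evaluated; // 1,000,000: ToArray() materializes everything

Console.WriteLine($"First(): {lazyEvaluated} evaluated, ToArray().First(): {eagerEvaluated} evaluated");
```

Both calls return the same value; the difference is that `ToArray()` must run the projection for every element before `First()` sees any of them.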

Also, LINQ to Objects optimizes certain query chains internally. This example is taken from Jon Skeet's article LINQ To Objects and the performance of nested "Where" calls:

// Normal LINQ query
var query = list.Where(x => Condition1(x))
                .Where(x => Condition2(x))
                .Select(x => Projection1(x))
                .Select(y => Projection2(y));

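// After optimization (conceptual: LINQ to Objects does this with an internal
// iterator type; there is no public WhereSelect method you can call)
var query = list.WhereSelect(x => Condition1(x) && Condition2(x),
                             x => Projection2(Projection1(x)));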
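Because `WhereSelect` cannot be called directly, here is a runnable sketch that writes the fused form with public operators and checks that it yields the same results as the chained query. `Condition1`, `Condition2`, `Projection1` and `Projection2` are hypothetical stand-ins for the functions in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the conditions and projections in the example.
bool Condition1(int x) => x % 2 == 0;
bool Condition2(int x) => x > 2;
int Projection1(int x) => x * 10;
string Projection2(int y) => $"#{y}";

var list = new List<int> { 1, 2, 3, 4, 5, 6 };

// The chained query as the developer writes it.
var chained = list.Where(x => Condition1(x))
                  .Where(x => Condition2(x))
                  .Select(x => Projection1(x))
                  .Select(y => Projection2(y));

// The hand-fused form: one combined predicate and one composed projection.
// It uses only public LINQ methods; the real fusion inside LINQ to Objects
// is an implementation detail.
var fused = list.Where(x => Condition1(x) && Condition2(x))
                .Select(x => Projection2(Projection1(x)));

Console.WriteLine(string.Join(", ", chained)); // #40, #60
Console.WriteLine(string.Join(", ", fused));   // #40, #60
```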

By the way, your methods should return the most specific visible type they can. For example, a method that works with T[] arrays or List<T> lists internally should generally not return just an IEnumerable<T>. If you don't want callers to modify the result, wrap it in a ReadOnlyCollection<T> (for example via List<T>.AsReadOnly()); note that this gives a read-only view of the list, not an immutable copy.
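A minimal sketch of that advice, using a hypothetical `GetSortedNames` method that builds a `List<string>` internally and returns it through `List<T>.AsReadOnly()`. It also shows that `ReadOnlyCollection<T>` rejects writes through the wrapper but remains a view over its backing list, so it is read-only rather than truly immutable.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

var names = new List<string> { "b", "a", "c" };

// Hypothetical method: it builds a List<T> internally, so it returns a
// ReadOnlyCollection<T> (with Count and indexing) rather than a bare IEnumerable<T>.
ReadOnlyCollection<string> GetSortedNames()
{
    List<string> sorted = names.OrderBy(n => n).ToList();
    return sorted.AsReadOnly();
}

ReadOnlyCollection<string> result = GetSortedNames();
Console.WriteLine(result.Count); // 3
Console.WriteLine(result[0]);    // a

// The wrapper rejects mutation through its IList<T> interface...
bool rejected = false;
try
{
    ((IList<string>)result).Add("d");
}
catch (NotSupportedException)
{
    rejected = true;
}

// ...but it is a view, not a copy: changes to the backing list show through.
var backing = new List<int> { 1, 2 };
ReadOnlyCollection<int> view = backing.AsReadOnly();
backing.Add(3);
Console.WriteLine(view.Count); // 3
```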

Daniel A.A. Pelsmaeker

  • When writing classes for maximum reuse in other projects, it is not known a priori whether the consumer is operating in a multi-threaded scenario. A non-thread-safe class in a multi-threaded environment is bad. So doesn't your point 1 suggest that all methods should perform eager evaluation? – JohnC Mar 20 '13 at 16:34
  • @JohnC I'd argue that a consumer of an enumerable _knows_ that, by default, it is _not_ thread-safe. In most use cases that is not a problem. If it is a problem, it is the consumer's responsibility to ensure thread safety, for example by calling `ToArray` on it. If the underlying collection is already an array, `ToArray` simply copies it; otherwise, the query is evaluated at that point. – Daniel A.A. Pelsmaeker Mar 20 '13 at 16:36
  • In reference to your example of bigquery.First() being faster than bigquery.ToArray().First(), note that I did state in my question "assuming the filtering of the collection is deemed complete (i.e., it is not likely that the consumer would filter the results further)". In your example, the result was narrowed further to its first element, which is not exactly the scenario I was asking about. – JohnC Mar 20 '13 at 16:36
  • Ok, I accept your argument of the consumer being responsible for ensuring thread safety. – JohnC Mar 20 '13 at 16:38
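A sketch of the consumer-side approach Daniel describes in the comments, assuming the consumer and every writer of `backing` agree to use the same hypothetical `gate` lock (the lock only helps if writers take it too). The last lines show that `ToArray()` returns a new array even when its source is already an array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical consumer code: the consumer owns the lock and decides
// when to materialize the enumerable it was handed.
var gate = new object();
var backing = new List<int> { 1, 2, 3 };
IEnumerable<int> handedOut = backing.Where(n => n > 1); // deferred query

int[] snapshot;
lock (gate)
{
    // Evaluation happens here, while the shared lock is held.
    snapshot = handedOut.ToArray();
}

// Even when the source is already an array, ToArray() returns a new copy.
int[] original = { 1, 2, 3 };
int[] copy = original.ToArray();
bool isSameInstance = ReferenceEquals(original, copy); // false

Console.WriteLine(string.Join(", ", snapshot)); // 2, 3
Console.WriteLine(isSameInstance);              // False
```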