Your LINQ query does not return the requested data; it returns the possibility to get something that can access the elements of your data one by one.
In software terms: the value of your LINQ statement is an IEnumerable<T> (or IQueryable<T>, not further discussed here). This object does not hold your data.
In fact, you can't do a lot with an IEnumerable<T>. The only thing it can do is produce another object that implements IEnumerator<T> (note the difference: IEnumerable vs IEnumerator). This GetEnumerator() function is the "get something that can access ..." part in my first sentence.
The object you got from IEnumerable<T>.GetEnumerator() implements IEnumerator<T>. This object also does not have to hold your data. It only knows how to produce the first element of your data (if there is one), and once it has an element, it knows how to get the next element (if there is one). This is the "that can access the elements of your data one by one" from my first sentence.
So both the IEnumerable<T> and the IEnumerator<T> do not (have to) hold your data. They are only objects that help you to access your data in a defined order.
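A small sketch of what "does not hold your data" means in practice. The class and method names (DeferredDemo, EvensAfterAdd) are made up for this illustration. The query is defined first, the source list is changed afterwards, and the change still shows up because the filtering only happens during enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Builds a query over a list, changes the list, and only then enumerates.
    public static List<int> EvensAfterAdd()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // No filtering happens here; 'evens' only remembers how to filter.
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Change the source after the query was defined.
        numbers.Add(4);

        // Enumeration happens now, so the newly added 4 is included.
        return evens.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvensAfterAdd())); // prints "2, 4"
    }
}
```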
In the early days, when we didn't have List<T> or comparable collection classes that implemented IEnumerable<T>, it was quite a nuisance to implement IEnumerable<T> and the IEnumerator<T> members Reset, Current and MoveNext. In fact, nowadays it is hard to find examples of implementing IEnumerator<T> that do not delegate to a class that already implements IEnumerator<T>.
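To give an idea of the amount of ceremony involved, here is a rough sketch of a hand-written sequence that does not lean on an existing collection. Countdown and CountdownEnumerator are hypothetical names invented for this example; the sequence simply counts down from a start value to 1.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence: counts down from 'start' to 1.
public sealed class Countdown : IEnumerable<int>
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    public IEnumerator<int> GetEnumerator() => new CountdownEnumerator(start);

    // The non-generic interface has to be implemented as well.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class CountdownEnumerator : IEnumerator<int>
    {
        private readonly int start;
        private int current;

        public CountdownEnumerator(int start)
        {
            this.start = start;
            Reset();
        }

        // Only meaningful after MoveNext() has returned true.
        public int Current => current;
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (current <= 1) return false; // nothing left
            current--;
            return true;
        }

        // Positions the enumerator before the first element.
        public void Reset() { current = start + 1; }

        public void Dispose() { } // nothing to release
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", new Countdown(3))); // prints "3, 2, 1"
    }
}
```

Note that neither class stores the numbers; the enumerator only remembers where it is.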
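The introduction of the keyword yield eased the implementation of IEnumerable<T> and IEnumerator<T> a lot. If a function contains a yield return, the compiler generates the enumerator for you, and the function returns an IEnumerable<T> (or an IEnumerator<T>, depending on its declared return type):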
IEnumerable<double> GetMySpecialNumbers()
{   // returns the sequence: 0, 1, pi and e
    yield return 0.0;
    yield return 1.0;
    yield return 4.0 * Math.Atan(1.0); // pi
    yield return Math.Exp(1.0);        // e
}
Note that I use the term sequence. It is not a List, not a Dictionary; you can only access the elements by asking for the first one and then repeatedly asking for the next one.
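The "one by one" behaviour can be made visible with a sketch like the following. LazyDemo, Numbers and Log are names made up for this illustration. The body of a yield function only runs as far as the next element that is actually requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Records which parts of the iterator body have actually run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("producing 1");
        yield return 1;
        Log.Add("producing 2");
        yield return 2;
        Log.Add("producing 3");
        yield return 3;
    }

    public static void Main()
    {
        IEnumerable<int> numbers = Numbers(); // nothing has run yet
        Console.WriteLine(Log.Count);         // prints 0

        int first = numbers.First();          // runs up to the first yield return
        Console.WriteLine(first);             // prints 1
        Console.WriteLine(Log.Count);         // prints 1
    }
}
```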
You could access the elements of the sequence using IEnumerable<T>.GetEnumerator() and the members of IEnumerator<T>. This method is seldom used anymore:
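IEnumerable<double> myNumbers = GetMySpecialNumbers();
// a fresh enumerator is positioned before the first element, so no Reset() is needed
// (enumerators generated from yield return throw NotSupportedException on Reset)
using (IEnumerator<double> enumerator = myNumbers.GetEnumerator())
{
    // while there are numbers, write the next one
    while (enumerator.MoveNext())
    {   // there is still an element in the sequence
        double valueToWrite = enumerator.Current; // Current is a property, not a method
        Console.WriteLine(valueToWrite);
    }
}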
With the introduction of foreach this has become much easier:
foreach (double valueToWrite in GetMySpecialNumbers())
    Console.WriteLine(valueToWrite);
Internally this calls GetEnumerator() and then MoveNext() / Current until the sequence ends, and disposes the enumerator afterwards.
All generic collection classes like List<T>, Dictionary<TKey, TValue>, HashSet<T>, etc., as well as arrays, implement IEnumerable<T>. Often, when a function returns an IEnumerable<T>, you'll find that internally it uses one of these collection classes.
Another great invention after yield and foreach was the introduction of extension methods. See extension methods demystified.
Extension methods enable you to take a class that you can't change, like List<T>, and write new functionality for it, using only the functions you have access to.
This was the boost for LINQ. It enabled us to write new functionality for everything that said: "hey, I'm a sequence, you can ask for my first element and for my next element" (= I implement IEnumerable).
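As a sketch of the idea, here is a made-up extension method (EveryOther is not part of LINQ; the name and behaviour are invented for this example). It is written once against IEnumerable<T> and is then available on arrays, lists and every other sequence.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Hypothetical helper: returns the elements at index 0, 2, 4, ...
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take) yield return item;
            take = !take;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int[] numbers = { 10, 20, 30, 40, 50 };
        var names = new List<string> { "a", "b", "c" };

        // The same method works on an array and on a List<T>.
        Console.WriteLine(string.Join(", ", numbers.EveryOther())); // prints "10, 30, 50"
        Console.WriteLine(string.Join(", ", names.EveryOther()));   // prints "a, c"
    }
}
```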
If you look at the source code of LINQ, you'll find that LINQ functions like Where / Select / First / Reverse / etc. are written as extension methods of IEnumerable<T>. Some of them use generic collection classes (HashSet<T>, Dictionary<TKey, TValue>), some of them use yield return, and sometimes you'll even see the basic IEnumerator<T> members like MoveNext / Current.
Quite often you'll write new functionality by concatenating LINQ functions. However, keep in mind that sometimes yield makes your function much easier to understand, and thus easier to reuse, debug and maintain.
Example: suppose you have a sequence of produced Products. Each Product has a DateTime property ProductCompletedTime that represents when production of the product completed.
Suppose you want to know how much time there is between two completed products.
Problem: this can't be calculated for the first product.
With a yield this is easy:
public static IEnumerable<TimeSpan> ToProductionTimes(this IEnumerable<Product> products)
{
    // order by completion time, so consecutive elements are consecutive products
    var orderedProducts = products.OrderBy(product => product.ProductCompletedTime);
    // the first product has no predecessor, so it produces no time span itself
    Product previousProduct = orderedProducts.FirstOrDefault();
    foreach (Product product in orderedProducts.Skip(1))
    {
        yield return product.ProductCompletedTime - previousProduct.ProductCompletedTime;
        previousProduct = product;
    }
}
Try to do this using only LINQ functions; it will be much harder to understand what happens.
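A usage sketch for the function above. The Product class here is a stand-in with only the one property from the text, and the sample dates are made up. The extension method is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Product type; only the property mentioned in the text is modelled.
public class Product
{
    public DateTime ProductCompletedTime { get; set; }
}

public static class ProductExtensions
{
    public static IEnumerable<TimeSpan> ToProductionTimes(this IEnumerable<Product> products)
    {
        var orderedProducts = products.OrderBy(product => product.ProductCompletedTime);
        Product previousProduct = orderedProducts.FirstOrDefault();
        foreach (Product product in orderedProducts.Skip(1))
        {
            yield return product.ProductCompletedTime - previousProduct.ProductCompletedTime;
            previousProduct = product;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        DateTime start = new DateTime(2024, 1, 1, 8, 0, 0);

        // Deliberately not in chronological order; the method sorts them.
        var products = new List<Product>
        {
            new Product { ProductCompletedTime = start.AddMinutes(25) },
            new Product { ProductCompletedTime = start },
            new Product { ProductCompletedTime = start.AddMinutes(10) },
        };

        // Three products give two time spans.
        foreach (TimeSpan time in products.ToProductionTimes())
            Console.WriteLine(time); // prints 00:10:00, then 00:15:00
    }
}
```

With zero or one product the result is simply an empty sequence, which is how the "can't be calculated for the first product" problem is handled.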
Conclusion
An IEnumerable<T> does not hold your data; it only holds the potential to access your data one by one.
The most common ways to access the data are foreach, ToList(), ToDictionary(), First(), etc.
Whenever you need to write a function that returns a difficult IEnumerable<T>, at least consider writing a yield return function.