I would really appreciate it if someone could check my reasoning on this topic and correct it where it is wrong. I hope this kind of "please check my thoughts" post doesn't go too much against the rules of the site. My question is really an attempted explanation of how a certain topic works, written by someone who doesn't fully understand it. Hopefully it could be valuable to another beginner who wants a high-level but very basic description of what's going on.
Here's my approach to understanding `IEnumerable`, `IEnumerator` and `yield return`.
The way `IEnumerable` and `IEnumerator` interact is quite clear and natural: we have two objects, one that holds the collection and one that holds the cursor.
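To make the "collection plus cursor" picture concrete, here is a minimal hand-written sketch. The names `ThreeNumbers`, `Cursor` and `CursorDemo` are hypothetical and chosen only for illustration; this is my understanding of the pattern, not anything taken from the framework source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: owns the data and hands out cursors.
public sealed class ThreeNumbers : IEnumerable<int>
{
    private readonly int[] _items = { 10, 20, 30 };

    public IEnumerator<int> GetEnumerator() => new Cursor(_items);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Hypothetical cursor: remembers a position inside the data.
    private sealed class Cursor : IEnumerator<int>
    {
        private readonly int[] _items;
        private int _index = -1; // starts before the first element

        public Cursor(int[] items) { _items = items; }

        public int Current => _items[_index];
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_index < _items.Length) _index++;
            return _index < _items.Length;
        }

        public void Reset() { _index = -1; }
        public void Dispose() { }
    }
}

public static class CursorDemo
{
    // Walks the collection by driving the cursor directly.
    public static List<int> ReadAll()
    {
        var result = new List<int>();
        IEnumerator<int> cursor = new ThreeNumbers().GetEnumerator();
        while (cursor.MoveNext())
            result.Add(cursor.Current);
        return result;
    }
}
```

Calling `CursorDemo.ReadAll()` returns the elements 10, 20, 30 in order. Each call to `GetEnumerator` creates a fresh cursor, so two loops over the same collection do not disturb each other.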
For the moment, let's ignore the connection between `yield return` and the two interfaces and just consider the following problem. Suppose we have code that computes some integer on each line, and we would like to create a function `f` that executes these lines and returns the integers in sequence: the first time we call `f()` it returns the integer computed on the first line, the second time we call it it returns the second integer, and so on. The natural way to do this is to create a helper class/object that contains the code in question and also stores the position we have reached in it. This can be done by hand with a state field and `goto`. To have this done automatically, the language designates the `yield return` statement: when the compiler sees a block of code containing `yield return`, it creates a helper/controller class for it. Notice that the way this helper class acts on the block of code is already very similar to the way an `IEnumerator` acts on an `IEnumerable`: it stores "where" we are in some sense.
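A sketch of what I mean by doing this by hand with a state field and `goto`. The class name `Stepper`, the method `Next` and the three arithmetic "lines" are made up for illustration; the real compiler output looks different, this only shows the idea of saving the position between calls.

```csharp
using System;

// Hypothetical helper: the "block of code" plus a saved position in it.
public sealed class Stepper
{
    private int _state; // which "line" runs on the next call

    // Each call resumes where the previous call stopped.
    public int Next()
    {
        switch (_state)
        {
            case 0: goto Line0;
            case 1: goto Line1;
            case 2: goto Line2;
            default: throw new InvalidOperationException("No more lines.");
        }

    Line0:
        _state = 1;
        return 1 + 1;
    Line1:
        _state = 2;
        return 2 * 5;
    Line2:
        _state = 3;
        return 7 - 4;
    }
}
```

With `var f = new Stepper();`, three successive calls to `f.Next()` return 2, then 10, then 3, and a fourth call throws.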
Next we take this concept and define it so that it agrees with the existing functionality of the `IEnumerator` and `IEnumerable` interfaces. The basic functionality we are aiming for is that of `foreach`: when used with an object that implements `IEnumerable`, it obtains the cursor (an `IEnumerator`) via `IEnumerable.GetEnumerator`, and then uses the `MoveNext` method and the `Current` property to advance along the collection and to read the element the cursor points to, respectively.
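Here is roughly how I picture `foreach` being expanded, written as two methods that should behave the same. `ForeachDemo`, `WithForeach` and `ByHand` are hypothetical names, and the expansion is a simplification of what the compiler really emits.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // What we normally write.
    public static List<string> WithForeach(IEnumerable<string> words)
    {
        var seen = new List<string>();
        foreach (string w in words)
            seen.Add(w);
        return seen;
    }

    // Approximately what foreach means: get the cursor, then
    // alternate MoveNext() and Current until MoveNext() returns false.
    public static List<string> ByHand(IEnumerable<string> words)
    {
        var seen = new List<string>();
        using (IEnumerator<string> cursor = words.GetEnumerator())
        {
            while (cursor.MoveNext())
            {
                string w = cursor.Current;
                seen.Add(w);
            }
        }
        return seen;
    }
}
```

Both methods visit the same elements in the same order for any input sequence. The `using` block is there because `IEnumerator<T>` is disposable and `foreach` disposes the cursor when the loop ends.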
So the first way we could use the concept is to define only `GetEnumerator`: we simply put the block of code that we want executed step by step in the body of `GetEnumerator`. For this to work, the compiler recognizes that the body of `GetEnumerator` contains `yield return` and creates a helper object, which is what the call to `GetEnumerator` returns. This helper class implements `MoveNext` as "run the code up to the next `yield return` and save the yielded value to `_current`" (some field of the helper class) and `Current` as "return `_current`". Essentially, `GetEnumerator` returns the helper object described earlier, with the compiler supplying the `MoveNext` method and the `Current` property so that they advance through the block of code and return the latest value, respectively.
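A sketch of this first usage: `Lines` puts the block of code directly in `GetEnumerator`, and `LinesEnumerator` is my hand-written approximation of the helper I believe the compiler generates for it. Both class names and the `_state` / `_current` fields are hypothetical; the actual generated class has different names and more bookkeeping.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection whose GetEnumerator is an iterator block.
public sealed class Lines : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator()
    {
        yield return 1 + 1;
        yield return 2 * 5;
        yield return 7 - 4;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hand-written approximation of the compiler-generated helper.
public sealed class LinesEnumerator : IEnumerator<int>
{
    private int _state;   // position inside the block of code
    private int _current; // last value handed to "yield return"

    public int Current => _current;
    object IEnumerator.Current => _current;

    // Runs up to the next "yield return", stores its value,
    // and reports whether there was one.
    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _current = 1 + 1; _state = 1; return true;
            case 1: _current = 2 * 5; _state = 2; return true;
            case 2: _current = 7 - 4; _state = 3; return true;
            default: return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Enumerating `new Lines()` with `foreach` and driving a `LinesEnumerator` with `MoveNext` / `Current` both produce 2, 10, 3. Note that nothing in the iterator body runs when `GetEnumerator` is called; the code only starts executing on the first `MoveNext`.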
The other way we could use the concept is to "define" the collection directly and go through it. This time the method doesn't return just an `IEnumerator` / helper class; it returns an `IEnumerable`. But for our purposes, we're really only interested in being able to use this object with `foreach`. One way to do this is almost the same as above: the helper object implements `IEnumerable` as well as `IEnumerator`, so it can be handed to a `foreach` statement as an `IEnumerable`, and its `GetEnumerator` just returns `this` (as far as I understand, the real generated class returns `this` only for the first enumeration and a fresh copy afterwards, so that the sequence can be enumerated more than once). So the effect is very similar to the above.
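A sketch of the second usage: `Numbers.Lines()` is an iterator method returning `IEnumerable<int>`, and `LinesSequence` is a simplified hand-written stand-in for the helper that plays both roles. All names are hypothetical, and the "first call returns `this`, later calls return a copy" logic is my simplification (I believe the real generated class also checks which thread is asking).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class Numbers
{
    // Iterator method: the compiler builds the helper object for us.
    public static IEnumerable<int> Lines()
    {
        yield return 1 + 1;
        yield return 2 * 5;
        yield return 7 - 4;
    }
}

// Simplified hand-written helper that is both sequence and cursor.
public sealed class LinesSequence : IEnumerable<int>, IEnumerator<int>
{
    private int _state = -1; // -1 means "not yet handed out as a cursor"
    private int _current;

    public IEnumerator<int> GetEnumerator()
    {
        if (_state == -1)
        {
            _state = 0;
            return this; // first enumeration reuses this object
        }
        return new LinesSequence { _state = 0 }; // later ones get a copy
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _current = 1 + 1; _state = 1; return true;
            case 1: _current = 2 * 5; _state = 2; return true;
            case 2: _current = 7 - 4; _state = 3; return true;
            default: return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Both `Numbers.Lines()` and a `new LinesSequence()` can be used in `foreach` and yield 2, 10, 3, including when the same object is enumerated a second time.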
Does the above make sense? One thing that I'm not clear on is whether `foreach` could somehow be used with an `IEnumerator` directly, i.e. whether we could write `foreach (var element in Enumerator())` where `Enumerator()` is a function that returns only an `IEnumerator`, not an `IEnumerable`. It doesn't seem to make sense, but I'm not sure the compiler can't figure these things out. After all, it is capable of deciding to do something drastically different when it sees `yield return` in a method body (creating a helper object and so on, as described above), so I'm not sure how much other syntactic sugar there might be besides this.
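For reference, this is the situation I have in mind. To my understanding, `foreach` needs an expression whose type has an accessible `GetEnumerator` method, which `IEnumerator<T>` itself does not have, so the commented-out loop below would be rejected by the compiler and the cursor has to be driven by hand. `EnumeratorDemo`, `Enumerator` and `Drain` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Iterator method that returns only a cursor, not a sequence.
    public static IEnumerator<int> Enumerator()
    {
        yield return 1;
        yield return 2;
    }

    // Consumes the cursor manually, since foreach is not available here.
    public static List<int> Drain()
    {
        var result = new List<int>();

        // foreach (var element in Enumerator()) { }  // expected not to compile

        using (IEnumerator<int> cursor = Enumerator())
        {
            while (cursor.MoveNext())
                result.Add(cursor.Current);
        }
        return result;
    }
}
```

`EnumeratorDemo.Drain()` returns the elements 1 and 2. If a `foreach` is wanted, changing the return type of the iterator method to `IEnumerable<int>` is enough; the body with `yield return` can stay the same.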