I come from the world of Python and am trying to create a "generator" method in C#. I'm parsing a file in chunks of a specific buffer size, and only want to read and store the next chunk one at a time and yield it in a foreach loop. Here's what I have so far (simplified proof of concept):

class Page
{
    public uint StartOffset { get; set; }
    private uint currentOffset = 0;

    public Page(MyClass c, uint pageNumber)
    {
        // assign the property; declaring a local here would shadow it
        StartOffset = pageNumber * c.myPageSize;

        if (StartOffset < c.myLength)
            currentOffset = StartOffset;
        else
            throw new ArgumentOutOfRangeException("Page offset exceeds end of file");

        while (currentOffset < c.myLength && currentOffset < (StartOffset + c.myPageSize))
            // read data from page and populate members (not shown for MWE purposes)
            . . .
    }
}

class MyClass
{
    public uint myLength { get; set; }
    public uint myPageSize { get; set; }

    public IEnumerator<Page> GetEnumerator()
    {
        for (uint i = 1; i < this.myLength; i++)
        {
            // start count at 1 to skip first page
            Page p = new Page(this, i);
            try
            {
                yield return p;
            }
            catch (ArgumentOutOfRangeException)
            {
                // end of available pages, how to signal calling foreach loop?
            }
        }
    }
}

I know this is not perfect since it is a minimum working example (I don't allow many of these properties to be set publicly, but for keeping this simple I don't want to type private members and properties).

However, my main question is how do I let the caller looping over MyClass with a foreach statement know that there are no more items left to loop through? Is there an exception I throw to indicate there are no elements left?

Dan
  • You simply stop yielding items, just like in Python. That being said, you should make a method that returns an `IEnumerable`; enumerables are easier to consume. – poke Jul 08 '16 at 20:19
  • `IEnumerator.MoveNext` is what tells the caller to stop iterating. This is implemented for you when you use `yield return`. If you wish to explicitly stop you can use `yield break`. – Mike Zboray Jul 08 '16 at 20:19
  • @poke the inconsistency is my fault in the example. Page is a made-up thing for this post, BTreePage is really what I'm returning in my real code. Fixed. – Dan Jul 08 '16 at 20:23
  • I was commenting about using `IEnumerable` vs. `IEnumerator`. – poke Jul 08 '16 at 20:23
  • @poke I missed the distinction, got a link to explain more or show an example? – Dan Jul 08 '16 at 20:25
  • @Dan See [this question](http://stackoverflow.com/q/558304/216074). Basically (since you’re coming from Python), `IEnumerable` is the generator, or the list, and `IEnumerator` would be the thing you get when you call `iter()` on it (the technical thing). Most of the time, you would want the `IEnumerable` since that’s easier to consume (e.g. using a `foreach` loop). – poke Jul 08 '16 at 20:27
  • @poke its confusing, because if i implement it in the class, it requires two getenumerator methods, one generic <> and one not. makes no sense but errors if I don't have both – Dan Jul 08 '16 at 20:39
  • with accepted answer can't i just do `foreach (Page p in myClass)`? – Dan Jul 08 '16 at 20:40
  • No, don’t implement it. Just make your method *return* that type. I.e. instead of `IEnumerator GetEnumerator()`, change the signature to `IEnumerable GetPages()` (use a proper name too). The implementation is the same. – poke Jul 08 '16 at 20:41
  • No, for the `foreach` to work, you need to return `IEnumerable` instead of `IEnumerator`. – poke Jul 08 '16 at 20:42
  • @poke crap. I gotta figure this out. But that's another question I suppose – Dan Jul 08 '16 at 20:46

3 Answers

As mentioned in the comments, you should use IEnumerable<T> instead of IEnumerator<T>. The enumerator is the technical object that is used to enumerate over something. That something is, in many cases, an enumerable.

C# has special support for enumerables. Most prominently, you can use a foreach loop with an enumerable (but not with an enumerator, even though the loop actually uses the enumerator of the enumerable). Enumerables also let you use LINQ, which makes them even easier to consume.

So you should change your class like this:

class MyClass
{
    public uint myLength { get; set; }
    public uint myPageSize { get; set; }

    // note the modified signature
    public IEnumerable<Page> GetPages()
    {
        for (uint i = 1; i < this.myLength; i++)
        {
            Page p;
            try
            {
                p = new Page(this, i);
            }
            catch (ArgumentOutOfRangeException)
            {
                yield break;
            }
            yield return p;
        }
    }
}

In the end, this allows you to use it like this:

var obj = new MyClass();

foreach (var page in obj.GetPages())
{
    // do whatever
}

// or even using LINQ
var pageOffsets = obj.GetPages().Select(p => p.StartOffset).ToList();

Of course, you should also change the name of the method to something meaningful. If you’re returning pages, GetPages is maybe a good first step in the right direction. The name GetEnumerator is kind of reserved for types implementing IEnumerable, where the GetEnumerator method is supposed to return an enumerator of the collection the object represents.
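
If the goal is to write `foreach (Page p in myClass)` directly, as asked in the comments, the class itself has to implement IEnumerable<Page>. That is also why two GetEnumerator methods are needed: the generic one does the work, and the non-generic one forwards to it. The following is only a minimal sketch under assumptions: `PagedFile`, its `Length` and `PageSize` properties, and the simplified `Page` constructor are hypothetical stand-ins for the question's types, not the real code.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the question's Page class.
class Page
{
    public uint StartOffset { get; }

    public Page(uint startOffset)
    {
        StartOffset = startOffset;
    }
}

// Hypothetical variant of MyClass that is itself enumerable.
class PagedFile : IEnumerable<Page>
{
    public uint Length { get; set; }
    public uint PageSize { get; set; }

    // Generic enumerator: the iterator logic lives here.
    public IEnumerator<Page> GetEnumerator()
    {
        if (PageSize == 0)
            yield break; // nothing sensible to enumerate

        // Start at PageSize to skip the first page, as in the question.
        // ulong arithmetic keeps the offset from wrapping around.
        for (ulong offset = PageSize; offset < Length; offset += PageSize)
        {
            yield return new Page((uint)offset);
        }
    }

    // Non-generic enumerator required by IEnumerable; it only forwards.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

With this sketch, `foreach (Page p in new PagedFile { Length = 100, PageSize = 32 })` would visit the pages starting at offsets 32, 64 and 96. A method returning IEnumerable<Page> is still the simpler option when the class has more than one thing that could be enumerated.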

poke

There are two ways to do it: let code execution reach the end of the GetEnumerator function, or put a yield break; in the code. The latter behaves the same as a return; in a function that returns void.

From the caller's perspective, the enumerator returned from GetEnumerator() will start returning false from MoveNext(); that is how the caller can tell that the enumerator is done.
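
To make that concrete, here is a rough sketch of what a foreach loop does with an enumerator. The `EnumerationDemo` class and its `Numbers` method are made up for illustration, and the actual compiler expansion differs in details.

```csharp
using System.Collections.Generic;

static class EnumerationDemo
{
    // A tiny iterator method standing in for the question's page iterator.
    static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
        // Reaching the end of the method finishes the sequence.
    }

    // Roughly what `foreach (int n in Numbers())` does behind the scenes.
    public static List<int> Collect()
    {
        var seen = new List<int>();
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            // MoveNext() returns false once the iterator method has finished.
            while (e.MoveNext())
            {
                seen.Add(e.Current);
            }
        }
        return seen;
    }
}
```

No exception is involved at any point, unlike Python's StopIteration: the loop just ends when MoveNext() returns false.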


To fix your "Can't yield a value inside the body of a try block with a catch clause" error: you put the try/catch around the wrong part of the code. The exception is thrown by the new, not by the yield return. Your code should look like this:

public IEnumerator<Page> GetEnumerator()
{
    for (uint i = 1; i < this.myLength; i++)
    {
        // start count at 1 to skip first page
        Page p;
        try
        {
            p = new Page(this, i);
        }
        catch (ArgumentOutOfRangeException)
        {
            yield break;
        }
        yield return p;
    }
}
Scott Chamberlain
  • this is right answer, but now I have two problems ;) -- apparently I can't use yield inside a try statement. Arg – Dan Jul 08 '16 at 20:27
  • Odd. I get "Can't yield a value inside the body of a try block with a catch clause" – Dan Jul 08 '16 at 20:29
  • Yes, [you can’t do that](http://stackoverflow.com/q/346365/216074). Just capture the value in a variable, and yield it after the try/catch. – poke Jul 08 '16 at 20:31
  • @Dan updated answer with a full example. You should have had the try around the `new` anyway, that is where the exception would happen. – Scott Chamberlain Jul 08 '16 at 20:35
  • awesome, so I can just do `foreach (Page in myClass)` is my goal. Thank you – Dan Jul 08 '16 at 20:38
  • @Dan One note, using an `ArgumentOutOfRangeException` is kinda a bad design choice. You should not be using exceptions for your normal program flow control. A better design choice would be to do the range math inside `GetEnumerator` and only call `new Page` if it is a valid range value. – Scott Chamberlain Jul 08 '16 at 20:41
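
The range math suggested in the last comment could look something like the sketch below. It assumes pages are fixed-size and that a page is valid whenever its start offset is below the file length, which is what the constructor check in the question implies. `PageMath`, `PageCount` and `PageNumbers` are hypothetical names; `length` and `pageSize` mirror `myLength` and `myPageSize`.

```csharp
using System;
using System.Collections.Generic;

static class PageMath
{
    // Number of pages needed to cover `length` bytes (ceiling division).
    public static uint PageCount(uint length, uint pageSize)
    {
        if (pageSize == 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize), "Page size must be positive.");

        // Widen to ulong so that length + pageSize - 1 cannot overflow.
        return (uint)(((ulong)length + pageSize - 1) / pageSize);
    }

    // Yields only valid page numbers, skipping page 0 as in the question.
    public static IEnumerable<uint> PageNumbers(uint length, uint pageSize)
    {
        uint count = PageCount(length, pageSize);
        for (uint i = 1; i < count; i++)
        {
            yield return i;
        }
    }
}
```

An iterator that loops over `PageNumbers(myLength, myPageSize)` and calls `new Page(this, i)` for each value would never hit the out-of-range case, so neither the try/catch nor the yield break would be needed.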

Use the yield break; statement to end the sequence that your iterator method is generating.
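
A minimal illustration, using a made-up `TakeUntilNegative` method rather than anything from the question:

```csharp
using System.Collections.Generic;

static class YieldBreakDemo
{
    // Yields values from `source` until the first negative one,
    // then ends the sequence.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> source)
    {
        foreach (int value in source)
        {
            if (value < 0)
                yield break; // the caller's foreach simply finishes here

            yield return value;
        }
    }
}
```

Given the input 3, 1, -1, 7, a foreach over the result sees 3 and 1 and then ends normally, without any exception reaching the caller.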

Tanner Swett