What is the advantage of using a generator (yield) inside an __iter__() method? While reading through the Python Cookbook, I came across this advice: "If you want a generator to expose extra state to the user, don’t forget that you can easily implement it as a class, putting the generator function code in the __iter__() method."

import io

class playyield:
    def __init__(self,fp):
        self.completefp = fp

    def __iter__(self):
        for line in self.completefp:
            if 'python' in line:
                yield line

if __name__ =='__main__':
    with io.open(r'K:\Data\somefile.txt','r') as fp:
        playyieldobj = playyield(fp)
        for i in playyieldobj:
            print(i)

Questions:

  1. What does "extra state" mean here?
  2. What is the advantage of using yield inside __iter__() instead of using a separate function for yield?
juanpa.arrivillaga
Joe_12345
  • Because now `playyield` is *iterable*, but you didn't have to write an *iterator* class, since `playyield.__iter__` now returns a generator, which is an iterator. It is quite convenient. – juanpa.arrivillaga Aug 15 '17 at 02:16
  • Your question does not provide the entire context. The quote you mention from the Python Cookbook relates to the problem `**Problem:** You would like to define a generator function, but it involves extra state that you would like to expose to the user somehow`. The _extra state_ here means other information related to other parts of your program. – Ketan Mukadam Aug 15 '17 at 07:59
  • I also recommend checking this [blog post](http://nvie.com/posts/iterators-vs-generators/) – Ketan Mukadam Aug 15 '17 at 08:01

1 Answer

Without generator functions, you would have to implement something like this, if you want to follow best practices:

In [7]: class IterableContainer:
   ...:     def __init__(self, data=(1,2,3,4,5)):
   ...:         self.data = data
   ...:     def __iter__(self):
   ...:         return IterableContainerIterator(self.data)
   ...:

In [8]: class IterableContainerIterator:
   ...:     def __init__(self, data):
   ...:         self.data = data
   ...:         self._pos = 0
   ...:     def __iter__(self):
   ...:         return self
   ...:     def __next__(self):
   ...:         try:
   ...:             item = self.data[self._pos]
   ...:         except IndexError:
   ...:             raise StopIteration
   ...:         self._pos += 1
   ...:         return item
   ...:

In [9]: container = IterableContainer()

In [10]: for x in container:
    ...:     print(x)
    ...:
1
2
3
4
5

Of course, the above example is contrived, but hopefully you get the point. With generators, this can simply be:

In [11]: class IterableContainer:
    ...:     def __init__(self, data=(1,2,3,4,5)):
    ...:         self.data = data
    ...:     def __iter__(self):
    ...:         for x in self.data:
    ...:             yield x
    ...:
    ...:

In [12]: list(IterableContainer())
Out[12]: [1, 2, 3, 4, 5]
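
A side effect worth noting, sketched below with the same class: because `__iter__` is a generator function, every call to `iter()` builds a fresh generator object, so the container can be traversed repeatedly and by several independent iterators at once. The variable names (`first`, `second`, and so on) are only for illustration.

```python
class IterableContainer:
    def __init__(self, data=(1, 2, 3, 4, 5)):
        self.data = data

    def __iter__(self):
        # A generator function: each call returns a new generator object.
        for x in self.data:
            yield x


container = IterableContainer()

# Two independent iterators over the same container.
first = iter(container)
second = iter(container)

first_two = [next(first), next(first)]  # advances only `first`
second_one = next(second)               # `second` still starts at the beginning

# The container itself can be traversed as many times as needed.
pass_one = list(container)
pass_two = list(container)

print(first_two, second_one)  # [1, 2] 1
print(pass_one == pass_two)   # True
```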

As for state, it's exactly that: objects can have state, e.g. attributes, and you can manipulate that state at runtime. You could do something like the following, although I would say it is highly inadvisable:

In [19]: class IterableContainerIterator:
    ...:     def __init__(self, data):
    ...:         self.data = data
    ...:         self._pos = 0
    ...:     def __iter__(self):
    ...:         return self
    ...:     def __next__(self):
    ...:         try:
    ...:             item = self.data[self._pos]
    ...:         except IndexError:
    ...:             raise StopIteration
    ...:         self._pos += 1
    ...:         return item
    ...:     def rewind(self):
    ...:         self._pos = 0  # jump back to the first item
    ...:

In [20]: class IterableContainer:
    ...:     def __init__(self, data=(1,2,3,4,5)):
    ...:         self.data = data
    ...:     def __iter__(self):
    ...:         return IterableContainerIterator(self.data)
    ...:

In [21]: container = IterableContainer()

In [22]: it = iter(container)

In [23]: next(it)
Out[23]: 1

In [24]: next(it)
Out[24]: 2

In [25]: it.rewind()

In [26]: next(it)
Out[26]: 1

In [27]: next(it)
Out[27]: 2

In [28]: next(it)
Out[28]: 3

In [29]: next(it)
Out[29]: 4

In [30]: next(it)
Out[30]: 5

In [31]: it.rewind()

In [32]: next(it)
Out[32]: 1
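
For comparison, here is a minimal sketch of what the Cookbook quote in the question seems to be getting at: keep the generator code in `__iter__` and expose the extra state as ordinary attributes. The class name `MatchingLines` and the attributes `lines_seen` and `matches` are hypothetical, made up for this illustration, and an in-memory `io.StringIO` stands in for the file from the question.

```python
import io


class MatchingLines:
    """Iterate over lines containing `word`, tracking counts as extra state."""

    def __init__(self, lines, word):
        self.lines = lines
        self.word = word
        self.lines_seen = 0  # extra state: how many lines were read so far
        self.matches = 0     # extra state: how many of them matched

    def __iter__(self):
        # Generator code lives here; the counters stay readable from outside.
        for line in self.lines:
            self.lines_seen += 1
            if self.word in line:
                self.matches += 1
                yield line


fp = io.StringIO("python rocks\nhello world\ni like python\n")
matcher = MatchingLines(fp, "python")
found = [line.rstrip("\n") for line in matcher]

print(found)               # ['python rocks', 'i like python']
print(matcher.lines_seen)  # 3
print(matcher.matches)     # 2
```

With a plain generator function, the caller would have no obvious place to read such counters while the loop is running. Note that in this sketch the counters are not reset between passes, and a file-like source is exhausted after the first one.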
juanpa.arrivillaga
  • Over a year old, but why not just include `__next__` in the `IterableContainer` class? Simply set `def __iter__(self): return self` with the `__next__` as part of the class you wish to iterate, and now it's an iterable, is it not? – pstatix Feb 13 '19 at 11:28
  • @pstatix because that turns it into an *iterator*. All *iterators* are *iterable*, but not all *iterables* are *iterators*. Ideally, you should separate the two. For example, iterators should be *single pass*, but generally, you want to be able to iterate over most containers multiple times. – juanpa.arrivillaga Feb 13 '19 at 17:02
  • I suppose where I'm lost on semantics is that the generator automatically provides the `__next__`, so while not explicitly defined as part of the class, a generator is still an iterator, so it stands to reason that using the generator or an `__iter__` and `__next__` combination makes the class an iterator either way. – pstatix Feb 14 '19 at 00:51
  • This [answer](https://stackoverflow.com/a/1960330/6741482) speaks to what I am getting at. Are we just talking best practices here for separation of concerns? As I take it, a generator is an iterator, so picking either method is really based upon how you want to use the class. But I do see that as a generator, it's iterable only once at a time, whereas implementing it as an iterator allows more maneuvers. – pstatix Feb 14 '19 at 00:56
  • @pstatix separation of concerns is a plus. A *generator object* is an iterator. A generator function *returns a generator object*. So, since the container's `__iter__` returns an iterator, it is now an iterable. But we don't want the *container* to be an *iterator* and implement `__next__`. I'm not sure what you mean by "more maneuvers". – juanpa.arrivillaga Feb 14 '19 at 01:02
  • @pstatix so just ask yourself this, why do none of the built-in containers implement `__next__`? (or choose a container from your favorite library, e.g. `numpy` or `pandas`) – juanpa.arrivillaga Feb 14 '19 at 01:19
  • Okay, I can understand wanting the container not to be an iterator. By "more maneuvers" I meant implementations like your `rewind()` method and such. As for the built-ins, I have no idea, enlighten me? – pstatix Feb 14 '19 at 01:37
  • @pstatix because *containers should be iterable, but they shouldn't be iterators*, because you want to be able to iterate over them many times, and you want to be able to make multiple iterators from one container. – juanpa.arrivillaga Feb 14 '19 at 01:49
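
The distinction discussed in these comments can be checked directly. The sketch below uses only built-ins plus a hypothetical `SelfIteratingContainer` class, written here to illustrate the approach pstatix asks about (a container that returns `self` from `__iter__` and implements `__next__` itself); it is not taken from the answer.

```python
# Built-in containers are iterable but are not iterators.
numbers = [1, 2, 3]
list_has_next = hasattr(numbers, "__next__")             # False
list_iter_has_next = hasattr(iter(numbers), "__next__")  # True


class SelfIteratingContainer:
    """Hypothetical container that is its own iterator (not recommended)."""

    def __init__(self, data):
        self.data = data
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self.data):
            raise StopIteration
        item = self.data[self._pos]
        self._pos += 1
        return item


container = SelfIteratingContainer([1, 2, 3])
first_pass = list(container)   # [1, 2, 3]
second_pass = list(container)  # [] because the position was never reset

print(list_has_next, list_iter_has_next)  # False True
print(first_pass, second_pass)            # [1, 2, 3] []
```

The empty second pass is the practical cost of merging the two roles: the container carries a single shared position, so it cannot be iterated twice or by two loops at once.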