
I am trying to implement an iterable proxy for a web resource (lazily fetched images).

At first I did this (returning ids; in production those will be image buffers):

def iter(ids=[1,2,3]):
    for id in ids:
        yield id

and that worked nicely, but now I need to keep state.

I read about the four ways to define iterators and judged that the iterator protocol is the way to go. Below is my failed attempt to implement it.

class Test:
    def __init__(self, ids):
         self.ids = ids
    def __iter__(self):
        return self
    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration

test = Test([1,2,3])
for t in test:
    print('new value', t)

Output:

new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>

forever.

What's wrong?


Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.

vdmit
Vorac
  • Please don't use `me` where the rest of the world uses `self`. Breaking conventions for the sake of being different will only serve to hinder collaboration with others, including getting help on Stack Overflow. – Martijn Pieters May 21 '19 at 11:15
  • `__next__` is supposed to **return one value** from the iterator, and each time it is called it should return a different value. You have it yielding **every value**, which means it returns a generator. – khelwood May 21 '19 at 11:16
  • Wouldn't it be much easier to just subclass `list`? – DeepSpace May 21 '19 at 11:17
  • Another side note: there is a built-in `iter()` function, your custom `iter()` function shadows it and probably will lead to some confusion for people that expected to find the original. The implementation of `iter()` could be replaced by `iter([1, 2, 3])` (where `iter()` is the built-in function). – Martijn Pieters May 21 '19 at 12:10
  • You are creating iterable **instances**, not an iterable **class**. See [Iterating over object instances](https://stackoverflow.com/a/32362984/2556118) for an iterable class in Python 3. – Hans Ginzel Feb 12 '21 at 10:57

4 Answers


Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.

But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
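A minimal sketch of what goes wrong, using a hypothetical class name `Broken` (not from the question) that mirrors the question's class. Because `__next__` contains `yield`, each call hands back a brand-new generator object instead of a value, so the `for` loop never sees `StopIteration`:

```python
import types

class Broken:
    """Hypothetical reproduction of the class from the question."""
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return self
    def __next__(self):
        # The yield makes __next__ a generator function, so this body
        # does not run when __next__ is called; a generator is returned.
        for id in self.ids:
            yield id

broken = Broken([1, 2, 3])
first = next(broken)    # a generator object, not 1
second = next(broken)   # another, separate generator object
print(isinstance(first, types.GeneratorType))  # True
print(first is second)                         # False
print(list(first))      # [1, 2, 3]: values appear only when the generator is consumed
```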

Because you wanted to create an iterable, you can just make __iter__ the generator here:

class Test:
    def __init__(self, ids):
         self.ids = ids
    def __iter__(self):
        for id in self.ids:
            yield id

Note that a generator function should not use raise StopIteration; simply returning from the function does that for you.
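To illustrate why the explicit raise is best avoided, here is a small sketch with two hypothetical generator functions (the names are mine). It assumes Python 3.7 or newer, where a StopIteration raised inside a generator body is converted into a RuntimeError; older versions would end the iteration silently instead:

```python
def ends_by_returning(ids):
    for id in ids:
        yield id
    # Falling off the end of the function finishes the generator.

def ends_by_raising(ids):
    for id in ids:
        yield id
    raise StopIteration  # becomes a RuntimeError on Python 3.7+

print(list(ends_by_returning([1, 2, 3])))

try:
    list(ends_by_raising([1, 2, 3]))
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"
else:
    outcome = "no error (Python older than 3.7)"
print(outcome)
```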

The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:

Iterable -> (call __iter__) -> Iterator

In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:

>>> test = Test([1,2,3])
>>> test.__iter__()  # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>

A generator object is a specific kind of iterator, one created by calling a generator function or by using a generator expression. Note that the hex values in the representations differ: two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:

>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2

Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.

Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).

So this is an iterator:

class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value

This has to do a bit more work; it has to keep track of what the next value to produce would be, and whether we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.

You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.

The for loop is such syntax. When you use a for loop, Python does the moral equivalent of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:

>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE

The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
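The same steps can be written out by hand. This is only a sketch of the idea, with a hypothetical helper name `manual_for`; the real loop is implemented in the interpreter, not in Python code like this:

```python
def manual_for(iterable):
    """Collect values the way a for loop would fetch them."""
    collected = []
    iterator = iter(iterable)        # roughly what GET_ITER does
    while True:
        try:
            value = next(iterator)   # roughly what FOR_ITER does
        except StopIteration:
            break                    # the loop ends here
        collected.append(value)      # stands in for the loop body
    return collected

print(manual_for([1, 2, 3]))  # [1, 2, 3]
```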

If you want to play more with the difference between iterators and iterables, take a look at the Python standard types, such as lists or tuples, and see what happens when you use iter() and next() on them:

>>> foo = (42, 81, 17, 111)
>>> next(foo)  # foo is a tuple, not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo)  # so use iter() to create one from the tuple
>>> t_it   # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it)  # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it  # really, it returns itself, not a new object
True
>>> next(t_it)  # we can get values from it, one by one
42
>>> next(t_it)  # another one
81
>>> next(t_it)  # yet another one
17
>>> next(t_it)  # this is getting boring..
111
>>> next(t_it)  # and now we are done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it)  # and *stay* done
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> foo  # but foo itself is still there
(42, 81, 17, 111)

You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):

class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0  # index of the next value to produce
    def __iter__(self):
        return self
    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value

That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
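As a quick check of that design, here is a self-contained copy of the two classes, with nextpos initialised in TestIterator.__init__, followed by a short usage sketch. Each call to iter(test) gives a separate iterator with its own position, and the iterable itself can be looped over more than once:

```python
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return TestIterator(self)   # a fresh iterator on every call

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            self.test = None        # drop the reference once exhausted
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value

test = Test([1, 2, 3])
it1, it2 = iter(test), iter(test)
print(next(it1), next(it1), next(it2))  # 1 2 1: positions are tracked separately
print(list(test), list(test))           # [1, 2, 3] [1, 2, 3]
```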

A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, that is, the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.


(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does; it is an iterator producing (object, group_iterator) tuples, but I digress).

Martijn Pieters

It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator and then iterate over it as such. But, as I said, this feels odd and I don't think you'd actually want a setup like this.

class Test:
    def __init__(self, ids):
         self.ids = iter(ids)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self.ids)

test = Test([1,2,3])
for t in test:
    print('new value', t)
Bram Vanroy
  • I specifically stepped away from storing an iterator in the instance; `(i for i in ids)` creates an iterator via a generator expression here. You could just have used `self.ids = iter(ids)` instead. – Martijn Pieters May 21 '19 at 11:26
  • That's still delegating the iteration to a stored iterator. That's fine, but then just use `test = iter([1, 2, 3])` and dispense with the thin wrapper that adds nothing. :-) – Martijn Pieters May 21 '19 at 12:04
  • This is exactly what confused me about OP's question as I didn't understand what their goal was or how they wanted to use this. I'll leave the answer as-is, with the note that this is just for illustrative purposes. It does what is asked, but is in fact completely useless. – Bram Vanroy May 21 '19 at 12:12

The simplest solution is to implement __iter__ and return an iterator over the underlying list:

class Test:
    def __init__(self, ids):
         self.ids = ids
    def __iter__(self):
        return iter(self.ids)

test = Test([1,2,3])
for t in test:
    print('new value', t)

As an update: for lazy loading, you can return an iterator built from a generator expression:

    def __iter__(self):
        return iter(load_file(id) for id in self.ids)
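A fuller sketch of the lazy idea, under stated assumptions: `load_file` is not defined anywhere in this thread, so the version below is a hypothetical stand-in that returns fake bytes, and the class name `LazyImages` and its `loaded` counter are mine. It makes __iter__ a generator function so that state can be kept on the instance while values are still fetched one at a time:

```python
def load_file(id):
    """Hypothetical stand-in for the real fetch; returns fake bytes."""
    return f"image-{id}".encode()

class LazyImages:
    def __init__(self, ids):
        self.ids = ids
        self.loaded = 0   # example of state kept on the iterable

    def __iter__(self):
        for id in self.ids:
            data = load_file(id)   # fetched only when the loop reaches this id
            self.loaded += 1
            yield data

images = LazyImages([1, 2, 3])
it = iter(images)
print(images.loaded)  # 0: nothing has been fetched yet
print(next(it))       # b'image-1'
print(images.loaded)  # 1
```

Note that the counter here is shared by every iterator created from the same instance; per-loop state would belong in the generator's local variables instead.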
Netwave

The __next__ function is supposed to return the next value provided by an iterator. Since you have used yield in your implementation, the function returns a generator, which is what you get.

You need to make clear whether you want Test to be an iterable or an iterator. If it is an iterable, it will have the ability to provide an iterator with __iter__. If it is an iterator, it will have the ability to provide new elements with __next__. Iterators can typically work as iterables by returning themselves in __iter__. Martijn's answer shows what you probably want. However, if you want an example of how you could specifically implement __next__ (by making Test explicitly an iterator), it could be something like this:

class Test:
    def __init__(self, ids):
        self.ids = ids
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.idx >= len(self.ids):
            raise StopIteration
        else:
            self.idx += 1
            return self.ids[self.idx - 1]

test = Test([1,2,3])
for t in test:
    print('new value', t)
jdehesa