
Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself. If it is not an iterator but is instead an iterable, i.e. something with __iter__ and __getitem__ methods, then why would one ever want to define something which returns an iterator but is not an iterator itself? In Python, when would one want to define an iterable that is not itself an iterator? Or, what is an example of something that is an iterable but not an iterator?

jonrsharpe
  • `iterable` and `iterator` have specific meanings in Python: https://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols – wflynny Apr 17 '16 at 19:43
  • Many builtin containers (like lists, tuples, dictionaries) are iterable without being iterators. – Blckknght Apr 17 '16 at 19:44
  • What if you want to iterate over something multiple times simultaneously? – kindall Apr 17 '16 at 19:44
  • Forget simultaneously: what if you want to iterate over a sequence more than once? A sequence can be iterated over (it is an iterable) but is not inherently an **iterator**, and an iterator can only be used once. – Tadhg McDonald-Jensen Apr 17 '16 at 19:51
  • @TadhgMcDonald-Jensen you *could* write an iterator that can be used multiple times. – timgeb Apr 17 '16 at 19:57
  • @timgeb In my defense, this is not a duplicate. I am not asking about the basic definition but the usefulness of having objects that follow the definitions. – Kamil Michnicki Apr 17 '16 at 19:58
  • @KamilMichnicki I reconsidered and reopened your question. Your edits helped, too. – timgeb Apr 17 '16 at 20:03
  • @jonrsharpe I suggest reopening this. The question also asks why iterables delegate the work to iterator objects and I'm in the middle of writing an answer. – timgeb Apr 17 '16 at 20:15

3 Answers


Trying to answer your questions one at a time:

Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself.

It's not pointless. The iterator protocol demands an __iter__ and __next__ (or next in Python 2) method. All sane iterators I have ever seen just return self in their __iter__ method, but it is still crucial to have that method. Not having it would lead to all kinds of weirdness, for example:

somelist = [1, 2, 3]
it = iter(somelist)

Now, if iterators were not required to have an __iter__ method, then

iter(it)

or

for x in it: pass

would raise a TypeError complaining that the object is not iterable, because when iter(x) is called (which happens implicitly whenever you use a for loop) it expects the argument x to be able to produce an iterator. To get one, iter looks for an __iter__ method on x (falling back to __getitem__ only for old-style sequences). Concrete example (Python 3):

>>> class A:
...     def __iter__(self):
...         return B()
...
>>> class B:
...     def __next__(self):
...         pass
...
>>> iter(iter(A()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'B' object is not iterable
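For contrast, here is a minimal sketch (my own variation, not part of the original answer) in which B gains the one-line __iter__ method. To keep the demo finite, this hypothetical B also counts to 3 instead of returning None forever:

```python
class A:
    def __iter__(self):
        return B()


class B:
    """Iterator that counts to 3; unlike the B above, it also defines __iter__."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self  # the one-line method that makes B usable wherever an iterable is expected

    def __next__(self):
        if self.n >= 3:
            raise StopIteration
        self.n += 1
        return self.n


it = iter(A())
print(iter(it) is it)  # True
print(list(it))        # [1, 2, 3]
```

With __iter__ in place, iter(iter(A())) no longer raises, and the iterator can be handed to a for loop or to list() directly.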

Consider any function that expects an iterable, especially those from itertools, for example dropwhile. Calling it with any object that has an __iter__ method will work, regardless of whether that object is an iterable that is not an iterator, or an iterator, because calling iter on either kind of object gives you an iterator. Making an artificial distinction between two kinds of iterables here would go against the principle of duck typing, which Python strongly embraces.
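A small sketch of that point: dropwhile gives the same result whether it receives a list (an iterable that is not an iterator) or an iterator over that list. The predicate is_small and the sample data are illustrative only:

```python
from itertools import dropwhile


def is_small(x):
    return x < 3


data = [1, 2, 5, 1, 7]

# A list is an iterable but not an iterator...
from_list = list(dropwhile(is_small, data))
# ...while iter(data) is an iterator; dropwhile accepts both.
from_iterator = list(dropwhile(is_small, iter(data)))

print(from_list)      # [5, 1, 7]
print(from_iterator)  # [5, 1, 7]
```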

Neat tricks like

>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(zip(*[iter(a)]*3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]

would just stop working if you could not pass iterators to zip.
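Why the trick works, sketched step by step: the list multiplication produces three references to one iterator, zip calls iter() on each argument, and because an iterator's __iter__ returns itself, all three "columns" advance the same position. Passing the list itself instead gives zip three independent iterators:

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

shared = iter(a)
args = [shared] * 3  # three references to ONE iterator
print(all(arg is shared for arg in args))  # True

# zip calls iter() on each argument; iter(shared) is shared,
# so every tuple pulls three consecutive items from the same iterator.
grouped = list(zip(*args))
print(grouped)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# With the list itself, zip gets three fresh, independent iterators.
repeated = list(zip(*[a] * 3))[:2]
print(repeated)  # [(1, 1, 1), (2, 2, 2)]
```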

why would one want to ever define something which returns an iterator but is not an iterator itself

Let's consider this simple list subclass with its own iterator class:

>>> class MyList(list):
...     def __iter__(self):
...         return MyListIterator(self)
>>>
>>> class MyListIterator:
...     def __init__(self, lst):
...         self._lst = lst
...         self.index = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         try:
...             n = self._lst[self.index]
...             self.index += 1
...             return n
...         except IndexError:
...             raise StopIteration
>>>    
>>> a = MyList([1,2,3])
>>> for x in a:
...     for x in a:
...         x
...
1
2
3
1
2
3
1
2
3

Remember that iter is called with the iterable in question for both for loops, expecting a fresh iterator each time from the object's __iter__ method.
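Roughly what each for loop does under the hood, as a simplified sketch (it ignores details such as the loop's else clause; manual_for is a hypothetical helper name, not a built-in):

```python
def manual_for(iterable, body):
    """Simplified equivalent of `for x in iterable: body(x)`."""
    iterator = iter(iterable)  # calls iterable.__iter__() to get a fresh iterator
    while True:
        try:
            x = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break  # the iterator signalled that it is exhausted
        body(x)


seen = []
manual_for([1, 2, 3], seen.append)
print(seen)  # [1, 2, 3]
```

Because the iterator lives in a local variable of each loop, two nested loops over the same MyList each hold their own state.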

Now, without an iterator being produced each time a for loop is employed, how would you be able to keep track of the current state of any iteration when a MyList object is iterated over an arbitrary number of times at the same time? Oh, that's right, you can't. :)
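To make this concrete, here is a sketch with a hypothetical container, SelfIteratingList, that is its own iterator (so it has only one iteration state), next to a hypothetical IndependentList that hands out a fresh iterator per loop. The same nested loop behaves very differently:

```python
class SelfIteratingList:
    """Hypothetical container that is its own iterator (an anti-pattern)."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0  # the one and only iteration state

    def __iter__(self):
        return self  # every loop receives the same object, and thus the same state

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        value = self._items[self._index]
        self._index += 1
        return value


class IndependentList:
    """Hypothetical container that returns a new iterator on every __iter__ call."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)  # new iterator, new position, each time


def nested_pairs(iterable):
    """Collect (outer, inner) pairs from two nested loops over one object."""
    return [(x, y) for x in iterable for y in iterable]


one_state = nested_pairs(SelfIteratingList([1, 2, 3]))
fresh_state = nested_pairs(IndependentList([1, 2, 3]))
print(one_state)         # [(1, 2), (1, 3)]
print(len(fresh_state))  # 9
```

The inner loop of the self-iterating version drains the shared state, so the outer loop ends after its first item.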

edit: Bonus and sort of a reply to Tadhg McDonald-Jensen's comment

A reusable iterator is not unthinkable, but it is a bit weird, because it relies on being initialized with a "non-consumable" iterable (i.e. not a classic iterator):

>>> class riter(object):
...     def __init__(self, iterable):
...         self.iterable = iterable
...         self.it = iter(iterable)
...     def __next__(self): # python 2: next
...         try:
...             return next(self.it)
...         except StopIteration:
...             self.it = iter(self.iterable)
...             raise
...     def __iter__(self):
...         return self
... 
>>> 
>>> a = [1, 2, 3]
>>> it = riter(a)
>>> for x in it:
...     x
... 
1
2
3
>>> for x in it:
...     x
... 
1
2
3
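As a script rather than a REPL session, the same riter class can be checked against collections.abc.Iterator (the modern location of the Iterator ABC). This sketch reproduces the class above for Python 3 only:

```python
from collections.abc import Iterator


class riter:
    """Reusable iterator from the answer above: it restarts after exhaustion."""

    def __init__(self, iterable):
        self.iterable = iterable
        self.it = iter(iterable)

    def __next__(self):
        try:
            return next(self.it)
        except StopIteration:
            # Rewind so the next loop starts over, then signal the end of this one.
            self.it = iter(self.iterable)
            raise

    def __iter__(self):
        return self


r = riter([1, 2, 3])
print(isinstance(r, Iterator))  # True
print(list(r), list(r))         # [1, 2, 3] [1, 2, 3]
```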
timgeb
  • You just recreate an iterator every time it is exhausted; how is that any different from just iterating over `a` multiple times (other than `riter` being less efficient)? – Tadhg McDonald-Jensen Apr 18 '16 at 00:03
  • @TadhgMcDonald-Jensen the point is that a `riter` object is an iterator by definition, i.e. `isinstance(riter(x), collections.abc.Iterator)` returns `True`, and yet it can be used multiple times. It does not aim to be efficient or even practical; it just shows that in the general case an instance of `collections.abc.Iterator` is not a one-time-use object. – timgeb Apr 18 '16 at 05:41
  • How about "[In general, no, you can't reuse any iterator.](https://mail.python.org/pipermail/tutor/2007-March/053345.html)" or maybe "[You don't reuse iterators!](https://bytes.com/topic/python/answers/504647-reuseable-iterators-better)" In any case the OP is clearly having trouble telling the difference between an iterable and iterator. **The difference is that an iterator is meant to be consumed.** – Tadhg McDonald-Jensen Apr 18 '16 at 12:21
  • If you can think of a **good** example of an iterator that is not consumed (and not infinite) then I'd love to see it! – Tadhg McDonald-Jensen Apr 18 '16 at 12:26
  • @TadhgMcDonald-Jensen I don't think I have ever seen a good example, which does not mean that one does not exist. I don't think the reusable iterator has much real-world application besides showing that the iterator protocol does not care about reusability. If I gave you the impression that the reusable iterator would be a good idea, then that's my fault; that was never the intention. It's a technicality, like a class that overrides `__eq__` in such a way that objects with the same ID don't compare as equal. – timgeb Apr 18 '16 at 14:04
  • `iter(it)` is not giving `TypeError` – overexchange Aug 16 '17 at 14:36
  • @overexchange Of course it does not. The point is that it *would* throw a `TypeError` if the iterator did not have an `__iter__` method, in answer to OP's question. – timgeb Aug 18 '17 at 07:44

An iterable is something that can be iterated (looped) over, whereas an iterator is something that is consumed.

what is an example of something that is an iterable but not an iterator?

Simple, a list. Or any sequence, since you can iterate over a list as many times as you want without destruction to the list:

>>> a = [1,2,3]
>>> for i in a:
    print(i,end=" ")

1 2 3 
>>> for i in a:
    print(i,end=" ")

1 2 3 

An iterator (like a generator), on the other hand, can only be used once:

>>> b = (i for i in range(3))
>>> for i in b:
    print(i,end=" ")

0 1 2 
>>> for i in b:
    print(i,end=" ")


>>> #iterator has already been used up, nothing gets printed

For a list to be consumed like an iterator you would need to use something like self.pop(0) to remove the first element of the list for iteration:

class IteratorList(list):
    def __iter__(self):
        return self #since the current mechanics require this
    def __next__(self):
        try:
            return self.pop(0)
        except IndexError: #we need to raise the expected kind of error
            raise StopIteration
    next = __next__ #for compatibility with python 2

a = IteratorList([1,2,3,4,5])

for i in a:
    print(i)
    if i==3:  # let's stop at three and
        break # see what the list is after

print(a)

which gives this output:

1
2
3
[4, 5]

You see? This is what iterators do: once a value has been returned from __next__, the iterator has no reason to hold on to it, so it is removed. That's why iterables need __iter__: it hands out a separate iterator object, which lets us iterate over a sequence without destroying it in the process.
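For comparison, the built-in list iterator (what iter(a) returns for a list) only keeps a position, so the list survives and every iter() call gets an independent cursor. A minimal sketch:

```python
numbers = [1, 2, 3]

first = iter(numbers)   # every iter() call on a list returns a new iterator
second = iter(numbers)

print(next(first), next(first))  # 1 2
print(next(second))              # 1 (second keeps its own position)
print(numbers)                   # [1, 2, 3] (the list itself is untouched)
rest = list(first)
print(rest)                      # [3]
print(list(first))               # [] (an exhausted iterator stays exhausted)
print(iter(first) is first)      # True (an iterator's __iter__ returns itself)
```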


In response to @timgeb's comment: I suppose it would make sense if you added items to an IteratorList and then iterated over it again:

a = IteratorList([1,2,3,4,5])

for i in a:
    print(i)

a.extend([6,7,8,9])

for i in a:
    print(i)

But in general, an iterator only makes sense if it is either consumed or never ends (like itertools.repeat).
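A short sketch of both cases using itertools.repeat: without a count it never ends (islice is used here just to take a few items), and with a count it is consumed like any other iterator:

```python
from itertools import islice, repeat

endless = repeat("x")            # never raises StopIteration on its own
first_three = list(islice(endless, 3))
print(first_three)               # ['x', 'x', 'x']
print(next(endless))             # x (still not exhausted)

bounded = repeat("y", 2)         # with a count, repeat gets used up
first_pass = list(bounded)
second_pass = list(bounded)
print(first_pass)                # ['y', 'y']
print(second_pass)               # []
```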

Tadhg McDonald-Jensen

You are thinking in the wrong direction. The reason an iterator has to implement __iter__ is that, this way, both containers and iterators can be used in for loops and with the in operator.

>>> # a list is a container
>>> lst = [1, 2, 3]
>>> dir(lst)
[...,
 '__getitem__',
 ...,
 '__iter__',
 ...]

>>> # let's get its iterator
>>> it = iter(lst)
>>> dir(it)
[...,
 '__iter__',
 ...,
 '__next__',
 ...]

>>> # you can use the container directly:
>>> for i in lst:
...     print(i)
...
1
2
3

>>> # you can also use the iterator directly:
>>> for i in it:
...     print(i)
...
1
2
3
>>> # the loop above would fail if the iterator did not implement __iter__

That is also why almost every iterator implementation simply returns self from __iter__. The method is not meant for anything fancy; it just makes the syntax uniform.
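The in operator mentioned above relies on the same uniformity: a list iterator has no __contains__, so a membership test falls back to iterating via __iter__, and on an iterator that consumes items. A small sketch:

```python
numbers = [1, 2, 3, 4]
it = iter(numbers)

print(3 in numbers)   # True
found = 3 in it       # no __contains__, so Python iterates over it (via it.__iter__())
print(found)          # True
leftover = list(it)
print(leftover)       # [4] (the membership test consumed 1, 2 and 3)
print(3 in numbers)   # True (the list itself is unaffected)
```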

Ref: https://docs.python.org/dev/library/stdtypes.html#iterator-types

Bruce Wang