Formally, a for statement in Python always operates on an iterable: an object which can provide an iterator over its items. The for statement successively fetches the next element from the iterator, assigns it to the target name(s) and runs the suite ("body") with that binding.
#   |name|   |iterable|
for party in feed.entry:
    # body...
    print(party.location.address.text)
In the example, feed.entry is the iterable, party is the target name and print ... is the suite. The iterator is automatically requested by the for statement, and holds the iteration state, e.g. the index of the next element if the iterable is a list.
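To see that the iteration state lives in the iterator rather than in the iterable, one can request two iterators over the same list by hand. This is only a minimal sketch; the list parties and its contents are made up for illustration and stand in for feed.entry.

```python
# Hypothetical stand-in for feed.entry: any list will do.
parties = ["alpha", "beta", "gamma"]

first = iter(parties)   # what a for statement requests behind the scenes
second = iter(parties)  # a second, independent iterator over the same list

a = next(first)   # "alpha"
b = next(first)   # "beta": `first` remembers its own position
c = next(second)  # "alpha": `second` starts from the beginning
print(a, b, c)
```

The list itself is unchanged by either iterator; only the iterators advance.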
If you are coming from C++, a classical for (int i = 0; i < 10; ++i) loop represents external iteration: the iteration state i is kept outside of the iterable. This corresponds to Python's while loop:
# for (int i = 0; i < 10; ++i)
i = 0
while i < 10:
    # access the state of an iterable here
    i += 1  # like ++i, the increment runs after the body
The newer for (auto party : entry) range loop represents internal iteration: the iteration state is kept by a separate iterator. This corresponds to Python's for loop. However, the iterable/iterator protocol differs notably: Python's for uses iter(iterable) to get an iterator, which should support next(iterator), either returning an element or raising StopIteration.
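The protocol can be exercised by hand. The following sketch uses an illustrative two-element list (the name numbers is an assumption, not part of the original example) and shows both outcomes of next: returning an element and raising StopIteration.

```python
numbers = [1, 2]          # illustrative iterable
iterator = iter(numbers)  # step 1: obtain an iterator

# step 2: fetch elements one by one
results = [next(iterator), next(iterator)]

try:
    next(iterator)        # nothing is left ...
except StopIteration:
    exhausted = True      # ... so the iterator signals the end
else:
    exhausted = False
print(results, exhausted)
```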
Written in Python, the definition of the for statement corresponds to this:
# for party in feed.entry:
__iterator = iter(feed.entry)        # iterator -- not visible in containing scope
__iterating = True                   # graceful exit to enter `else` clause
while __iterating:
    try:                             # attempt to...
        item = next(__iterator)      # ... get the next item
    except StopIteration:            # ... or stop
        __iterating = False          # with a graceful exit
    else:
        party = item
        <suite>                      # run the body with names bound
else:                                # entered in a graceful exit only
    <else suite>
(Note that the entire block from __iterating = True to __iterating = False is not "visible" to the containing scope. Implementations use various optimisations, such as CPython allowing builtin iterators to return a C NULL instead of raising a Python StopIteration.)
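Since the expansion above contains placeholders, it cannot be run directly. As a rough, runnable approximation, the same steps can be wrapped in a function that takes the suite as a callable. The helper run_for and its parameters are hypothetical, and break, continue and return inside the body are deliberately not modelled.

```python
def run_for(iterable, body, else_body=None):
    """Hypothetical helper mimicking ``for item in iterable: body(item)``.

    Because ``break`` is not modelled, the else suite always runs.
    """
    iterator = iter(iterable)      # request the iterator once, up front
    while True:
        try:
            item = next(iterator)  # attempt to get the next item
        except StopIteration:
            break                  # graceful exit: the iterator is exhausted
        body(item)                 # run the "suite" with the item bound
    if else_body is not None:
        else_body()                # the "else suite" after a graceful exit


seen = []
run_for("abc", seen.append, else_body=lambda: seen.append("done"))
print(seen)  # ['a', 'b', 'c', 'done']
```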
The for statement just defines how iterable and iterator are used. If you are mostly familiar with external iteration, it helps to look at the iterable and the iterator as well.
The iter(iterable) call has multiple ways to derive an iterator; this is as if iter were overloaded for various structural types.
If type(iterable).__iter__ is defined, it is called as a method and the result is used as the iterator.
Otherwise, if type(iterable).__getitem__ is defined, it is wrapped by a generic iterator type that returns iterable[0], iterable[1], ... and raises StopIteration if IndexError is raised when indexing.
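The less familiar __getitem__ route can be illustrated with a small class that defines no __iter__ at all. The class Squares is a made-up example; the last lines show what happens when a type offers neither method.

```python
class Squares:
    """Hypothetical type with __getitem__ but no __iter__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)  # tells the generic iterator to stop
        return index * index


# iter() falls back to indexing from 0 upwards until IndexError is raised.
squares = list(Squares(4))

try:
    iter(42)                 # int defines neither __iter__ nor __getitem__
except TypeError:
    int_is_iterable = False
else:
    int_is_iterable = True
print(squares, int_is_iterable)
```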
Either way, iter returns an iterator or raises TypeError. An iterator is any type that defines __iter__ (for reusability) and __next__ (for the actual iteration). In general, iterators are objects that may hold state to compute the __next__ item. For example, a list iterator corresponds to this object:
class ListIterator:
    """Python equivalent of ``iter(:list)``"""

    # iterator holds iteration state - e.g. iterable and current index
    def __init__(self, iterable: list):
        self.iterable = iterable
        self.index = 0

    # __next__ fetches item and advances iteration state - e.g. index += 1
    def __next__(self):
        # attempt to produce an item
        try:
            item = self.iterable[self.index]
        except IndexError:  # translate indexing protocol to iteration protocol
            raise StopIteration
        # update iteration state
        self.index += 1
        return item

    # iterators can be iterated on in ``for`` statements etc.
    def __iter__(self):
        return self
(Note that one would idiomatically write such an object as a generator function.)
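As a sketch of that generator form, the function below mirrors the index-based logic of the class. The name list_iterator is an assumption chosen for illustration. Inside a generator, the end of iteration is signalled with a plain return; raising StopIteration there would surface as a RuntimeError.

```python
def list_iterator(iterable: list):
    """Generator sketch that yields the items of a list by index."""
    index = 0                      # iteration state lives in a local variable
    while True:
        try:
            item = iterable[index]
        except IndexError:
            return                 # finishing the generator ends the iteration
        yield item                 # hand out the item, pause until next() is called
        index += 1


items = ["a", "b", "c"]
print(list(list_iterator(items)))
```

Calling the function returns a generator object, which already provides both __iter__ and __next__.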
Indexing a list or incrementing some pointer is only a very basic example of the iterable/iterator protocol. For example, an iterator could be stateless and use random.random() in __next__ to produce an infinite stream of random numbers. Iterators can also hold state about external resources and, say, iteratively traverse a file system.