
I wanted to understand a bit more about iterators, so please correct me if I'm wrong.

An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.

However, I still don't understand why the following behavior happens:

In [1]: iter = (i for i in range(5))

In [2]: for _ in iter:
   ....:     print _
   ....:     
0
1
2
3
4

In [3]: for _ in iter:
   ....:     print _
   ....:     

In [4]: 

After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.

However I never assigned a new value to the iter variable.

What is really happening under the hood of the for loop?

Rick
Matteo

6 Answers


Your suspicion is correct: the iterator has been consumed.

In fact, your iterator is a generator, which is an object that can be iterated through only once.

type((i for i in range(5))) # says it's type generator 

def another_generator():
    yield 1 # the yield expression makes this a generator function, not a regular function

type(another_generator()) # also a generator

The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:

def my_gen():
    while True:
        yield 1 # again: yield makes this a generator function, not a regular function

for _ in my_gen(): print(_) # hit Ctrl+C to stop this infinite loop!
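Because nothing is computed until it is requested, you can safely take just a few values from an infinite generator. A minimal sketch using the standard library's itertools.islice (my_gen is redefined here so the snippet runs on its own):

```python
from itertools import islice


def my_gen():
    while True:
        yield 1  # each value is produced only when the consumer asks for it


# islice stops asking for values after 3 items, so this finishes
first_three = list(islice(my_gen(), 3))
print(first_three)  # [1, 1, 1]
```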

Some other corrections to help improve your understanding:

  • The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
  • One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
  • The keyword combination for in accepts an iterable object as its second argument.
  • The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
  • The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is not a keyword, but doing so shadows the built-in iter function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
  • If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the loop variable supplied to for in is assigned the result of each next() call. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
  • The for loop ends when next() raises the StopIteration exception (which usually happens when the iterable does not have another object to yield when next() is called).
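The points above explain the behaviour in the question: iter() on a generator returns the generator itself, while iter() on a list returns a new iterator each time. A quick check (the variable names are just for illustration):

```python
gen = (i for i in range(5))
lst = [0, 1, 2, 3, 4]

# A generator is its own iterator: every for loop gets the same, shared object
print(iter(gen) is gen)  # True
# A list hands out a fresh iterator on each call, starting from the beginning
print(iter(lst) is lst)  # False

first = list(gen)   # consumes the generator
second = list(gen)  # nothing is left
print(first, second)          # [0, 1, 2, 3, 4] []
print(list(lst), list(lst))   # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```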

You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):

# `iterable` stands for the object after `in`; the __getitem__ fallback is ignored here
try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue

There is pretty much no difference between the above and your example code.
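Following wim's suggestion in the comments, the same idea can be written with the built-ins iter() and next(), which also handle objects that only implement __getitem__. Here it is wrapped in a hypothetical helper, manual_for, that takes the loop body as a function. This is a sketch, not how CPython actually implements for:

```python
def manual_for(iterable, body):
    """Roughly emulate `for item in iterable: body(item)`."""
    iterator = iter(iterable)  # raises TypeError if the object is not iterable
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break  # the iterator is exhausted: leave the loop
        body(item)  # the "body" of the loop, outside the try block


seen = []
gen = (i for i in range(5))
manual_for(gen, seen.append)
manual_for(gen, seen.append)  # the generator is already exhausted, so nothing happens
print(seen)  # [0, 1, 2, 3, 4]
```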

Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.

  • When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:

    1 in [1, 2, 3] # True
    'He' in 'Hello' # True
    3 in range(10) # True
    'eH' in 'Hello'[::-1] # True
    
  • If the object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on[1]. A generator is just one type of iterator.

  • If the call to __iter__ is successful, the in keyword applies the function next() to the resulting iterator over and over again, comparing each item with the value on the left, until it finds a match (True) or StopIteration is raised (False). (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method.
  • If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method[2].
  • If all of the above attempts fail, you'll get a TypeError exception.
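Because in falls back to iteration, using it on a generator consumes values, just like a for loop does. A small check:

```python
gen = (i for i in range(5))

found = 2 in gen         # next() is called until 2 turns up, so 0, 1 and 2 are consumed
rest = list(gen)         # only the values after 2 remain
found_again = 2 in gen   # the generator is now exhausted
print(found, rest, found_again)  # True [3, 4] False
```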

If you wish to create your own object type to iterate over (i.e., you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).

class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m: print(_) # 1
1 in m # True    

The presence of yield turns a function or method into a generator function (or method) instead of a regular one; calling it returns a generator. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
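Note the contrast with the question: because each for loop calls __iter__ afresh, a class like MyIterable hands out a new generator per loop and can be iterated repeatedly, while a single generator object cannot. A sketch with a hypothetical Countdown class:

```python
class Countdown:
    """Hypothetical iterable: each __iter__ call returns a new generator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [3, 2, 1] again: a fresh generator each time

g = iter(c)     # one single generator object
print(list(g))  # [3, 2, 1]
print(list(g))  # []: that generator is spent
```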

If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for in), you just need the __contains__ method.

class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m # True
'Foo' in m # True
TypeError in m # True
None in m # True

[1] Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually called next (no underscores) in Python 2.

[2] See this answer for the different ways to create iterable classes.

Rick
  • More info on that [here](https://wiki.python.org/moin/Generators) and [here](https://docs.python.org/3.4/reference/expressions.html). – TigerhawkT3 Apr 02 '15 at 01:11
  • But aren't all the elements that the `iterator` goes through generated at once during the assignment (`In [1]`)? And why does the `for` consume the `iterator`; shouldn't it just iterate on a copy? I guess I am thinking of it as a `pointer` and that causes my confusion. What are the main differences between the two? Thanks a lot! – Matteo Apr 02 '15 at 01:15
  • @Matteo In your example, you create a generator expression, not a list. In a generator elements are created one at a time. – Marcin Apr 02 '15 at 01:16
  • @Marcin - But so you mean that after (`In [1]`) the values `1,2,3,4` are not yet stored anywhere? – Matteo Apr 02 '15 at 01:22
  • @Matteo Yes. Only generator object exist. – Marcin Apr 02 '15 at 01:24
  • @Matteo: Marcin is correct. And yes, you're thinking of it as a pointer, but it isn't a pointer. Your code (the stuff in the parentheses) is a generator expression. Once the generator raises `StopIteration`, it's done. No, `0,1,2,3,4` is not stored anywhere. `range(5)` produces the values one at a time. It doesn't produce them all at once. Each time `next()` is called, the generator generates the next value. Look up some information about functional programming, such as in Haskell, where this idea is normal, vs languages like Java and C++. – Rick Apr 02 '15 at 01:26
  • Actually, to be accurate, `range()` produces the values one at a time in Python 3. It was not always this way. – Rick Apr 02 '15 at 01:33
  • Wow! Thanks for all your detailed explanations, that was great... I guess I just need to understand fully the difference between `generator` and `iterator`. – Matteo Apr 02 '15 at 01:34
  • Yes, they are both iterable, but they are two different objects. In your research, make sure you also read about the keyword `yield`. – Rick Apr 02 '15 at 01:51
  • your "manual" loop is sus. you simply assign to `iterable.__next__` (which may or may not exist for an iterable!) and never call it – wim Apr 02 '15 at 05:32
  • it would be more correct to create a `temp = iter(iterable)` and then call `next(temp)` in the try block. a `listiterator`, for example, has no `__next__` method – wim Apr 02 '15 at 05:38
  • I think you're confused because you've seen so-called `comprehensions`. So for your question `aren't all the elements that the iterator goes through generated at once during the assignment`: nope, that's because of the round parentheses. If you wrap your first line with square `[]` (or curly `{}`) brackets, `[i for i in range(5)]`, it will be a brand-new `list` (or `set`), filled with generated values. Round parentheses `( )` in that case work as expression wrapping, not as `tuple` creation, as you might expect (see the difference between `a = (1)` and `a = [1]`). Check `tuple(i for i in range(5))`. – Ivan Klass Apr 02 '15 at 06:44
  • This answer conflates the `in` operator as used in code like `1 in [1, 2, 3]` with the keyword's usage in `for` loops. The `in` operator simply calls the [`__contains__` method](https://docs.python.org/2.7/reference/datamodel.html#object.__contains__), falling back to iterating over the object if the method does not exist. – Matt Nordhoff Apr 02 '15 at 08:33
  • @Matt Nordhoff would you consider helping edit the answer to improve the deficiencies? – Rick Apr 02 '15 at 11:13
  • @wim You're right. Made a change per your suggestion. Anyone else: feel free to edit my answer to make it better since it was selected as the accepted answer. – Rick Apr 02 '15 at 13:17
  • @wim a list iterator seems to have a `__next__` method in Python 3, at least. This works: `i = iter([1,2,3])`, `i.__next__()`. – Rick Apr 02 '15 at 14:11
  • That's true, in python2 it's just `next`. Anyway this answer is a bit of a mess in my opinion, the stuff about `__contains__` is completely irrelevant here. Better to get to the point and explain all that is needed: the [iterator protocol](https://docs.python.org/2/library/stdtypes.html#iterator-types) – wim Apr 03 '15 at 02:38
  • @wim Yup, you're right - should probably include information about the iterator protocol. I'll make some more edits. However, I think `__contains__` is indirectly relevant - speaking as a recent Python newbie (just started messing with it about a year ago), I can tell you that understanding what the difference is when I see `for` `in` vs when I see just `in` is extremely relevant to understanding what is going on and very helpful when learning about this topic (`for` and iterators). – Rick Apr 03 '15 at 14:44

A for loop basically calls the next method (__next__ in Python 3) of the iterator it gets from the object it is applied to.

You can simulate this simply by doing:

it = (i for i in range(5))

print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))

# this prints 0 1 2 3 4 (one value per line)

At this point there is no next element in the input object. So doing this:

print(next(it))

will raise a StopIteration exception. At this point a for loop would stop. An iterator can be any object that responds to the next() function and raises StopIteration when there are no more elements. It does not have to be a pointer or reference (Python has no such things in the C/C++ sense anyway), a linked list, etc.
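next() also accepts a default value, which it returns instead of raising StopIteration once the iterator is exhausted. A short sketch (it is used as the name to avoid shadowing the built-in iter):

```python
it = (i for i in range(5))

values = [next(it) for _ in range(5)]  # exactly five calls: 0 through 4
print(values)  # [0, 1, 2, 3, 4]

# With a default, next() returns it instead of raising StopIteration
after = next(it, "done")
print(after)  # done

try:
    next(it)  # without a default, an exhausted iterator raises
except StopIteration:
    print("StopIteration raised")
```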

Marcin

There is an iterator protocol in Python that defines how the for statement will behave with lists and dicts, and other things that can be looped over.

It's in the Python docs here and here.

The iterator protocol is typically used in the form of a Python generator. We yield a value as long as we have one, and when we reach the end, StopIteration is raised.

So let's write our own iterator:

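def my_iter():
    yield 1
    yield 2
    yield 3
    # returning (falling off the end) raises StopIteration for the caller;
    # an explicit `raise StopIteration()` here is a RuntimeError in Python 3.7+

for i in my_iter():
    print(i)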

The result is:

1
2
3

A couple of things to note about that: my_iter is a (generator) function, and my_iter() returns an iterator.

If I had used the iterator like this instead:

j = my_iter()    # j is the iterator that my_iter() returns
for i in j:
    print(i)  # this loop runs until the iterator is exhausted

for i in j:
    print(i)  # the iterator is exhausted so we never reach this line

the result would be the same as above. The iterator is exhausted by the time we enter the second for loop.
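If you do need to loop twice, either call the generator function again to get a fresh generator, or store the values in a list first. A sketch reusing my_iter (redefined here without the explicit raise):

```python
def my_iter():
    yield 1
    yield 2
    yield 3


# Option 1: call the generator function again for each loop
first = list(my_iter())
second = list(my_iter())

# Option 2: materialise the values once, then iterate the list as often as needed
values = list(my_iter())
twice = [v for v in values] + [v for v in values]

print(first, second)  # [1, 2, 3] [1, 2, 3]
print(twice)          # [1, 2, 3, 1, 2, 3]
```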

But that's rather simplistic. What about something more complicated, like yielding from inside a loop?

def capital_iter(name):
    for x in name:
        yield x.upper()
    # no explicit raise needed: the generator stops when the loop ends

for y in capital_iter('bobert'):
    print(y)

When it runs, the inner for loop uses the string's own iterator (strings are iterable, so iter() works on them). This in turn allows us to run a for loop over it and yield the results until we are done.

B
O
B
E
R
T

This raises the question: what happens between yields in the iterator?

j = capital_iter("bobert")
print(next(j))
print(next(j))
print(next(j))

print("Hey there!")

print(next(j))
print(next(j))
print(next(j))

print(next(j))  # Raises StopIteration

The answer is that the function is paused at the yield, waiting for the next call to next().

B
O
B
Hey there!
E
R
T
Traceback (most recent call last):
  File "", line 13, in 
    StopIteration
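In Python 3 you can observe this paused state directly with inspect.getgeneratorstate() (available since Python 3.2). A minimal sketch using the same capital_iter on a shorter string:

```python
import inspect


def capital_iter(name):
    for x in name:
        yield x.upper()


j = capital_iter("bob")
states = [inspect.getgeneratorstate(j)]      # GEN_CREATED: body not started yet
first = next(j)                              # runs up to the first yield
states.append(inspect.getgeneratorstate(j))  # GEN_SUSPENDED: paused at the yield
rest = list(j)                               # resumes where it left off
states.append(inspect.getgeneratorstate(j))  # GEN_CLOSED: the function has returned

print(first, rest)  # B ['O', 'B']
print(states)       # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```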
itzmebibin
MadMan2064
  • It is not necessary to explicitly raise a `StopIteration`. Generator functions will do this anyway with that line omitted – wim Apr 03 '15 at 04:41

Some additional details about the behaviour of iter() with __getitem__ classes that lack their own __iter__ method.


Before __iter__ there was __getitem__. If __getitem__ works with integers from 0 to len(obj)-1, then iter() supports these objects. It will construct a new iterator that repeatedly calls __getitem__ with 0, 1, 2, ... until it gets an IndexError, which it converts to a StopIteration.
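A minimal sketch of such a class (the Squares name is hypothetical): it defines only __getitem__, yet for loops, list() and in all work, because iter() wraps it in an iterator that stops at the first IndexError.

```python
class Squares:
    """Hypothetical sequence-like class with no __iter__ method."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # iter()'s wrapper turns this into StopIteration
        return index * index


sq = Squares(4)
print(list(sq))  # [0, 1, 4, 9]
print(9 in sq)   # True: `in` also falls back to __getitem__
print(list(sq))  # [0, 1, 4, 9]: each iter() call starts again at index 0
it = iter(sq)
print(next(it), next(it))  # 0 1
```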

See this answer for more details of the different ways to create an iterator.

Ethan Furman

Excerpt from the Python Practice book:


5. Iterators & Generators

5.1. Iterators

We use the for statement to loop over a list.

>>> for i in [1, 2, 3, 4]:
...     print i
...
1
2
3
4

If we use it with a string, it loops over its characters.

>>> for c in "python":
...     print c
...
p
y
t
h
o
n

If we use it with a dictionary, it loops over its keys.

>>> for k in {"x": 1, "y": 2}:
...     print k
...
y
x

If we use it with a file, it loops over lines of the file.

>>> for line in open("a.txt"):
...     print line,
...
first line
second line

So there are many types of objects which can be used with a for loop. These are called iterable objects.

There are many functions which consume these iterables.

>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> ",".join({"x": 1, "y": 2})
'y,x'
>>> list("python")
['p', 'y', 't', 'h', 'o', 'n']
>>> list({"x": 1, "y": 2})
['y', 'x']

5.1.1. The Iteration Protocol

The built-in function iter takes an iterable object and returns an iterator.

>>> x = iter([1, 2, 3])
>>> x
<listiterator object at 0x1004ca850>
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Each call to the next method on the iterator gives us the next element. If there are no more elements, it raises a StopIteration.

Iterators are implemented as classes. Here is an iterator that works like the built-in xrange function.

class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls the __iter__ method on the given object.

The return value of __iter__ is an iterator. It should have a next method and raise StopIteration when there are no more elements.

Let's try it out:

>>> y = yrange(3)
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 14, in next
StopIteration

Many built-in functions accept iterators as arguments.

>>> list(yrange(5))
[0, 1, 2, 3, 4]
>>> sum(yrange(5))
10

In the above case, both the iterable and iterator are the same object. Notice that the __iter__ method returned self. This need not always be the case.

class zrange:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)

class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        # Iterators are iterables too.
        # Adding this function to make them so.
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

If both iterable and iterator are the same object, it is consumed in a single iteration.

>>> y = yrange(5)
>>> list(y)
[0, 1, 2, 3, 4]
>>> list(y)
[]
>>> z = zrange(5)
>>> list(z)
[0, 1, 2, 3, 4]
>>> list(z)
[0, 1, 2, 3, 4]
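The book's examples are written for Python 2, where the iterator method is called next. A sketch of the same zrange/zrange_iter pair for Python 3, where the method must be named __next__ (the built-in next() calls it):

```python
class zrange:
    """Reusable iterable: each __iter__ call returns a new iterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)


class zrange_iter:
    """One-shot iterator over 0 .. n-1."""

    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self  # iterators are iterables too

    def __next__(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        raise StopIteration  # signals exhaustion to for, list(), next(), ...


z = zrange(5)
print(list(z), list(z))    # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
it = iter(z)
print(list(it), list(it))  # [0, 1, 2, 3, 4] []
```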

5.2. Generators

Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value.

def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1

Each time the yield statement is executed the function generates a new value.

>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So a generator is also an iterator. You don’t have to worry about the iterator protocol.

The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I’ll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.

Can you think about how it is working internally?

When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing until it reaches the yield statement. The yielded value is returned by the next call.

The following example demonstrates the interplay between yield and call to next method on generator object.

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0
0
>>> f.next()
after yield 0
before yield 1
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Let's see an example:

def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1

def squares():
    for i in integers():
        yield i * i

def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(seq.next())
    except StopIteration:
        pass
    return result

print take(5, squares()) # prints [1, 4, 9, 16, 25]
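In Python 3, take() can be written with the standard library's itertools.islice, which stops pulling values after n items and so also works on the infinite squares() generator. A sketch, with integers() and squares() repeated unchanged so it runs on its own:

```python
from itertools import islice


def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1


def squares():
    for i in integers():
        yield i * i


def take(n, seq):
    """Returns first n values from the given iterable."""
    return list(islice(seq, n))  # stops after n items, or earlier if seq runs out


print(take(5, squares()))  # [1, 4, 9, 16, 25]
```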
Ethan Furman
drewteriyaki

Concept 1

All generators are iterators, but not all iterators are generators.

Concept 2

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Concept 3

Quoting from the Python wiki page on Generators: "Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop."

In your case

>>> it = (i for i in range(5))
>>> type(it)
<type 'generator'>
>>> callable(getattr(it, 'iter', None))
False
>>> callable(getattr(it, 'next', None))
True
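In Python 3 the same relationships can be checked with the abstract base classes in collections.abc (Generator was added in Python 3.5). A quick sketch:

```python
from collections.abc import Generator, Iterable, Iterator

it = (i for i in range(5))
lst = [0, 1, 2, 3, 4]

print(isinstance(it, Generator))         # True
print(isinstance(it, Iterator))          # True: every generator is an iterator
print(isinstance(iter(lst), Iterator))   # True: a list iterator is an iterator...
print(isinstance(iter(lst), Generator))  # False: ...but not a generator
print(isinstance(lst, Iterable), isinstance(lst, Iterator))  # True False
```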
Abhijit