
I read some questions and answers about the differences between iterators and generators, but I don't understand when you should choose one over the other. Do you know any examples (simple, real-life ones) where one is better than the other? Thank you.

  • What specifically do you not understand after reading the post? – jkd Mar 29 '15 at 01:25
  • I am new to Python. Most things are new to me. Simple examples for a noob are better than writing that 'Python’s generators provide a convenient way to implement the iterator protocol' or other technical jargon. – clappersturdy Mar 29 '15 at 01:27
  • Some knowledge of technical jargon is required; the more you know, the better you'll be able to find answers to questions and present questions to get more and better answers. To answer your question, check out [Vincent Driessen's](http://nvie.com/posts/iterators-vs-generators/) post on the iterator vs generator question. – LinkBerest Mar 29 '15 at 01:39
  • As I explained in that (top-voted but still unaccepted:-) answer nearly five years ago, all generators are iterators (though not vice versa) so your question is absolutely bereft of any sense -- like asking "when should I eat spaghetti rather than pasta" when spaghetti **are** an example (instance, special case of) pasta, or asking "should I buy a Labrador or a dog" (since Labradors **are** a breed of dogs), and so forth. – Alex Martelli Mar 29 '15 at 01:48
  • Brevity, genexp -> generator. Extensibility, flexibility -> iterator. – Shashank Mar 29 '15 at 01:48
  • @AlexMartelli I hope mine isn't too far off point; I didn't read the linked question/answer before starting my own here -- and should have. (Also, if I may, [this](http://stackoverflow.com/questions/29322237/are-unbound-descriptors-possible) seems like something you may know something about, if you're bored :) – jedwards Mar 29 '15 at 02:25

1 Answer


Iterators provide efficient ways of iterating over an existing data structure.

Generators provide efficient ways of generating elements of a sequence on the fly.

Iterator Example

Python's file objects can be used as iterators. So where you might process a file line by line like this:

with open('file.txt', 'rb') as fh:
    lines = fh.readlines()  # this reads the entire file into memory at once
    for line in lines:
        process(line)       # something to be done on each line

You can do this more efficiently by iterating over the file object directly:

with open('file.txt', 'rb') as fh:
    for line in fh:         # this will only read as much as needed, each time
        process(line)

The advantage of the second example is that you never read the entire file into memory and then iterate over a list of lines. Instead, the reader (a BufferedReader in Python 3) reads a line at a time, each time you ask for one.
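Under the hood, this works because file objects implement the iterator protocol, the same protocol every Python iterable exposes: iter() asks an object for an iterator, and next() pulls one element at a time. A minimal sketch with an ordinary list (the names here are illustrative, not from the original answer):

```python
# Any iterable can hand out an iterator via iter()
colors = ['red', 'green', 'blue']
it = iter(colors)       # ask the list for an iterator over its elements

print(next(it))         # 'red'   -- each next() call advances the iterator
print(next(it))         # 'green'

# A for loop does exactly this under the hood, stopping on StopIteration
for remaining in it:
    print(remaining)    # prints 'blue'
```

A for loop over a file object makes the same iter()/next() calls, which is why it reads only one line at a time.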

Generator Example

Generators generate elements of a sequence on the fly. Consider the following:

def fib():
    idx  = 0
    vals = [0,1]
    while True:
        # If we need to compute a new value, do so on the fly
        if len(vals) <= idx:
            vals.append(vals[-1] + vals[-2])
        yield vals[idx]
        idx += 1

This is an example of a generator. In this case, every time it's "called" it produces the next number in the Fibonacci sequence.

I put "called" in scare quotes because the way you get successive values from a generator is different from calling a traditional function.

We have two main ways to get values from generators:

Iterating over it

# Print the fibonacci sequence until some event occurs
for f in fib():
    print(f)
    if f > 100: break

Here we use the for ... in syntax to iterate over the generator, printing the values it yields until we get one greater than 100.

Output:

0
1
1
2
3
5
8
13
21
34
55
89
144

Calling next()

We could also call next() on the generator (since generators are iterators) to generate and retrieve values one at a time:

f = fib()

print(next(f))  # 0
print(next(f))  # 1
print(next(f))  # 1
print(next(f))  # 2
print(next(f))  # 3
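One consequence of generators being iterators is that they can be exhausted: a finite generator raises StopIteration once it runs out of values, and a for loop or list() consumes it to the end. A small sketch with a hypothetical countdown generator (not part of the original answer):

```python
def countdown(n):
    # A finite generator: yields n, n-1, ..., 1, then stops
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
print(next(c))              # 3
print(next(c))              # 2
print(next(c))              # 1
# next(c) would now raise StopIteration -- the generator is exhausted

print(list(countdown(3)))   # [3, 2, 1] -- or consume a fresh one all at once
```

The fib generator above never exhausts because its while True loop always has another value to yield.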

There are more persuasive examples of generators, however, and these often come in the form of "generator expressions", a related concept (PEP 289).

Consider something like the following:

first = any((expensive_thing(i) for i in range(100)))

Here, we're creating a generator expression:

(expensive_thing(i) for i in range(100))

And passing it to the any built-in function. any returns True as soon as an element of the iterable is determined to be truthy. So when you pass a generator expression to any, expensive_thing(i) is called only as many times as necessary to find a truthy value.

Compare this with using a list comprehension passed to any:

first = any([expensive_thing(i) for i in range(100)])

In this case, expensive_thing(i) is called for all 100 values of i first; only then is the 100-element list of True/False values handed to any, which returns True if it finds a truthy value.

But if expensive_thing(0) returned True, clearly the better approach would be to evaluate only that, test it, and stop there. Generators allow you to do this, whereas a list comprehension does not.


Consider the following example, which illustrates the advantage of a generator expression over a list comprehension:

import time

def expensive_thing(n):
    time.sleep(0.1)
    return 10 < n < 20

# Find first True value, by using a generator expression
t0 = time.time()
print( any((expensive_thing(i) for i in range(100))) )
t1 = time.time()
td1 = t1-t0

# Find first True value, by using a list comprehension
t0 = time.time()
print( any([expensive_thing(i) for i in range(100)]) )
t1 = time.time()
td2 = t1-t0

print("TD 1:", td1)  # TD 1:  1.213068962097168
print("TD 2:", td2)  # TD 2: 10.000572204589844

The function expensive_thing introduces an artificial delay to illustrate the difference between the two approaches. The second (list comprehension) approach takes significantly longer, because expensive_thing is evaluated at all 100 indices, whereas the first calls expensive_thing only until it finds a True value (at i=11).
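Because the fib generator above produces an infinite stream, it also composes well with the itertools module: for instance, itertools.islice takes a bounded slice without ever materializing the whole sequence. A sketch (fib is restated here in a compact two-variable form so the block is self-contained):

```python
from itertools import islice

def fib():
    # Infinite Fibonacci generator, kept to two variables
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take just the first 10 values from the infinite stream
print(list(islice(fib(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

islice never asks the generator for more than the 10 values it needs, so working with an infinite sequence stays cheap.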

jedwards
  • The Python 3.x `range` and the Python 2.x `xrange` are not generators. They are lazy sequences. The definition of a generator is a function that produces an iterator. See https://docs.python.org/3/glossary.html#term-generator –  Mar 29 '15 at 01:43
  • For the purposes of understanding the concept, they can 100% be thought of as generators, and since these built-ins are likely ones the OP is familiar with, I used them -- but I'll update my post with a proper generator. – jedwards Mar 29 '15 at 01:45