49

I know how yield works. I know about permutations; I think of them as just a simple piece of math.

But what is yield's real power? When should I use it? A simple, good example would be best.

johnsyweb
whi
  • possible duplicate of [The Python yield keyword explained](http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained) – JBernardo Oct 25 '11 at 02:15

4 Answers

91

yield is best used when you have a function that returns a sequence and you want to iterate over that sequence, but you do not need to have every value in memory at once.

For example, I have a Python script that parses a large list of CSV files, and I want to return each line to be processed in another function. I don't want to store megabytes of data in memory all at once, so I yield each line as a Python data structure. The function to get lines from the files might look something like:

def get_lines(files):
    # 'files' is an iterable of open file objects
    for f in files:
        for line in f:
            # preprocess the line here, then hand it back one at a time
            yield line

I can then use the same syntax as with lists to access the output of this function:

for line in get_lines(files):
    pass  # process each line here

but I save a lot of memory.

murgatroid99
  • So get_lines() encapsulates the work of reading the files, but a user could also just iterate over readlines() for the same effect; no need for yield, I think. – whi Oct 25 '11 at 02:42
  • The idea is that `get_lines()` could be some arbitrary function that returns a sequence of objects with known structure, and `yield` allows it to return a very large number of such objects without using up too much memory. – murgatroid99 Oct 25 '11 at 03:02
  • Note that `f.readlines()` isn't lazy -- it reads the entire file into memory and materializes the list. `for line in f:` would be more memory-friendly. – DSM Dec 17 '13 at 04:07
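To make that last point concrete, here is a rough sketch of both approaches (the filename is just a placeholder):

# Eager: readlines() materializes every line of the file as a list in memory.
with open("big.csv") as f:
    for line in f.readlines():
        pass  # process line

# Lazy: iterating the file object itself reads one line at a time.
with open("big.csv") as f:
    for line in f:
        pass  # process line
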
22

Simply put, yield gives you a generator. You'd use it where you would normally use a return in a function. As a really contrived example cut and pasted from a prompt...

>>> def get_odd_numbers(i):
...     return range(1, i, 2)
... 
>>> def yield_odd_numbers(i):
...     for x in range(1, i, 2):
...             yield x
... 
>>> foo = get_odd_numbers(10)
>>> bar = yield_odd_numbers(10)
>>> foo
[1, 3, 5, 7, 9]
>>> bar
<generator object yield_odd_numbers at 0x1029c6f50>
>>> next(bar)
1
>>> next(bar)
3
>>> next(bar)
5

As you can see, in the first case foo holds the entire list in memory at once (this transcript is from Python 2, where range() builds a real list). That's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build when the function is called.

In the second case, bar just gives you a generator. A generator is an iterable--which means you can use it in a for loop and so on--but each value can only be accessed once. The values are also not all stored in memory at the same time; the generator object "remembers" where it was in the loop the last time you called it. This way, if you're using an iterable to (say) count to 50 billion, you don't have to count all the way up front and store the 50 billion numbers somewhere. Again, this is a pretty contrived example; you probably would use itertools if you really wanted to count to 50 billion. :)
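For what it's worth, here is a minimal sketch of the itertools approach hinted at above (the numbers are just illustrative):

import itertools

# itertools.count() is an endless lazy counter: it never materializes the numbers,
# and islice() pulls out only as many of them as you ask for.
counter = itertools.count(start=1)
first_five = list(itertools.islice(counter, 5))  # [1, 2, 3, 4, 5]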

This is the simplest use case of generators. As you said, they can be used to write efficient permutations, using yield to push results up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal and all manner of other things.
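To illustrate the tree-traversal point, here is a small sketch (the Node class is made up for the example, not something from the answer):

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def walk(node):
    # Yield every value in the tree, depth-first, one value at a time.
    yield node.value
    for child in node.children:
        for value in walk(child):  # (yield from walk(child) in Python 3.3+)
            yield value

tree = Node(1, [Node(2, [Node(4)]), Node(3)])
print(list(walk(tree)))  # [1, 2, 4, 3]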

waffle paradox
  • The second example also holds the entire list in memory at once, because it needs to keep the entire list to back the generator. – user2357112 Dec 17 '13 at 04:11
4

I'm reading Data Structures and Algorithms in Python.

There is a Fibonacci function that uses yield, and I think it is a great illustration of when to use yield.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a          # hand back the current Fibonacci number
        a, b = b, a + b  # advance to the next pair

You can use it like this:

gen = fibonacci()
for i, f in enumerate(gen):
    print(i, f)
    if i >= 100: break

So I think that when the next element depends on previous elements, e.g. in digital filters, it is time to use yield.
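For instance, a minimal sketch of that idea, using a simple exponential moving average in which each output depends on the previously yielded one (the function name and smoothing factor are just illustrative):

def exp_moving_average(samples, alpha=0.5):
    # Yield a smoothed value for every input sample; each new output
    # is a blend of the current sample and the previous output.
    smoothed = None
    for x in samples:
        smoothed = x if smoothed is None else alpha * x + (1 - alpha) * smoothed
        yield smoothed

print(list(exp_moving_average([1, 2, 3, 4])))  # [1, 1.5, 2.25, 3.125]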

Rainald62
Xin Zhou
4

Another use is in a network client. Use 'yield' in a generator function to round-robin through multiple sockets without the complexity of threads.

For example, I had a hardware test client that needed to send the R, G, B planes of an image to firmware. The data needed to be sent in lockstep: red, green, blue, red, green, blue. Rather than spawn three threads, I had a generator that read from the file and encoded the buffer; each buffer was produced with a `yield buf`. At end of file, the function returned and I had end-of-iteration.

My client code looped through the three generator functions, getting buffers until end-of-iteration.
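Roughly, the shape looked like this (a hedged sketch, not the original code; the file names, chunk size, and send step are stand-ins):

def plane_reader(path):
    # Yield encoded buffers from one plane file until end of file.
    with open(path, "rb") as f:
        while True:
            buf = f.read(4096)
            if not buf:
                return  # end of file ends the iteration
            yield buf   # any encoding of the buffer would happen before this

readers = [plane_reader(p) for p in ("red.bin", "green.bin", "blue.bin")]

# zip() pulls one buffer from each generator in turn, giving the
# red, green, blue, red, green, blue lockstep order without threads.
for red_buf, green_buf, blue_buf in zip(*readers):
    for buf in (red_buf, green_buf, blue_buf):
        pass  # send buf over the socket here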

David Poole
  • Thanks. Yes, '3 threads + lock' is not good. But why keep everything in the same main thread? – whi Oct 25 '11 at 02:59
  • Simplicity. The script was a small command line app. No GUI. Also, everything in the same thread meant an error on one socket would shut down the entire client. Since I was talking to only one server, a death of one socket meant I could quickly stop all sockets. – David Poole Oct 25 '11 at 03:14