I'm new to programming and am having trouble understanding an example from my Python textbook ("Beginning Python" by Magnus Lie Hetland). The example is a recursive generator that flattens nested lists of arbitrary depth:

def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

You would then feed in a nested list as follows:

>>> list(flatten([[[1],2],3,4,[5,[6,7]],8]))
[1, 2, 3, 4, 5, 6, 7, 8]

I understand how the recursion within flatten() drills down to the innermost element of this list, 1, but I don't understand what happens when 1 is actually passed back into flatten() as 'nested'. I thought this would raise a TypeError (you can't iterate over a number), and that the exception handler would do the heavy lifting of generating output. But testing with modified versions of flatten() has convinced me that this isn't the case. Instead, the 'yield element' line seems to be responsible.

That said, my question is this... how can 'yield element' ever actually be executed? It seems like 'nested' will either be a list - in which case another layer of recursion is added - or it's a number and you get a TypeError.

Any help with this would be much appreciated... in particular, I'd love to be walked through the chain of events as flatten() handles a simple example like:

list(flatten([[1,2],3]))
WithoutATowel
  • I recommend reading the answers to [The Python yield keyword explained](http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained) for a good introduction to all relevant concepts. – Sven Marnach Jul 07 '12 at 17:46
  • @SvenMarnach: The `yield from` feature in 3.3 would make this a piece of cake :) – Joel Cornett Jul 07 '12 at 17:50
  • A side-comment: the code above is not generic enough to handle strings. E.g., list(flatten(['abc', 'def'])) breaks. – Roman Susi Jul 07 '12 at 18:52
  • Does this answer your question? [How do I make a flat list out of a list of lists?](https://stackoverflow.com/questions/952914/how-do-i-make-a-flat-list-out-of-a-list-of-lists) – questionto42 Jul 08 '22 at 18:40
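
On the side-comment about strings: a one-character string is itself iterable and yields that same string again, so the flatten() from the question never reaches a TypeError for string input and keeps recursing until the recursion limit is hit. Below is a minimal sketch of one possible guard. Treating str and bytes as single values, and doing so with an isinstance check, is an assumption made for illustration and is not part of the example in the question.

```python
def flatten(nested):
    """Flatten nested iterables, treating strings as atoms (an assumption)."""
    # Assumption: str and bytes are yielded whole rather than split into
    # characters. Without this guard, iterating 'a' yields 'a' again and
    # the recursion never bottoms out.
    if isinstance(nested, (str, bytes)):
        yield nested
        return
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        # Not iterable (e.g. a number): yield the value itself.
        yield nested


print(list(flatten(['abc', ['def', 1]])))  # ['abc', 'def', 1]
```

Whether strings should count as atoms depends on the use case; this is only one way to draw the line.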

5 Answers

I have added some instrumentation to the function:

def flatten(nested, depth=0):
    try:
        print("{}Iterate on {}".format('  '*depth, nested))
        for sublist in nested:
            for element in flatten(sublist, depth+1):
                print("{}got back {}".format('  '*depth, element))
                yield element
    except TypeError:
        print('{}not iterable - return {}'.format('  '*depth, nested))
        yield nested

Now calling

list(flatten([[1,2],3]))

displays

Iterate on [[1, 2], 3]
  Iterate on [1, 2]
    Iterate on 1
    not iterable - return 1
  got back 1
got back 1
    Iterate on 2
    not iterable - return 2
  got back 2
got back 2
  Iterate on 3
  not iterable - return 3
got back 3
Hugh Bothwell

Perhaps part of your confusion is that you're thinking of the final yield statement as though it were a return statement. Indeed, a couple of people have suggested that when a TypeError is raised in this code, the item passed is "returned". That's not the case!

Remember that any time yield appears in a function, calling that function produces not a single item but an iterable (a generator), even if only one item appears in the sequence. So when you pass 1 to flatten, the result is a one-item generator. To get the item out of it, you still need to iterate over it.
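
A small sketch of that point, using the flatten() from the question unchanged. The variable name gen is just for illustration.

```python
import types


def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested


gen = flatten(1)
# Calling flatten() runs none of its body yet; it only builds a generator.
print(isinstance(gen, types.GeneratorType))  # True
# The first next() runs the body up to the yield in the except branch.
print(next(gen))  # 1
# A second next() finds nothing left and signals exhaustion.
try:
    next(gen)
except StopIteration:
    print("exhausted")
```

A for loop does the same thing as these next() calls and treats StopIteration as the end of the loop.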

Since this one-item generator is iterable, the inner for loop doesn't raise a TypeError when it iterates over it; the loop body simply executes once. Then the outer for loop moves on to the next item in the nested list.

Another way to think about this would be to say that every time you pass a non-iterable value to flatten, it wraps the value in a one-item iterable and "returns" that.
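
To make the "wraps the value" picture concrete, here is a hedged sketch of an eager counterpart that builds lists instead of generators. The name flatten_to_list is hypothetical and this is not the textbook's code; it only mirrors the same control flow.

```python
def flatten_to_list(nested):
    """Eager, list-returning counterpart of flatten() (hypothetical name)."""
    try:
        result = []
        for sublist in nested:
            # Each recursive call hands back a list, possibly of length one.
            result.extend(flatten_to_list(sublist))
        return result
    except TypeError:
        # Non-iterable value: wrap it so the caller can still iterate.
        return [nested]


print(flatten_to_list(1))            # [1]
print(flatten_to_list([[1, 2], 3]))  # [1, 2, 3]
```

Here the except branch visibly returns a one-item list, which plays the same role as the one-item generator in the original.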

senderle

A great way to break down a function that you mostly understand, except for one small part that is stumping you, is to use the Python debugger. Here is a session with comments added:

-> def flatten(nested):
(Pdb) l
  1  -> def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5                     yield element
  6         except TypeError:
  7             yield nested
  8     
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11     
(Pdb) a
nested = [[1, 2], 3]

Above, we've just entered the function and the argument is [[1, 2], 3]. Let's use pdb's step command (s) to step through the function and into any recursive calls we encounter:

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = [1, 2]

We've stepped into one inner frame of flatten, where the argument is [1, 2].

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(4)flatten()
-> for element in flatten(sublist):
(Pdb) s
--Call--
> /Users/michael/foo.py(1)flatten()
-> def flatten(nested):
(Pdb) a
nested = 1

Two frames in, the argument 1 isn't an iterable anymore. This should be interesting…

(Pdb) s
> /Users/michael/foo.py(2)flatten()
-> try:
(Pdb) s
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
TypeError: "'int' object is not iterable"
> /Users/michael/foo.py(3)flatten()
-> for sublist in nested:
(Pdb) s
> /Users/michael/foo.py(6)flatten()
-> except TypeError:
(Pdb) s
> /Users/michael/foo.py(7)flatten()
-> yield nested
(Pdb) s
--Return--
> /Users/michael/foo.py(7)flatten()->1
-> yield nested

OK, so because of the except TypeError, we're just yielding the argument itself. Up a frame!

(Pdb) s
> /Users/michael/foo.py(5)flatten()
-> yield element
(Pdb) l
  1     def flatten(nested):
  2         try:
  3             for sublist in nested:
  4                 for element in flatten(sublist):
  5  ->                 yield element
  6         except TypeError:
  7             yield nested
  8     
  9     import pdb; pdb.set_trace()
 10     list(flatten([[1,2],3]))
 11     

yield element will of course yield 1, so once our lowest frame hits a TypeError, the result propagates all the way up the stack to the outermost frame of flatten, which yields it to the outside world before moving on to further parts of the outer iterable.
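
The frame-by-frame re-yielding described above, where each frame receives a value from the frame below and yields it again, is what `yield from` (Python 3.3+, mentioned in the comments on the question) expresses directly. A sketch of a rewrite that should behave the same for inputs like those in the question; the name flatten_yf is hypothetical and this is not the textbook's code.

```python
def flatten_yf(nested):
    try:
        for sublist in nested:
            # Delegate to the inner generator: every value it yields is
            # passed straight through to whoever is iterating over us.
            yield from flatten_yf(sublist)
    except TypeError:
        # Not iterable: yield the argument itself.
        yield nested


print(list(flatten_yf([[1, 2], 3])))  # [1, 2, 3]
```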

kojiro

The try/except construction catches the exception for you and yields nested, which is just the argument that was given to flatten().

So flatten(1) fails at for sublist in nested:, continues with the except branch, and yields nested, which is 1.
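
The failure itself can be seen in isolation. A for loop first asks its target for an iterator, roughly what the built-in iter() does, and an int cannot provide one. A minimal sketch; the helper name is_iterable is hypothetical.

```python
def is_iterable(obj):
    """Return True if a for loop could iterate over obj (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        # Same exception type that flatten() catches for a number.
        return False
    return True


print(is_iterable(1))       # False
print(is_iterable([1, 2]))  # True
```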

Marco de Wit

yield element can be executed if nested is a list but sublist is not (i.e., if nested is a normal "flat" list). In this case, for sublist in nested will work fine. When the next line recursively calls flatten(sublist), a TypeError is raised as soon as the recursive call tries to iterate over the "sublist" (which is not iterable). This TypeError is caught, and the recursive call yields its entire input (the non-iterable item) back, so it is then picked up by the for element in flatten(sublist) loop. In other words, for a sublist that is already a flat list, for element in flatten(sublist) winds up doing the same thing as for element in sublist.

The key thing to recognize is that even a non-nested list will result in a recursive call. A call like flatten([1]) will result in two yields: the recursive call flatten(1) will yield 1 to the outer call, and the outer call immediately re-yields that 1.

This version of the function may help to understand what's going on:

    def flatten(nested, indent=""):
        try:
            print(indent, "Going to iterate over", nested)
            for sublist in nested:
                print(indent, "Going to iterate over flattening of", sublist)
                for element in flatten(sublist, indent+"  "):
                    print(indent, "Yielding", element)
                    yield element
        except TypeError:
            print(indent, "Type Error!  Yielding", nested)
            yield nested

    >>> list(flatten([[1,2],3]))
     Going to iterate over [[1, 2], 3]
     Going to iterate over flattening of [1, 2]
       Going to iterate over [1, 2]
       Going to iterate over flattening of 1
         Going to iterate over 1
         Type Error!  Yielding 1
       Yielding 1
     Yielding 1
       Going to iterate over flattening of 2
         Going to iterate over 2
         Type Error!  Yielding 2
       Yielding 2
     Yielding 2
     Going to iterate over flattening of 3
       Going to iterate over 3
       Type Error!  Yielding 3
     Yielding 3
    [1, 2, 3]
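
To check the claim above that flatten([1]) results in two yields, here is a hedged sketch that records each yield in a list instead of printing it. The name flatten_logged and its log and depth parameters are hypothetical additions for illustration.

```python
def flatten_logged(nested, log, depth=0):
    """flatten() plus a record of which depth yielded what (hypothetical)."""
    try:
        for sublist in nested:
            for element in flatten_logged(sublist, log, depth + 1):
                log.append((depth, element))  # outer frame re-yields
                yield element
    except TypeError:
        log.append((depth, nested))  # innermost frame yields its argument
        yield nested


log = []
print(list(flatten_logged([1], log)))  # [1]
print(log)                             # [(1, 1), (0, 1)]
```

The log shows the value 1 being yielded once at depth 1 (the recursive call) and once more at depth 0 (the outer call).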
BrenBarn