Extracting strings from nested lists in Python

Question

Possible Duplicate:
Flatten (an irregular) list of lists in Python

I'm trying to use the nltk library in python, and more specifically the wordnet corpus, to extract all the words in a broad semantic category like 'animal'. I've managed to write a function that goes down through all the categories and extracts the words in them, but what I end up with is a huge jumble of lists within lists. The lists aren't of any predictable length or depth, they look like this:

['pet', 'pest', 'mate', 'young', 'stunt', 'giant', ['hen', 'dam', 'filly'], ['head', 'stray', 'dog', ['puppy', 'toy', 'spitz', 'pooch', 'doggy', 'cur', 'mutt', 'pug', 'corgi', ['Peke'], ['chow'], ['feist', 'fice'], ['hound', ['Lhasa', 'cairn']], ['boxer', 'husky']], ['tabby', 'tabby', 'queen', 'Manx', 'tom', 'kitty', 'puss', 'pussy', ['gib']]]

What I want is to be able to grab each of those strings out of that , and return a single, unnested list. Any advice?

There are also some solutions in another thread, [How to optimally turn a multidimentional list into a single list of items in Python?](http://stackoverflow.com/questions/6679228/how-to-optimally-turn-a-multidimentional-list-into-a-single-list-of-items-in-pyt) (in particular, I am fond of the non-recursive solution I posted in that thread). — kindall, Feb 21 '12 at 05:31
I just did this in Scheme a week ago, similar to what Li-aung is offering. http://stackoverflow.com/questions/9262570/eliminate-inner-paranthesis-runs-into-empty-list-and-doesnt-eliminate-using-con — CppLearner, Feb 21 '12 at 05:36

Li-aung Yip · Accepted Answer · 2012-02-21T05:43:51.583

3

In general, when you have to deal with arbitrary levels of nesting, a recursive solution is a good fit. Lists within lists, parsing HTML (tags within tags), working with filesystems (directories within directories), etc.

I haven't tested this code extensively, but I believe it should do what you want:

ll = [ 1, 2, 3, [4, 5, [6, 7, 8]]]

def flatten(input_list):
    output_list = []
    for element in input_list:
        if type(element) == list:
            output_list.extend(flatten(element))
        else:
            output_list.append(element)
    return output_list

print (flatten(ll)) #prints [1, 2, 3, 4, 5, 6, 7, 8]

In general recursion is very easy to think about and the solutions tend to be very elegant (like above) but for really, really deeply nested things - think thousands of levels deep - you can run into problems like stack overflow.

Generally this isn't a problem, but I believe a recursive function can always* be converted to a loop (it just doesn't look as nice.)

Note: I am not crash-hot on my compsci theory here. Someone can add details or correct me if I'm wrong.

edited Feb 21 '12 at 05:43

answered Feb 21 '12 at 05:30

Li-aung Yip

12,320
5
34
49

Thanks! I was actually trying to do something fairly similar, and I think the reason why it failed was I used .append() where you've used .extend(). Your answer is also a bit more comprehendable (to me, at least) than some of the solutions in the other thread. – Marius Feb 21 '12 at 05:40
Yes, `list.append(x)` adds a single item `x` to the end of the list. `list.extend(x)` "unpacks" `x`, adding each of its elements. Note that `extend` will actually work for any iterable type, including lists, dicts, and generators, as well as anything with an `__iter__()` method. – Li-aung Yip Feb 21 '12 at 05:46
Brushed up on my compsci. To be precise, loop + stack = recursion. You could do it depth-first or breadth first. – Li-aung Yip Feb 23 '12 at 05:20

Extracting strings from nested lists in Python

1 Answers1

Linked