10

What's the recommended way to flatten nested lists since the deprecation of the compiler package?

>>> from compiler.ast import flatten
>>> flatten(["junk",["nested stuff"],[],[[]]])
['junk', 'nested stuff']

I know that there are a few stack overflow answers for list flattening, but I'm hoping for the pythonic, standard package, "one, and preferably only one, obvious way" to do this.

Mittenchops
  • 18,633
  • 33
  • 128
  • 246

7 Answers7

12

itertools.chain is the best solution for flattening any nested iterable one level - it's highly efficient compared to any pure-python solution.

That said, it will work on all iterables, so some checking is required if you want to avoid it flattening out strings, for example.

Likewise, it won't magically flatten out to an arbitrary depth. That said, generally, such a generic solution isn't required - instead it's best to keep your data structured so that it doesn't require flattening in that way.

Edit: I would argue that if one had to do arbitrary flattening, this is the best way:

import collections

def flatten(iterable):
    for el in iterable:
        if isinstance(el, collections.Iterable) and not isinstance(el, str): 
            yield from flatten(el)
        else:
            yield el

Remember to use basestring in 2.x over str, and for subel in flatten(el): yield el instead of yield from flatten(el) pre-3.3.

As noted in the comments, I would argue this is the nuclear option, and is likely to cause more problems than it solves. Instead, the best idea is to make your output more regular (output that contains one item still give it as a one item tuple, for example), and do regular flattening by one level where it is introduced, rather than all at the end.

This will produce more logical, readable, and easier to work with code. Naturally, there are cases where you need to do this kind of flattening (if the data is coming from somewhere you can't mess with, so you have no option but to take it in the poorly-structured format), in which case, this kind of solution might be needed, but in general, it's probably a bad idea.

Gareth Latty
  • 86,389
  • 17
  • 178
  • 183
  • Thanks. This may be a silly clarification, but do you have a recommendation for how to uniterate it? `>>> itertools.chain([["junk",["nested stuff"],[],[[]]]]) ` I usually use list(iterable) to uniterate it, but if I do that in this case, I end up with a nonflat list! – Mittenchops Apr 23 '13 at 18:42
  • *uniterate*? I'm not really sure what that is means to mean. `list(iterable)` is the best way to get a list from an arbitrary iterable. – Gareth Latty Apr 23 '13 at 18:43
  • Using list on the above, I get: `>>> list(itertools.chain([["junk",["nested stuff"],[],[[]]]]))` `[['junk', ['nested stuff'], [], [[]]]]` versus `flatten()` which gives `['junk', 'nested stuff']` – Mittenchops Apr 23 '13 at 18:44
  • 1
    `chain()` takes an iterable as each argument, if you want to pass in an iterable, use `chain.from_iterable()`. Note that it will (as stated in the answer) only remove one layer of nesting. Beyond that, it's not really worth making a generic tool, as different situations will treat different iterables as needing to be flattened (for example, strings). – Gareth Latty Apr 23 '13 at 18:46
  • Sorry, I'm maybe still not getting it: `>>> list(itertools.chain.from_iterable([["junk",["nested stuff"],[],[[]]]]))` gives me `['junk', ['nested stuff'], [], [[]]]` – Mittenchops Apr 23 '13 at 18:48
  • @Mittenchops Yes, it has removed one layer of nesting - the outer list (which happened to only contain one item). – Gareth Latty Apr 23 '13 at 18:48
  • Hmm, but if I try to unnest that by flattening again, I get this: `>>> list(itertools.chain.from_iterable(itertools.chain.from_iterable([["junk",["nested stuff"],[],[[]]]])))` `['j', 'u', 'n', 'k', 'nested stuff', []]` which is still not flat and also very hard to read. =/ – Mittenchops Apr 23 '13 at 18:54
  • @Mittenchops I did explain in the answer that it would flatten out strings (as they are iterable too), and not work to an arbitrary depth. In general, flattening out arbitrarily nested lists is the nuclear option, and is likely to cause more problems than it solves. Instead, the best idea is to make your output more regular (output that contains one item still give it as a one item tuple, for example), and do regular flattening by one level where it is introduced, rather than all at the end. – Gareth Latty Apr 23 '13 at 18:58
6

Your stated function takes a nested list and flattens that into a new list.

To flatten an arbitrarily nested list into a new list, this works on Python 3 as you expect:

import collections
def flatten(x):
    result = []
    for el in x:
        if isinstance(x, collections.Iterable) and not isinstance(el, str):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

print(flatten(["junk",["nested stuff"],[],[[]]]))  

Prints:

['junk', 'nested stuff']

If you want a generator that does the same thing:

def flat_gen(x):
    def iselement(e):
        return not(isinstance(e, collections.Iterable) and not isinstance(e, str))
    for el in x:
        if iselement(el):
            yield el
        else:
            for sub in flat_gen(el): yield sub

print(list(flat_gen(["junk",["nested stuff"],[],[[[],['deep']]]]))) 
# ['junk', 'nested stuff', 'deep']

For Python 3.3 and later, use yield from instead of the loop:

def flat_gen(x):
    def iselement(e):
        return not(isinstance(e, collections.Iterable) and not isinstance(e, str))
    for el in x:
        if iselement(el):
            yield el
        else:
            yield from flat_gen(el)   
dawg
  • 98,345
  • 23
  • 131
  • 206
  • Building a list like this is a really bad solution, a generator would be far more apt. – Gareth Latty Apr 23 '13 at 18:50
  • I would say 'really bad solution' is a little strong. If the nested list is relatively small, building a generator might be slower. It depends on the need. I posted both. – dawg Apr 23 '13 at 19:33
  • It may well be slower to create a generator on a small list, but when it's small, the time difference isn't going to matter. When it's large, however, it may well do. It's best to just use the generator regardless, and it's more readable. Beyond that, `hasattr(e, "__iter__")` is generally a check better done by using the `Iterable` ABC - it is more readable that way. – Gareth Latty Apr 23 '13 at 19:38
4

You can use flatten function from funcy library:

from funcy import flatten, isa
flat_list = flatten(your_list)

You can also explicitly specify which values to follow:

# Follow only sets
flat_list = flatten(your_list, follow=isa(set))

Take a peek at its implementation if you want an algorythm.

Suor
  • 2,845
  • 1
  • 22
  • 28
1

There's no built-in method for an list with arbitrary nesting, but something like this...

def flatten(l):
    for i in l:
        if isinstance(i, (list, tuple)):
            for ii in flatten(i):
                yield ii
        else:
            yield i

>>> l = ["junk",["nested stuff"],[],[[]]]
>>> list(flatten(l))
['junk', 'nested stuff']

...will work for lists and tuples. If you want to support any object which you could use in a for item in object expression, then it's probably best to using duck-typing like this...

def flatten(l):
    for i in l:
        if isinstance(i, (str, bytes)):
            yield i
        else:
            try:
                for ii in flatten(i):
                    yield ii
            except TypeError:
                yield i

>>> l = ["junk",["nested stuff"],[],[[]]]
>>> list(flatten(l))
['junk', 'nested stuff']

...which is slightly more robust than checking isinstance(el, Iterable), since it won't cope with some cases, such as this...

class Range10:
    def __getitem__(self, index):
        if index >= 10:
            raise IndexError
        return index

>>> import collections
>>> r10 = Range10()
>>> isinstance(r10, collections.Iterable)
False
>>> list(Range10())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Aya
  • 39,884
  • 6
  • 55
  • 55
  • -1 - Type checking against lists and tuples makes this extremely inflexible. – Gareth Latty Apr 23 '13 at 18:50
  • @Lattyware The question was: "What's the recommended way to flatten nested lists..." – Aya Apr 23 '13 at 18:51
  • @Lattyware Any comments on the second option? – Aya Apr 23 '13 at 19:00
  • Sorry, I was answering on my own question. My problem is that Python is a flexible language and type checking is just a generally bad solution. The asker may think only of lists, but then at some point use some custom data structure that functions like a list, but won't work with your code. Python is designed for duck typing, and so functions that only work with certain types, despite other types being capable are very annoying to work with, and generally a bad idea. – Gareth Latty Apr 23 '13 at 19:01
  • As to the second method, It's not terrible (it's very similar to the option I gave, except takes the `try/except` over the check beforehand - they are very similar, if the to-be-flattened list has a high proportion single items or a high proportion of nested iterables, one might be preferable, but it's unlikely to matter. As I state in my answer, I feel that the best solution is to *avoid a need for such a function in the first place*. – Gareth Latty Apr 23 '13 at 19:07
  • @Lattyware Well, I've noted a drawback of your option, but it's kind of an edge-case. Still, it seems a tad harsh to downvote for the reason you gave, particularly as it no longer applies. – Aya Apr 23 '13 at 19:13
  • Check your vote count, I removed that downvote as soon as you added the alternative. As to the edge case, yes, there is the slight potential, but at the same time, the ABC for iterables is clear they should implement `__iter__()`, so I would argue that's more a case of the class being poorly design if it's meant to be iterable - I've seen cases where `__getitem__()` has been defined where iterating over the values wouldn't make sense. It's a rare case, but I would argue it's hard to call what the behaviour should be on an object like that. – Gareth Latty Apr 23 '13 at 19:19
  • @Lattyware I recall having a similar discussion with Martijn Pieters recently, who suggested using `isinstance(el, Sequence)` to handle the `___getitem__()` case, although it was never clear if that implied that `Sequence` was a strict superset of `Iterable` or if it was required to test for both possibilities. – Aya Apr 23 '13 at 19:26
  • 1
    I'm pretty sure `Sequence` requires `__len__()` to be defined, and probably tests it's `Iterable` too. – Gareth Latty Apr 23 '13 at 19:29
  • @Lattyware Looks like you're right. I think the point I'm trying to make is the distinction between the term "iterable" and "any object that could possibly be used in a `for item in object` expression". So if, as you say, "Python is designed for duck typing" then none of the `collections` classes are going to help, and the only real way to be sure is to try the `for` loop, and catch the exception if it fails. – Aya Apr 23 '13 at 19:49
1

My ugly while-chain solution, just for fun:

from collections import Iterable
from itertools import chain

def flatten3(seq, exclude=(str,)):
    sub = iter(seq)
    try:
        while sub:
            while True:
                j = next(sub)
                if not isinstance(j, Iterable) or isinstance(j, exclude):
                    yield j
                else:
                    sub = chain(j, sub)
                    break
    except StopIteration:
        return
kalgasnik
  • 3,149
  • 21
  • 19
0

Built for recursion: Python 3.6

def flatten(lst):
    """Flattens a list of lists"""
    return [subelem for elem in lst 
                    for subelem in elem]

Define your types in a list and use the any builtin to check

0

Just another recursion: Python 3.6 . Works fine for complex nested list of lists/strings/digits and combination of all:

def list_flattener(nestedList):  
    nonListElems=[]
    listElems=[]
    nestedListCounter=0
    for nc in range(len(nestedList)):
        if type(nestedList[nc])==list:
            nestedListCounter+=1
            listElems=nestedList[nc]+listElems
        else:nonListElems.append(nestedList[nc])  
    if nestedListCounter==0: return (nestedList)
    else:
        nestedList=listElems+nonListElems 
        return list_flattener(nestedList)

See the example below :

>>> nestedList=['arash',[[1,'anotherOne',[12,'stringX',['kave']]], 'stringY']]
>>> list_flattener(nestedList)
['kave', 12, 'stringX', 1, 'anotherOne', 'stringY', 'arash']
Arash
  • 1,014
  • 1
  • 9
  • 17