64

What is fastest solution to flatten lists which contain other lists of arbitrary length?

For example [1, 2, [3, 4, [5],[]], [6]] would become [1,2,3,4,5,6].

There can be arbitrarily many levels. Some of the list objects can be strings, which mustn't be flattened into their sequential characters in the output list.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Ivy
  • 3,393
  • 11
  • 33
  • 46
  • 1
    [This answer](http://stackoverflow.com/a/406822/197011) from a [previously asked question](http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python) should do what you want. (Voting to close as duplicate). – GreenMatt May 30 '12 at 20:43
  • 4
    @GreenMatt, it's only a duplicate if the *question* is the same, not the answer. It's clear that the other question is for a single level of nesting, and simpler solutions are more appropriate. – Mark Ransom May 30 '12 at 20:52
  • @Mark Ransom: I thought I'd seen this before, and that was what I found when I searched. Guess I didn't read the question carefully enough. – GreenMatt May 30 '12 at 21:15
  • Please stop removing the system-generated "Possible duplicate" banner. The second link has the answer to your question. – vaultah Apr 24 '16 at 20:29
  • 1
    Disagree, the answer might happen to be the same, but the question needs to work for arbitrarily nested lists and so isn't a duplicate question. – Ivy Apr 25 '16 at 12:03
  • @Rory I'm sorry, even the questions look (almost) identical to me. Can you highlight the difference? Also, what does removing the duplicate text solve? – vaultah Apr 25 '16 at 13:34
  • 1
    I am surprised that this is not available in a standard library, like itertools or smthing – a06e Nov 22 '17 at 15:14
  • @vaultah the apparent difference was that this question *supposedly* asks about performance. However, performance considerations would belong to the consideration of the original question anyway, and *the accepted answer here explicitly disclaims performance*, so this question is a crystal clear duplicate. – Karl Knechtel Sep 06 '22 at 05:12

3 Answers3

89

Here's a recursive approach that is string friendly:

nests = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]

def flatten(container):
    for i in container:
        if isinstance(i, (list,tuple)):
            for j in flatten(i):
                yield j
        else:
            yield i

print list(flatten(nests))

returns:

[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']

Note, this doesn't make any guarantees for speed or overhead use, but illustrates a recursive solution that hopefully will be helpful.

hexparrot
  • 3,399
  • 1
  • 24
  • 33
  • 7
    This is just a beautiful application of yield. Well done ... – fabee Sep 13 '13 at 15:29
  • 12
    In Python 3.3+, you could use `yield from flatten(i)` instead of `for j in flatten(i): yield j` – jfs Feb 24 '15 at 10:38
  • 2
    you could also do `if isinstance(i, (list, tuple)):` instead. but great solution regardless. – acushner Mar 16 '16 at 22:39
  • What if it's a set? I used `if hasattr(i, '__iter__'):` – TayTay May 26 '16 at 14:07
  • 2
    String types cause an infinite recursion with `hastattr(i, '__iter__')`. I filtered those out `if hasattr(i, '__iter__') and not isinstance(i, str):` – random8 Mar 01 '17 at 18:03
  • You seem to assume that `container` is a list. You'll get a list of characters if you call `flatten('test')`. There's no need for a nested `for` loop. That's what recursion is for! – Eric Duminil Nov 22 '17 at 11:25
  • Recursive is not fast... – Jorge Orpinel Pérez Jan 22 '19 at 09:49
  • This will raise a `RecursionError` if the depth of your list is more than 1000. https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it – Boris Verkhovskiy Apr 24 '19 at 07:31
30

It doesn't have to be recursive. In fact, an iterative solution is often faster because of the overhead involved in function calls. Here's an iterative version I wrote a while back:

def flatten(items, seqtypes=(list, tuple)):
    for i, x in enumerate(items):
        while i < len(items) and isinstance(items[i], seqtypes):
            items[i:i+1] = items[i]
    return items

Haven't tested the performance of this specific implementation, but it is probably not so great because of all the slice assignments, which could end up moving a lot of memory around. Still, don't assume it has to be recursive, or that it's simpler to write it that way.

This implementation does have the advantage of flattening the list "in place" rather than returning a copy, as recursive solutions invariably do. This could be useful when memory is tight. If you want a flattened copy, just pass in a shallow copy of the list you want to flatten:

flatten(mylist)                # flattens existing list
newlist = flatten(mylist[:])   # makes a flattened copy

Also, this algorithm is not limited by the Python recursion limit because it's not recursive. I'm certain this will virtually never come into play, however.

2021 edit: it occurs to me that the check for the end of the list might be better handled with try/except because it will happen only once, and getting the test out of the main loop could provide a performance beneft. That would look like:

def flatten(items, seqtypes=(list, tuple)):
    try:
        for i, x in enumerate(items):
            while isinstance(items[i], seqtypes):    
                items[i:i+1] = items[i]
    except IndexError:
        pass
    return items

With some further tweaking to use the x returned by enumerate instead of accessing items[i] so much, you get this, which is either mildly or significantly faster than the original version up top, depending on the size and structure of your lists.

def flatten(items, seqtypes=(list, tuple)):
    try:
        for i, x in enumerate(items):
            while isinstance(x, seqtypes):    
                items[i:i+1] = x
                x = items[i]
    except IndexError:
        pass
    return items
kindall
  • 178,883
  • 35
  • 278
  • 309
  • I thought modifying a sequence you're looping over was kind of dangerous? – Mark Ransom May 30 '12 at 20:56
  • 1
    @Mark this is write-once code :). Make a note to warn future editors and move on with your life. – Kenan Banks May 30 '12 at 21:20
  • 4
    @Triptych, I don't quite understand your comment. – Mark Ransom May 30 '12 at 21:30
  • It's not necessarily "dangerous" to modify a list you're iterating, it's just that you can wind up skipping elements you don't mean to if you delete things. To avoid this, I use `enumerate()` and access the list only by index, rather than using the value (`x`) yielded by the iteration. – kindall May 30 '12 at 21:37
  • Thanks for the explanation kindall, using an index does make it safer. You're still relying on lazy evaluation of `enumerate` for proper results, and I doubt that's guaranteed behavior (can you tell I have a C++ background?) – Mark Ransom May 30 '12 at 22:04
  • 2
    `enumerate()` is dead-simple and is documented as being lazy (i.e., returning an iterator that yields sequential index values). The behavior of `iter()` on a list (using an incrementing index with `__getitem__()` and comparing it to `len()` on each iteration, which works even as the length changes) is also documented. So, although it kind of looks like a hack, it's actually pretty safe barring significant changes to the language. – kindall May 30 '12 at 22:45
  • 1
    range instead of enumerate will also do the trick. – Narayan Singh Jul 17 '13 at 04:29
  • In python usually the recursive option is more efficient than the iterative one. The only risk is hitting the recursion limit. – bgusach Oct 08 '14 at 15:05
  • You're right for this particular situation. Building a new list with a recursive solution is going to move memory around a lot less than my solution and this should more than make up for the overhead of function calls. – kindall Oct 08 '14 at 16:24
  • @kindall. Why the snippet below behaves differently than yours (python 3.4)? `def flatten(items, seqtypes=(list, tuple)): for i in range(len(items)): while isinstance(items[i], seqtypes): items[i:i+1] = items[i] return items` –  May 21 '15 at 07:13
  • I'm trying to flatten every item in a list, which can contain strings, integer, floats, and lists containing those elements. This solution does not work in my case--for example, floats are not iterable, so "for i, x in enumerate(items):" throws a TypeError if there are any lone floats in the list. Any suggestions? – Nolan Conaway Sep 30 '15 at 01:19
  • @nolanconaway The routine as written skips anything that's not a list or a tuple, so what you're saying happens can never happen. – kindall Feb 04 '16 at 18:45
  • 1
    I am surprised this does not have more votes. I ascribe it to the fact that you have to read carefully to understand what it is doing. – Mad Physicist Mar 17 '16 at 16:36
  • Thanks @MadPhysicist! – kindall Mar 17 '16 at 18:29
  • This won't work if there is an empty list involved and I'm not sure there is a nice way to change that - since it relies on modifying in-place + enumerate – user3467349 Jun 14 '16 at 10:35
  • @user3467349 What doesn't work about it if there's an empty list? What would you expect to happen that isn't? – kindall Jun 14 '16 at 16:13
  • @kindall e.g. `[[]]` should produce `[]` or `[[], 1]` should produce `[1]`. – user3467349 Jun 15 '16 at 06:15
  • @kindall To amend what I said `[[], 1]` works -- but `flatten([1, []])` does not and neither does `flatten([[]])`. – user3467349 Jun 16 '16 at 13:10
  • Fixed it, thanks for finding that. – kindall Jun 16 '16 at 20:14
  • The problem was when the empty list was the last item, it would delete the last item, and then on the next iteration it would try to check the last item to see if it was a sequence, but it wasn't there anymore. – kindall Jun 16 '16 at 20:22
10

This function should be able to quickly flatten nested, iterable containers without using any recursion:

import collections

def flatten(iterable):
    iterator = iter(iterable)
    array, stack = collections.deque(), collections.deque()
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            if not stack:
                return tuple(array)
            iterator = stack.pop()
        else:
            if not isinstance(value, str) \
               and isinstance(value, collections.Iterable):
                stack.append(iterator)
                iterator = iter(value)
            else:
                array.append(value)

After about five years, my opinion on the matter has changed, and this may be even better to use:

def main():
    data = [1, 2, [3, 4, [5], []], [6]]
    print(list(flatten(data)))


def flatten(iterable):
    iterator, sentinel, stack = iter(iterable), object(), []
    while True:
        value = next(iterator, sentinel)
        if value is sentinel:
            if not stack:
                break
            iterator = stack.pop()
        elif isinstance(value, str):
            yield value
        else:
            try:
                new_iterator = iter(value)
            except TypeError:
                yield value
            else:
                stack.append(iterator)
                iterator = new_iterator


if __name__ == '__main__':
    main()
Noctis Skytower
  • 21,433
  • 16
  • 79
  • 117