def flattenList(toFlatten):
 final=[]
 for el in toFlatten:
  if isinstance(el, list):
   final.extend(flattenList(el))
  else:
   final.append(el)
 return final

When I don't know how deeply the lists will nest, this is the only way I can think of to do this.

Mat Nadrofsky
Ishpeck
  • Try using four spaces instead of one for your indentation. It is more readable and will conform to the style guidelines used for most Python code. (The style guide most Python code conforms to is http://www.python.org/dev/peps/pep-0008/) – Mike Graham Mar 18 '10 at 18:52
  • Related questions: "Flatten (an irregular) list of lists in Python" http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists-in-python (links to other questions and good answers) "Flattening a shallow list in python" http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python (benchmarks) – jfs Mar 18 '10 at 19:35

4 Answers

7
  1. You should avoid typechecking in Python. In this case, this means avoiding arbitrarily-nested structures where you distinguish by type. You can build your own node type which you can traverse by methods other than typechecking, like looking at a specific attribute.

  2. For flattening one level or exactly n levels, look at itertools.chain.from_iterable.
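
    For the one-level case, that looks like this (a quick sketch, standard library only):

```python
from itertools import chain

# chain.from_iterable joins one level of sublists into a single sequence.
shallow = [[1], [2, 3], [4, 5, 6]]
flat = list(chain.from_iterable(shallow))
print(flat)  # → [1, 2, 3, 4, 5, 6]
```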

  3. I don't know what you mean by "functional". This code is pretty functional: it uses recursion (not to its credit!) and it doesn't mutate its argument. (Strictly speaking, it does use mutable state for building a list, but that is just how you do it in Python.)

  4. I suppose one more functional attribute would be lazy evaluation. You could implement that with a generator:

    def flatten(toFlatten):
        for item in toFlatten:
            if isinstance(item, list):        # Ewww, typechecking
                for subitem in flatten(item): # they are considering adding
                    yield subitem             # "yield from" to the language
                                              # to give this pattern syntax
            else:
                yield item
    
  5. Recursion is very limited in Python (at least, in all its major implementations) and should generally be avoided for arbitrary depth. It is quite possible to rewrite this (and all recursive code) to use iteration, which will make it more scalable (and less functional, which is a good thing in Python, since Python is not especially suited to FP).
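Point 5 in action: a sketch of an iterative rewrite that keeps an explicit stack of iterators instead of recursing, so it handles arbitrary nesting without hitting the recursion limit (still typechecking, with all the caveats from point 1):

```python
def flatten_iterative(toFlatten):
    result = []
    stack = [iter(toFlatten)]      # explicit stack replaces the call stack
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))  # descend; the outer iterator
                break                     # remembers its position
            result.append(item)
        else:
            stack.pop()            # top iterator exhausted; pop back up
    return result

print(flatten_iterative([1, 2, [3, 4, [5]], 6]))  # → [1, 2, 3, 4, 5, 6]
```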

Mike Graham
  • @Mike Graham: I want to be able to pass in lists containing lists containing lists, etc., and flatten them all the way down to one single result. I want `[1,2,[3,4,5,6], 7,8,[9,10,[11,12,[13,[14,15],16],17],20]]` to come out as `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20]`. If I knew in advance how deeply the lists nested, this would suffice: `def mergeLists(seq): return reduce(lambda x,y: x+y, seq)` – Ishpeck Mar 18 '10 at 17:11
  • 1) Stop wanting that. Define your structure differently. 2) Your `reduce` strategy is multiplicative in big-O order; the linear algorithms are obvious. – Mike Graham Mar 18 '10 at 17:24
  • A more general answer: http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists-in-python/2158532#2158532 – jfs Mar 18 '10 at 19:38
  • I like that answer less than this bad answer. Typechecking sucks, but if you're using type to indicate data, it should be a *very specific type*. For example, I should be free to make my *own* string-ish type that iterates over length-1 sequences of itself without subclassing basestring. If I *was* going to use type to indicate this information, not only would I limit it to exactly `list`, I'd probably subclass `list` so that I could typecheck for *exactly* what I want. – Mike Graham Mar 18 '10 at 19:46
  • It's worth noting nested lists like [1,2,[3,4,5,6], 7,8,[9,10,[11,12,[13,[14,15],16],17],20]] are really trees: Node([Leaf(1),Leaf(2),Node([Leaf(3),Leaf(4)... etc. – sdcvvc Mar 18 '10 at 20:05
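The tree view in that last comment ties back to point 1 of the answer: define your own node types and dispatch on those rather than on builtins. A purely illustrative sketch (`Node` and `Leaf` are made-up names, and it uses the `yield from` syntax that later landed in Python 3.3):

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, children):
        self.children = children

def flatten_tree(tree):
    # Dispatch on our own node types, not on builtins like list.
    if isinstance(tree, Leaf):
        yield tree.value
    else:
        for child in tree.children:
            yield from flatten_tree(child)

tree = Node([Leaf(1), Leaf(2), Node([Leaf(3), Leaf(4)])])
print(list(flatten_tree(tree)))  # → [1, 2, 3, 4]
```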
3

This answer explains why you do not want to use reduce for this in Python.

Consider the snippet

reduce(operator.add, [[1], [2], [3], [4], [5]])

What work does this have to do?

[1] + [2] => [1, 2]
[1, 2] + [3] => This builds a brand-new list, copying over 1, then 2, then 3: [1, 2, 3]
[1, 2, 3] + [4] => This has to copy the 1, 2, and 3 and then put 4 in the new list: [1, 2, 3, 4]
[1, 2, 3, 4] + [5] => The amount that has to be copied grows with every step!

This quadratic behavior is completely avoidable: the original solution (and any number of other solutions) does not form these intermediate copying steps.
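A side-by-side sketch of the two shapes (with the functools import `reduce` needs on Python 3):

```python
import operator
from functools import reduce  # reduce is a builtin on Python 2

lists = [[1], [2], [3], [4], [5]]

# Quadratic: every + builds a brand-new list, recopying everything so far.
quadratic = reduce(operator.add, lists)

# Linear: one result list; each element is copied exactly once.
linear = []
for sub in lists:
    linear.extend(sub)

print(quadratic)  # → [1, 2, 3, 4, 5]
print(linear)     # → [1, 2, 3, 4, 5]
```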

Mike Graham
2

Under the docs for itertools, there's a flatten() recipe.
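For reference, that recipe only flattens one level; it is roughly the following, so deeper sublists keep their inner structure:

```python
from itertools import chain

def flatten(listOfLists):
    # One level only -- deeper sublists survive intact.
    return chain.from_iterable(listOfLists)

print(list(flatten([[1, 2], [3, [4, 5]]])))  # → [1, 2, 3, [4, 5]]
```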

ghostdog74
1

Here's another option (though there may be something cleaner than type-checking, like testing if something is iterable and hence not an "atom"):

def flatten(lst):
    if not isinstance(lst, list):
        return [lst]
    else:
        return reduce(lambda x, y: x + y, [flatten(x) for x in lst], [])

It's based on something scheme-like.
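A usage sketch of this answer's function, restated so it runs on Python 3 (where `reduce` moved to functools):

```python
from functools import reduce

def flatten(lst):
    # A non-list "atom" becomes a one-element list; a list is flattened
    # recursively and the pieces are concatenated with +.
    if not isinstance(lst, list):
        return [lst]
    else:
        return reduce(lambda x, y: x + y, [flatten(x) for x in lst], [])

print(flatten([1, [2, [3, [4]]], 5]))  # → [1, 2, 3, 4, 5]
print(flatten(7))                      # → [7]
```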

mitmatt
  • `try: return reduce(...); except TypeError: return [lst]` – Debilski Mar 18 '10 at 17:14
  • @mitmatt, 1) The function `lambda x, y: x + y` is called `operator.add`. 2) Your algorithm is so unnecessarily O(n * m)ish, when the linear algorithm is more obvious. – Mike Graham Mar 18 '10 at 17:26
  • @Debilski, Normally exception handling would be better form, but recursive flattening is tricky. Try thinking about what will happen if you have a string in there! – Mike Graham Mar 18 '10 at 17:28
  • @Mike Graham: True. That wouldn’t be nice. – Debilski Mar 18 '10 at 17:30
  • @Mike, I'm not sure what your m and n refer to, but the code I provided is linear in the number of atoms in the deep-list argument: each atom is listified exactly once, and the reduce will only call + once for each of its recursively-flattened args (meaning only one append per atom). Calling operator.add means another import statement, but it's certainly a valid alternative! – mitmatt Mar 18 '10 at 17:45
  • @mitmatt, I posted an answer that hopefully explains the quadratic nature of your algorithm. – Mike Graham Mar 18 '10 at 17:55
  • Easier yet: replace `reduce(lambda x,y:x+y,` with `sum(`. – Debilski Mar 18 '10 at 17:55
  • @Mike, your post seems to have a mistake: concatenating linked lists doesn't require copying the elements, but rather just a single pointer operation. Each of those + operations is constant time. – mitmatt Mar 18 '10 at 18:03
  • @Debilski, This is still quadratic, which is completely avoidable. Also, `sum` defaults to starting to sum with 0, which isn't appropriate in this case; you'd have to pass a `start` param. – Mike Graham Mar 18 '10 at 18:05
  • @mitmatt, Python lists **aren't** linked lists. They are mutable array-like objects that copy when you use `+`. – Mike Graham Mar 18 '10 at 18:06
  • @Mike Graham: The start param would be there if following the replacement rule I gave. But I did not mean that it would generally be faster; it was just another option to calling `operator.add`. – Debilski Mar 18 '10 at 18:08
  • Though the `sum` is faster than `reduce` + `lambda`. – Debilski Mar 18 '10 at 18:09
  • @Mike, that's actually implementation-dependent and not specified by the Python language, but I think you'll find that the CPython implementation provides (amortized) constant-time appends: http://wiki.python.org/moin/TimeComplexity – mitmatt Mar 18 '10 at 18:13
  • @Debilski, Using `reduce` or `sum` are both slow, because they are O(n^2) and the right algorithm to use is O(n) – Mike Graham Mar 18 '10 at 18:13
  • My mistake: the appropriate function in that reference is probably "extend", which does indeed require more operations. From a data structures perspective, that's not entirely necessary (e.g. [1]), but it does seem to be the case with CPython. [1] http://en.wikipedia.org/wiki/Deque#Implementations – mitmatt Mar 18 '10 at 18:19
  • @mitmatt, `collections.deque` is implemented using a doubly-linked list. `list` is implemented using a resizing array, and supports the appropriate operations (like O(1) indexing). In any event, even if list *was* a doubly linked list (which it isn't!), it isn't clear whether the right thing to do with concatenation is copying or not, if it is mutable. – Mike Graham Mar 18 '10 at 18:34
  • @mitmatt, anything that doesn't have algorithmic complexities identical to `list` in CPython for its list and that is trying to claim to be Python should have its developers shot. – Mike Graham Mar 18 '10 at 18:38