482

Is there a short syntax for joining a list of lists into a single list (or iterator) in Python?

For example, I have the following list and I want to iterate over a, b and c.

x = [["a","b"], ["c"]]

The best I can come up with is as follows.

result = []
[ result.extend(el) for el in x ]

for el in result:
    print(el)
Kozyarchuk
  • Duplicate: http://stackoverflow.com/questions/120886/python-idiom-to-chain-flatten-an-infinite-iterable-of-finite-iterables, http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python – S.Lott Apr 04 '09 at 11:52

15 Answers

694
import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))

This gives

['a', 'b', 'c']
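If you only need to iterate over the flattened elements, as in the question, the `list()` call can be skipped entirely, since `chain.from_iterable` returns a lazy iterator:

```python
import itertools

a = [['a', 'b'], ['c']]

# chain.from_iterable yields the elements one sublist at a time,
# without ever building an intermediate list.
for item in itertools.chain.from_iterable(a):
    print(item)
```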
Stefan
CTT
  • 12
no need to list() it! for item in itertools.chain(*a): do something with item – hasen Apr 04 '09 at 08:42
  • 21
    A bit of explanation would also be nice. http://docs.python.org/library/itertools.html#itertools.chain – hasen Apr 04 '09 at 08:43
  • 51
    result = []; map(result.extend, a) is ~30% faster than itertools.chain. But chain.from_iterable is a tiny bit faster than map+extend. [Python 2.7, x86_64] – temoto Jun 20 '11 at 02:23
  • 5
    This explains what's happening with the `*a`: http://stackoverflow.com/questions/5239856/foggy-on-asterisk-in-python (it sends the elements of `a` as arguments to `chain`, like removing the outer `[` and `]`). – Evgeni Sergeev Jan 09 '14 at 06:00
  • 2
    chain.from_iterable is significantly faster if you have many iterables to concatenate. For me it was ~50% faster when creating ctypes arrays of OpenGL vertices from 100s of python lists containing 10s or 100s of vertices each. The '*' operator converts your iterable into an intermediate tuple that it passes in to chain. – Jonathan Hartley Jul 25 '14 at 18:59
  • @CTT Can I do like this: `a = ['a','b';'c']` using `itertools` ? – diffracteD Jul 21 '16 at 11:14
  • 1
    This doesn't work if some elements are just strings: [["aaa", "bbb"], "ccc"] gives ["aaa", "bbb", "c", "c", "c"]. – prewett Feb 21 '17 at 02:10
  • 1
    @prewett This appears to work as intended; it always flattens exactly one level. `[["aaa", "bbb"], "ccc"]` becomes `["aaa", "bbb", "c", "c", "c"]`, and `[["aaa", "bbb"], ["ccc"]]` becomes `["aaa", "bbb", "ccc"]`. If you want to flatten an unknown and varying number of levels, you should go for a more complex solution using `isinstance(, list)` – Stef Aug 28 '21 at 22:37
@hasen This returns an `itertools.chain` object and you would need to convert it to a list depending on what you're doing, as `itertools.chain` doesn't work with everything – West Mar 31 '22 at 22:38
347
x = [["a","b"], ["c"]]

result = sum(x, [])
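To see why this works: `sum(x, [])` starts from the empty list and adds each sublist with `+`, and `+` on lists is concatenation. A rough sketch of the equivalent loop:

```python
x = [["a", "b"], ["c"]]

# sum(x, []) behaves like: [] + ["a", "b"] + ["c"]
result = []
for sublist in x:
    result = result + sublist  # each + builds a brand-new list, hence O(n^2)

print(result)  # ['a', 'b', 'c']
```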
Christian C. Salvadó
  • 4
    @Aaron, explain for a noob python learner please: Is O(n^2) good or bad in this case? ;-) – Aufwind Jul 11 '11 at 16:31
  • 18
    O(n^2) here basically means that the time required for this function to execute is proportional to the square of the length of the inputs. So if you double the inputs, you quadruple the time required. This is a Bad Thing if you have large inputs, but for small ones it should be fine. But a faster method will be better. – andronikus Aug 29 '11 at 20:27
  • No, this is linear in the total number of items in all sublists combined, just like all other methods suggested on this page. – Julian Feb 26 '14 at 23:38
  • 4
    @Julian: You are wrong. Just time it, or see http://stackoverflow.com/a/952952/279627. – Sven Marnach Feb 28 '14 at 16:00
  • 1
    Very slick, but note that the python docs for `sum()` suggest using `itertools.chain` for this specific use case. https://docs.python.org/3/library/functions.html#sum – ACK_stoverflow May 10 '17 at 00:39
  • 2
    @ACK_stoverflow The docs say to use `chain` for combining iterables, not lists. If you just need an iterable as the result, you can do that and avoid extra copying. If you still need a list as the result, I doubt `chain` is any better. – sudo Dec 06 '17 at 21:44
  • 6
    Can somebody explain how and why `sum(x, [])` works? – Requin Feb 07 '21 at 09:29
  • 4
What an awful thing to read! The more I use Python the more its half-finished state hits home. I can't imagine why the standard library doesn't have a flat() function built in. – Robert Moskal Aug 17 '21 at 18:20
  • 1
@Requin `sum(iterable, start)` adds all elements in `iterable` to `start`. Adding lists concatenates them. `start` defaults to 0, which would break if `iterable` consists of lists, as you can't add a list to an int, so we pass an empty list as start. – andyhasit Feb 23 '23 at 15:05
265

If you're only going one level deep, a nested comprehension will also work:

>>> x = [["a","b"], ["c"]]
>>> [inner
...     for outer in x
...         for inner in outer]
['a', 'b', 'c']

On one line, that becomes:

>>> [j for i in x for j in i]
['a', 'b', 'c']
George V. Reilly
  • 2
    Very cool, so for the next depth-level it will become [i for ll in x for l in ll for i in l] - at this point it starts getting a bit lame for the reader, but nevertheless cool :) – khosrow Apr 10 '12 at 02:41
For three levels, it gets nasty: with `x = [[["a", "b"], ["c"]], [["d"]]]`, `[k for i in x for j in i for k in j]` gives `['a', 'b', 'c', 'd']` – George V. Reilly Apr 17 '12 at 23:02
  • 3
    Listception.. this is definitely unpythonic / against the zen of python in that it is not the simplest or most explicit way to do it. You end up hard coding recursion. Still cool though. – Zach Estela Mar 07 '18 at 19:02
  • 4
    @ZachEstela, I'm happy to see someone call this unpythonic. It seems like many techniques others like to call pythonic are not easily understood at first glance. Readability is one of the things that makes Python attractive to me. This solution is cool, and probably the fastest, but the `sum(x, [])` solution is much more Pythonic. – Mr. Lance E Sloan Jul 15 '18 at 17:49
  • 3
    Those "more pythonic" answers are just wrong. The question wasn't about recursive joining, but joining a list of lists, which means there are no more depth levels to join. – tobltobs Sep 12 '18 at 12:44
49
flat_list = []
map(flat_list.extend, list_of_lists)

shortest!
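As noted in the comments, `map` is lazy in Python 3, so the `extend` calls never run unless the map is consumed; a sketch that works there too:

```python
list_of_lists = [["a", "b"], ["c"]]

flat_list = []
# In Python 3, map() returns a lazy iterator; wrapping it in list()
# forces the flat_list.extend(...) calls to actually execute.
list(map(flat_list.extend, list_of_lists))

print(flat_list)  # ['a', 'b', 'c']
```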

Skippy le Grand Gourou
nate c
  • 25
    sum(listoflists,[]) # shorter! – recursive Jul 24 '10 at 01:18
  • 12
    @recursive Shorter but different functionally = much worse performance-wise, see comments on other variants for explanation – Kos Dec 11 '12 at 07:59
  • This tiny snippet appears to be the fastest way around for non-recursive flatten. Needs more upvotes. – Kos Dec 11 '12 at 08:34
  • 12
    in Python 3.1+, wrap `map` with`list()`, or else you'll see `` when you print the result – kit May 21 '16 at 15:54
  • @kit This is due to `map` returning an iterator in Python 3. BTW, it seems that "using `map` for its side effects" is not a consensual method (see e.g. [this answer](https://stackoverflow.com/questions/18433488/non-lazy-evaluation-version-of-map-in-python3)). – Skippy le Grand Gourou Oct 18 '22 at 16:46
  • This solution does not work for me. – habrewning Jun 18 '23 at 14:31
33

This is known as flattening, and there are a LOT of implementations out there.

How about this, although it will only work for 1 level deep nesting:

>>> x = [["a","b"], ["c"]]
>>> for el in sum(x, []):
...     print(el)
...
a
b
c

From those links, apparently the most complete-fast-elegant-etc implementation is the following:

def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
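For example, it flattens arbitrary nesting and preserves the outer sequence type:

>>> flatten([1, [2, (3, 4)], [[5], 6]])
[1, 2, 3, 4, 5, 6]
>>> flatten((1, (2, 3)))
(1, 2, 3)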
Machavity
Paolo Bergantino
  • 5
    Ah, 'sum(L,I)' is shorthand for 'reduce(plus_operator, L, I)'. That's kinda cool. – Aaron Apr 04 '09 at 04:18
  • 5
    your "most complete-elegant-etc" is not "elegant" at all!! see the docs for itertools.chain to see true elegance! – hasen Apr 04 '09 at 09:04
  • 4
    @hasen j: I believe he means best for arbitrary nested lists. chain assumes a consistent, one-deep list of lists (which is probably all the question needs), but flatten handles things like [a,b,[c], [d,[e,f]],[[[g]]]]. – Brian Apr 04 '09 at 09:44
  • Unfortunately this breaks if you're using pylab, because numpy's `sum` gets imported into the global namespace, and that function doesn't work that way. – naught101 Mar 25 '19 at 23:44
30

If you need a list, not a generator, use list():

from itertools import chain
x = [["a","b"], ["c"]]
y = list(chain(*x))
culebrón
27

A performance comparison:

import itertools
import timeit
big_list = [[0]*1000 for i in range(1000)]
timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]

Producing:

>>> import itertools
>>> import timeit
>>> big_list = [[0]*1000 for i in range(1000)]
>>> timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
[3.016212113769325, 3.0148865239060227, 3.0126415732791028]
>>> timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
[3.019953987082083, 3.528754223385439, 3.02181439266457]
>>> timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
[1.812084445152557, 1.7702404451095965, 1.7722977998725362]
>>> timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[5.409658160700605, 5.477502077679354, 5.444318360412744]
>>> [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
[399.27587954973444, 400.9240571138051, 403.7521153804846]

This is with Python 2.7.1 on Windows XP 32-bit, but @temoto in the comments above got from_iterable to be faster than map+extend, so it's quite platform and input dependent.

Stay away from sum(big_list, [])

Evgeni Sergeev
  • Super helpful. Thanks! Note that in Python3, we need a list() around the map() version, otherwise the results are too good to be true. – Michael Scott Asato Cuthbert Nov 30 '16 at 19:25
  • 3
    There's a few downvotes. I can't figure out what they're referring to. If you see a mistake, could you point it out? If there is a mistake, it should be easy to fix, which would be nice for the future generations of visitors. – Evgeni Sergeev May 31 '17 at 12:00
16

This works recursively for infinitely nested elements:

def iterFlatten(root):
    if isinstance(root, (list, tuple)):
        for element in root:
            for e in iterFlatten(element):
                yield e
    else:
        yield root

Result:

>>> b = [["a", ("b", "c")], "d"]
>>> list(iterFlatten(b))
['a', 'b', 'c', 'd']
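On Python 3.3+ the same idea can be written with `yield from` (a sketch, not from the original answer):

```python
def iter_flatten(root):
    # Recursively flatten nested lists/tuples; same generator as above,
    # with yield from handling the delegation to the recursive call.
    if isinstance(root, (list, tuple)):
        for element in root:
            yield from iter_flatten(element)
    else:
        yield root

b = [["a", ("b", "c")], "d"]
print(list(iter_flatten(b)))  # ['a', 'b', 'c', 'd']
```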
Adam Harte
Georg Schölly
  • 1
    `>>> a = [] >>> a.append(a) >>> b = iterFlatten(a) >>> next(b) RuntimeError: maximum recursion depth exceeded in __instancecheck__` :) – Casey Kuball Mar 22 '12 at 16:39
  • 2
    @Darthfett would you expect a meaningful result for flattening an "infinitely-nested list"? :-) – Kos Dec 11 '12 at 07:46
  • @Kos A version that checks for such cases (by using a stack/set to check for self-references in a list) could be preferable than simply continuing to flatten until reaching the recursion depth limit. This could bypass the issue by simply giving the value, instead of trying to flatten it. – Casey Kuball Dec 11 '12 at 22:28
8

Late to the party but ...

I'm new to python and come from a lisp background. This is what I came up with (check out the var names for lulz):

def flatten(lst):
    if lst:
        car, *cdr = lst
        if isinstance(car, (list, tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]
    return []  # an empty input flattens to an empty list

Seems to work. Test:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

returns:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
Michael Puckett
  • 15
    You come from a lisp background? I never would have guessed from the code... haha – Tom Nov 17 '11 at 16:20
  • Nice, been doing Python for some time now and I haven't seen var-arg tuple unpacking like you did with `car, *cdr`. (e-> probably because it's Python 3 and I'm still digging 2 for some reason :-)) – Kos Dec 11 '12 at 07:52
  • What's the point of the `if lst:`? – naught101 Mar 26 '19 at 00:57
it's 4 a.m. and I was hoping to find something that goes 3-4 lists deep; thanks for writing this since I am not able to code now :D – Sadegh May 31 '23 at 00:05
5

What you're describing is known as flattening a list, and with this new knowledge you'll be able to find many solutions to this on Google (there is no built-in flatten method). Here is one of them, from http://www.daniel-lemire.com/blog/archives/2006/05/10/flattening-lists-in-python/:

def flatten(x):
    ans = []
    for i in x:
        if i.__class__ is list:
            ans.extend(flatten(i))
        else:
            ans.append(i)
    return ans
Paige Ruten
  • 172,675
  • 36
  • 177
  • 197
  • This method works nicely for a mix of lists of strings and strings (e.g. `[['some', 'string'], 'and', 'another']` ), while the itertools techniques do not. This works well for my needs. – James Haskell Aug 11 '16 at 18:30
  • this method does not work when you have something like this `[[[a, b, c], [d, e, f]], [[g, h, l]]]` – Sadegh May 31 '23 at 00:03
3

There's always reduce (moved into functools in Python 3):

>>> x = [ [ 'a', 'b'], ['c'] ]
>>> for el in reduce(lambda a,b: a+b, x, []):
...     print(el)
...
__main__:1: DeprecationWarning: reduce() not supported in 3.x; use functools.reduce()
a
b
c
>>> import functools
>>> for el in functools.reduce(lambda a,b: a+b, x, []):
...     print(el)
...
a
b
c
>>>

Unfortunately the plus operator for list concatenation can't be used as a function -- or fortunate, if you prefer lambdas to be ugly for improved visibility.
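The `operator` module does, however, provide a function form of `+`, which avoids the lambda entirely:

```python
import functools
import operator

x = [['a', 'b'], ['c']]

# operator.add is the function form of +, so reduce concatenates
# the sublists left to right.
flat = functools.reduce(operator.add, x, [])
print(flat)  # ['a', 'b', 'c']
```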

Aaron
  • 3
    GAH, I cannot believe they are deprecating it to functools. Anyway, you don't need the extra empty list, this will work just fine: reduce(lambda a,b: a+b, x) – Benson Apr 04 '09 at 06:23
  • 2
    Versions of the operators are defined as functions in the operator module, which are faster and less ugly than the lambda: "functools.reduce(operator.add, [[1,2,3],[4,5]],[])". Alternatively, just use sum() – Brian Apr 04 '09 at 09:40
  • Personally, I think the lambda way is quite pretty. :-) – Benson Apr 16 '09 at 01:58
  • If you want to do a reduce, then reduce over `extend` not `add` to avoid spamming the memory with temporary lists. Wrap `extend` with a function that extends then returns the list itself. – Kos Dec 11 '12 at 07:58
3

For one-level flatten, if you care about speed, this is faster than any of the previous answers under all conditions I tried. (That is, if you need the result as a list. If you only need to iterate through it on the fly then the chain example is probably better.) It works by pre-allocating a list of the final size and copying the parts in by slice (which is a lower-level block copy than any of the iterator methods):

def join(a):
    """Joins a sequence of sequences into a single sequence.  (One-level flattening.)
    E.g., join([(1,2,3), [4, 5], [6, (7, 8, 9), 10]]) = [1,2,3,4,5,6,(7,8,9),10]
    This is very efficient, especially when the subsequences are long.
    """
    n = sum([len(b) for b in a])
    l = [None]*n
    i = 0
    for b in a:
        j = i+len(b)
        l[i:j] = b
        i = j
    return l

Sorted times list with comments:

[(0.5391559600830078, 'flatten4b'), # join() above. 
(0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a) 
(0.5419249534606934, 'flatten4a'), # Similar, using zip() 
(0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a)) 
(0.7472689151763916, 'flatten1'), # list(itertools.chain(*a)) 
(1.5468521118164062, 'flatten3'), # [i for j in a for i in j] 
(26.696547985076904, 'flatten2')] # sum(a, [])
esmit
Brandyn
  • Can you add timings to confirm that this is faster than the other methods presented? – esmit Jun 27 '12 at 22:46
  • Sorted times list with comments: `[(0.5391559600830078, 'flatten4b'), # join() above. (0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a) (0.5419249534606934, 'flatten4a'), # Similar, using zip() (0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a)) (0.7472689151763916, 'flatten1'), # list(itertools.chain(*a)) (1.5468521118164062, 'flatten3'), # [i for j in a for i in j] (26.696547985076904, 'flatten2')] # sum(a, []) ` – Brandyn Jun 28 '12 at 09:09
  • You skipped `map(result.extend, a)` – Kos Dec 11 '12 at 08:00
  • There's a benchmark http://ideone.com/9q3mrp – Kos Dec 11 '12 at 08:35
  • @Kos, you are right! I'm lame. I probably omitted it originally because it "obviously" has bad O() time due to multiple copies, but now that I add it to my test, in practice it looks like it is successfully using realloc() to avoid this, and so it is winning hands down under all conditions. I remain skeptical, though, that it might revert to horrible behavior in a real working environment with fragmented memory. In a simple test app like this, with a clean slate of memory, it is free to keep extending the array without moving it. Thoughts? – Brandyn Dec 11 '12 at 20:10
2

Or a recursive operation:

def flatten(input):
    ret = []
    if not isinstance(input, (list, tuple)):
        return [input]
    for i in input:
        if isinstance(i, (list, tuple)):
            ret.extend(flatten(i))
        else:
            ret.append(i)
    return ret
svhb
1

I had a similar problem when I had to create a dictionary that contained the elements of an array and their counts. The answer is relevant because I flatten a list of lists, get the elements I need, and then do a group and count. I used Python's map function to produce a tuple of each element and its count, and groupby over the array. Note that groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find this easier to comprehend, while being Pythonic as well.

Before I discuss the code, here is a sample of data I had to flatten first:

{ "_id" : ObjectId("4fe3a90783157d765d000011"), "status" : [ "opencalais" ],
  "content_length" : 688, "open_calais_extract" : { "entities" : [
  {"type" :"Person","name" : "Iman Samdura","rel_score" : 0.223 }, 
  {"type" : "Company",  "name" : "Associated Press",    "rel_score" : 0.321 },          
  {"type" : "Country",  "name" : "Indonesia",   "rel_score" : 0.321 }, ... ]},
  "title" : "Indonesia Police Arrest Bali Bomb Planner", "time" : "06:42  ET",         
  "filename" : "021121bn.01", "month" : "November", "utctime" : 1037836800,
  "date" : "November 21, 2002", "news_type" : "bn", "day" : "21" }

It is a query result from Mongo. The code below flattens a collection of such lists.

def flatten_list(items):
  return sorted([entity['name'] for entity in
                 [entities for sublist in
                  [item['open_calais_extract']['entities'] for item in items]
                  for entities in sublist]])

First, I extract the "entities" collections, and then, for each entities collection, iterate over the dictionaries and extract the name attribute.

1

Sadly, Python doesn't have a simple way to flatten lists. Try this:

def flatten(some_list):
    for element in some_list:
        if type(element) in (tuple, list):
            for item in flatten(element):
                yield item
        else:
            yield element

Which will recursively flatten a list; you can then do

result = []
[ result.extend(el) for el in x ]

for el in flatten(result):
    print(el)
Don Werve