0

I've got a list such as [[1,2], [3,4], [5,6], [7,8], [9,10]] . I want to get [1,2,3,4,5,6,7,8,9,10].

This question gives some very good options for flattening lists in general. The answers given there work with variable length sublists. Here though, I know that every sublist has the same length (in particular length 2).

I'm wondering if it is possible to take advantage of the homogeneous sublist length to improve on the answers given in the question I linked to. In particular, is there anything that will do better at flattening this list than [item for sublist in l for item in sublist] ?

edit: by 'better', I mean faster for a very long list.

edit:

One thing I did not mention - I do not care about the order of the flattened list (but I care about multiplicity)

import timeit
import itertools
def f0():
    l=[[1,2]]*99
    [item for sublist in l for item in sublist]
def f1():
    l=[[1,2]]*99
    list(itertools.chain.from_iterable(l))
def f2():
    l = [[1,2]]*99
    z = map(list,zip(*l))
    z[0].extend(z[1])

print timeit.timeit("f0()", setup="from __main__ import f0, f1, f2", number=10000)
print timeit.timeit("f1()", setup="from __main__ import f0, f1, f2", number=10000)
print timeit.timeit("f2()", setup="from __main__ import f0, f1, f2", number=10000)

yields the output

0.13874912262
0.103307008743
0.10813999176

Could my zip function be done faster?

Community
  • 1
  • 1
Joel
  • 22,598
  • 6
  • 69
  • 93
  • 2
    `[item for sublist in l for item in sublist]` will work with one-level nested lists of any length. – thefourtheye Jan 04 '15 at 14:07
  • What do you mean by *"better"*? – jonrsharpe Jan 04 '15 at 14:08
  • @jonrsharpe Faster. the lists are long, and it's going to happen a lot. – Joel Jan 04 '15 at 14:10
  • 1
    So have you tried [profiling](https://docs.python.org/2/library/profile.html) or [timing](https://docs.python.org/2/library/timeit.html) your various options, to see which is fastest? Do you actually need the whole list, or just to iterate over it? – jonrsharpe Jan 04 '15 at 14:10
  • @thefourtheye Yes, I realize it will work. Many things will work. My question is about whether there is another option which might take advantage of the fact the sublists are all length 2. – Joel Jan 04 '15 at 14:11
  • 2
    @Joel with a vanilla list, it doesn't really matter how long the sublists are. If you will have a fixed structure, you could consider using [`numpy` arrays](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html), which may be more efficient depending on what exactly you're trying to do. – jonrsharpe Jan 04 '15 at 14:12
  • @jonrsharpe The options I'm aware of all work with variable length sublists. Of those I'm fairly confident the list comprehension is best. But I'm wondering if some intelligent use of `zip` or maybe something else might do better. I just don't have the imagination right now to see it. – Joel Jan 04 '15 at 14:14
  • @Joel `zip` is more useful for transposing than flattening. An iterative approach will be more efficient if you don't actually *need* the whole list at once, otherwise I suggest you use the list comprehension and only revisit the decision if it becomes a bottleneck. – jonrsharpe Jan 04 '15 at 14:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68172/discussion-between-joel-and-jonrsharpe). – Joel Jan 04 '15 at 14:19

2 Answers2

3

A little timing suggest that the list comprehension is slightly faster than the itertools version (for short lists - Hackaholic's answer suggests the reverse is true for long lists):

>>> import timeit
>>> timeit.timeit("[item for sublist in a for item in sublist]", 
                  setup="import itertools; a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]")
1.7200839519500732
>>> timeit.timeit("list(itertools.chain.from_iterable(a))", 
                  setup="import itertools; a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]")
2.0097079277038574

The key advantage of the iterative method comes if you can avoid building the whole list, iterating over chain.from_iterable's output rather than passing it to the list constructor.

If you are doing operations on arrays and performance is a key consideration, consider using numpy, which, although not part of the standard library, is much faster (once you have the array):

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
>>> a
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
>>> a.ravel()
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
>>> timeit.timeit("a.ravel()",
                  setup="import numpy as np; a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])")
0.36390113830566406
Community
  • 1
  • 1
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
  • I've modified my question slightly. All sublists have length 2 and I don't care about the final order of the list. zip with a little extra stuff seems to almost tie itertools. Is there a way to tweak it a bit more? – Joel Jan 05 '15 at 01:40
  • @Joel please *provide some context*. There is no point optimising this tiny bit of your code if a different approach could remove it altogether. Again, you get the best out of `itertools` if you avoid building the whole list; whether you can do that depends on *what you want to do with it* and, as it stands, we've no idea what that is. – jonrsharpe Jan 05 '15 at 10:14
  • My question is whether the `f2` in the edit I referenced in my question could be sped up to beat `f1`. That's the essence of my question. If you must have more context: we have a list of thousands of edges in a graph which need to be randomly rewired. To do this, flatten the list, shuffle it, join adjacent pairs together. Then again we collect thousands of edges to rewire. And again. And again. – Joel Jan 05 '15 at 10:37
  • Then you should probably look into [`networkx`](https://networkx.github.io/). I don't see a way you could *"tweak"* the `zip` approach. – jonrsharpe Jan 05 '15 at 10:46
  • I've written some of the `networkx` algorithms. If I can find the "best" way to do this algorithm, it'll probably find its way into `networkx` too. Anyways, thanks for the help on this. – Joel Jan 05 '15 at 10:49
2
import itertools
a = [[1,2], [3,4], [5,6], [7,8], [9,10]]
list(itertools.chain.from_iterable(a))

output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

now compare timing here: for small list

>>> timeit.timeit("list(itertools.chain.from_iterable(a))",setup='import itertools;a = [[1,2], [3,4], [5,6], [7,8], [9,10]]') 
0.9853601455688477
>>> timeit.timeit("[ y for x in a for y in x]",setup='a = [[1,2], [3,4], [5,6], [7,8], [9,10]]') 
0.9124641418457031

for large list:

here are the result why iterators are prefered:

>>> timeit.timeit("list(itertools.chain.from_iterable(a))",setup='import itertools;a = zip(range(100),range(100))',number=1000000) 
8.213459014892578
>>> timeit.timeit("[ y for x in a for y in x]",setup='a=zip(range(100),range(100))',number=1000000) 
12.833590984344482

from small list, list comprehension is good but for large you need to use iterators

Hackaholic
  • 19,069
  • 5
  • 54
  • 72