5

By now I didn't find a convenient way to split a list by certain conditions, for example, I have a record list:

 a = ((0,1),(1,0),(0,2),(1,0),(3,0),(4,0),(0,3),(1,5)....)

I want to split the content into 2 lists

alist = []
blist = []
for x in a:
    if x[0] == 0:
        alist.append(x)
    elif x[0] == 1:
        blist.append(x)

Not very concise.

Written as list comprehensions:

aList = [x for x in a if x[0] == 0]
bList = [x for x in a if x[0] == 1]

List comprehensions are usually good for reading and performance, but in this case the list must be iterated twice.

Is there a better way to do this job?

martineau
  • 119,623
  • 25
  • 170
  • 301
Max
  • 7,957
  • 10
  • 33
  • 39
  • I think you should be comparing two ints with is. if 1 is 1 – Jake Dec 31 '12 at 10:49
  • @Jake I wouldn't recommend that. Interning small ints is an implementation detail, I think. `==` is more reliable. – Lev Levitsky Dec 31 '12 at 10:50
  • @LevLevitsky Oh. well, I learn something new every day. Should I delete the above comment? – Jake Dec 31 '12 at 10:51
  • In that case, I will leave it in order for the conversation to be complete. And I will now use SO to discover when to use is vs ==. – Jake Dec 31 '12 at 10:53
  • 1
    @Jake: `is` compares if two variables point to the same object. You almost always want to use `==`. – Blender Dec 31 '12 at 10:55
  • @Blender previously I believed that it compared type and value. Similar to PHP ===. Thanks for helping clear that up. – Jake Dec 31 '12 at 11:30

6 Answers6

6

Well, the conditions are different, no wonder you need two loops. But if you want to sacrifice some readability,

aList, bList = [[x for x in a if x[0] == i] for i in (0, 1)]
Inbar Rose
  • 41,843
  • 24
  • 85
  • 131
Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
  • but in this way,you actually iter it twice ,right? first for 0,and second for 1. – Max Dec 31 '12 at 10:58
  • @Max Yes. If you want to avoid this, you have to be sure the list is already in the correct order. If this is the case, you can make use of [`itertools.groupby`](http://docs.python.org/2/library/itertools.html#itertools.groupby). Edit: no, turns out it's not necessary, see Ignacio's answer. – Lev Levitsky Dec 31 '12 at 11:00
  • 3
    @InbarRose Oh come on :) It's still two loops where one would be enough. Better compliment the other answers :) – Lev Levitsky Dec 31 '12 at 11:12
6

Adding one line will make the loop more concise at the cost of readability (and FPness).

alist = []
blist = []
bothlists = [alist, blist]
for x in a:
  bothlists[x[0]].append(x)
Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
1

If you really want to make things complicated, you can import some functions from itertools to obfuscate your readable solution even more:

from operator import itemgetter
from collections import defaultdict

d = defaultdict(list)

for key, value in itertools.groupby(a, itemgetter(0)):
    d[key].append(list(value))

Here is the output:

>>> print d[0]
[[(0, 1)], [(0, 2)], [(0, 3)]]
>>> print d[1]
[[(1, 0)], [(1, 0)], [(1, 5)]]
>>> print d[4]
[[(4, 0)]]

This code just groups the items into a dictionary using the value in the first tuple as the key. It's a little more generic than your code.

Blender
  • 289,723
  • 53
  • 439
  • 496
0

I would use filter, see: http://docs.python.org/2/library/functions.html#filter

>>> a = ((0,1), (1,0), (0,2), (1,0), (3,0), (4,0), (0,3), (1,5))
>>> filter(lambda x: x[0] == 1, a)
((1, 0), (1, 0), (1, 5))

To avoid redundant code you could collect the conditions in an iterable, like so:

>>> fs = (lambda x: x[0] == 0, lambda x: x[0] == 1)
>>> for f in fs:
...     filter(f, a)
...
((0, 1), (0, 2), (0, 3))
((1, 0), (1, 0), (1, 5))
atomocopter
  • 948
  • 10
  • 20
0

A dict can be more concise.

alist, blist = [], []
for x in a:
    {0: alist, 1: blist}.get(x[0]).append(x)

This is a flexible pattern that you can expand to cover other situations.

For example, you can add more choices. You can also gracefully deal with unexpected values by changing .get(x[0]) to .get(x[0], []).

ChaimG
  • 7,024
  • 4
  • 38
  • 46
0

Here's a way to do this that takes advantage of the speed of a list comprehension to make one list, while incrementally appending the rejects onto another list.

blist = []
alist = [x for x in data if x[0]==0 or blist.append(x)]

This takes advantage of Python's greedily evaluating the or, so for any x that passes the alist criterion (x[0]==0) Python won't bother evaluating the second disjunct (blist.append(x)). This is a case where it's fairly intuitive to read Python's "or" as meaning "or else do this second thing and use its output instead" (which is technically what it always means).

This also takes advantage of the fact that list.append always returns a false-like None value, so that in the case where some x fails the alist criterion and thus gets appended to blist instead, the or still evaluates as false, so that rejected x still will not be included in the alist comprehension.

Here are timeit results (on my reasonably powerful desktop), each dividing a list of a million random numbers ranging 0-1 into a small list of those less than 0.1 and a large list of those larger, each trial repeated 10 times.

0.80 seconds: Using append to create both (suggested by @Ignacio)
0.68 seconds: Separate list comprehensions (suggested by @Max)
0.68 seconds: Using append to create large, comprehend to create small (inverse of what I would suggest!)
0.52 seconds: Using append to create small, comprehend to create large (suggested by me)

These results confirm what you would probably expect: primarily, list comprehension builds big lists a lot faster than append does, and, secondarily, 1 traversal is faster than 2. If you can predict which list will be longer, better to build that large list by comprehension, while appending just a few rejects into the smaller list.


For anyone who wants to try this themselves, here's the timing code.

import random, timeit
L = [random.random() for i in range(1000000)]

def comprehend_both():  # suggested by @max
    small = [n for n in L if n <= 0.1]
    large = [n for n in L if n > 0.1]

def append_both():   # suggested by @ignacio
    lists = small, large = [[],[]]
    for n in L: lists[n > 0.1].append(n)  # exploits False==0, True==1

def append_small_comprehend_large():  # suggested by @JustinFisher
    small = []
    large = [n for n in L if n > 0.1 or small.append(n)]

def comprehend_small_append_large():  # worse since appending more is slower
    large = []
    small = [n for n in L if n <= 0.1 or large.append(n)]

print( "C/C:", timeit.timeit("comprehend_both()",globals=globals(),number=10)) # 0.68 seconds
print( "A/A:", timeit.timeit("append_both()",globals=globals(),number=10))     # 0.80 seconds
print( "A/C:", timeit.timeit("append_small_comprehend_large()",globals=globals(),number=10)) # returns 0.52
print( "C/A:", timeit.timeit("comprehend_small_append_large()",globals=globals(),number=10)) # returns 0.68
JustinFisher
  • 607
  • 1
  • 7
  • 10