How to split a list on a condition?

Question

By now I didn't find a convenient way to split a list by certain conditions, for example, I have a record list:

 a = ((0,1),(1,0),(0,2),(1,0),(3,0),(4,0),(0,3),(1,5)....)

I want to split the content into 2 lists

alist = []
blist = []
for x in a:
    if x[0] == 0:
        alist.append(x)
    elif x[0] == 1:
        blist.append(x)

Not very concise.

Written as list comprehensions:

aList = [x for x in a if x[0] == 0]
bList = [x for x in a if x[0] == 1]

List comprehensions are usually good for reading and performance, but in this case the list must be iterated twice.

Is there a better way to do this job?

@Jake I wouldn't recommend that. Interning small ints is an implementation detail, I think. `==` is more reliable. — Lev Levitsky, Dec 31 '12 at 10:50
@LevLevitsky Oh. well, I learn something new every day. Should I delete the above comment? — Jake, Dec 31 '12 at 10:51
In that case, I will leave it in order for the conversation to be complete. And I will now use SO to discover when to use is vs ==. — Jake, Dec 31 '12 at 10:53
@Jake: `is` compares if two variables point to the same object. You almost always want to use `==`. — Blender, Dec 31 '12 at 10:55
@Blender previously I believed that it compared type and value. Similar to PHP ===. Thanks for helping clear that up. — Jake, Dec 31 '12 at 11:30

score 6 · Answer 1 · edited Dec 31 '12 at 11:08

6

Well, the conditions are different, no wonder you need two loops. But if you want to sacrifice some readability,

aList, bList = [[x for x in a if x[0] == i] for i in (0, 1)]

edited Dec 31 '12 at 11:08

Inbar Rose

41,843
24
85
131

answered Dec 31 '12 at 10:49

Lev Levitsky

63,701
20
147
175

but in this way,you actually iter it twice ,right? first for 0,and second for 1. – Max Dec 31 '12 at 10:58
@Max Yes. If you want to avoid this, you have to be sure the list is already in the correct order. If this is the case, you can make use of [`itertools.groupby`](http://docs.python.org/2/library/itertools.html#itertools.groupby). Edit: no, turns out it's not necessary, see Ignacio's answer. – Lev Levitsky Dec 31 '12 at 11:00
3

@InbarRose Oh come on :) It's still two loops where one would be enough. Better compliment the other answers :) – Lev Levitsky Dec 31 '12 at 11:12

score 6 · Answer 2 · answered Dec 31 '12 at 10:59

6

Adding one line will make the loop more concise at the cost of readability (and FPness).

alist = []
blist = []
bothlists = [alist, blist]
for x in a:
  bothlists[x[0]].append(x)

answered Dec 31 '12 at 10:59

Ignacio Vazquez-Abrams

776,304
153
1,341
1,358

1

good for simple condition check! use bothlists = ([],[]) alist,blist = bothlist,can save one line. – Max Dec 31 '12 at 11:05
1

@Max: Actually `bothlists = alist, blist = [], []` would save two lines. – martineau Feb 10 '15 at 20:38

Blender · Answer 3 · 2012-12-31T11:09:12.077

If you really want to make things complicated, you can import some functions from itertools to obfuscate your readable solution even more:

from operator import itemgetter
from collections import defaultdict

d = defaultdict(list)

for key, value in itertools.groupby(a, itemgetter(0)):
    d[key].append(list(value))

Here is the output:

>>> print d[0]
[[(0, 1)], [(0, 2)], [(0, 3)]]
>>> print d[1]
[[(1, 0)], [(1, 0)], [(1, 5)]]
>>> print d[4]
[[(4, 0)]]

This code just groups the items into a dictionary using the value in the first tuple as the key. It's a little more generic than your code.

score 0 · Answer 4 · answered Dec 31 '12 at 14:19

0

I would use filter, see: http://docs.python.org/2/library/functions.html#filter

>>> a = ((0,1), (1,0), (0,2), (1,0), (3,0), (4,0), (0,3), (1,5))
>>> filter(lambda x: x[0] == 1, a)
((1, 0), (1, 0), (1, 5))

To avoid redundant code you could collect the conditions in an iterable, like so:

>>> fs = (lambda x: x[0] == 0, lambda x: x[0] == 1)
>>> for f in fs:
...     filter(f, a)
...
((0, 1), (0, 2), (0, 3))
((1, 0), (1, 0), (1, 5))

answered Dec 31 '12 at 14:19

atomocopter

948
10
20

This iterates through the list multiple times, so isn't an improvement. – martineau Feb 10 '15 at 20:42

score 0 · Answer 5 · answered Jan 05 '21 at 04:34

A dict can be more concise.

alist, blist = [], []
for x in a:
    {0: alist, 1: blist}.get(x[0]).append(x)

This is a flexible pattern that you can expand to cover other situations.

For example, you can add more choices. You can also gracefully deal with unexpected values by changing .get(x[0]) to .get(x[0], []).

JustinFisher · Answer 6 · 2022-04-18T21:01:57.147

Here's a way to do this that takes advantage of the speed of a list comprehension to make one list, while incrementally appending the rejects onto another list.

blist = []
alist = [x for x in data if x[0]==0 or blist.append(x)]

This takes advantage of Python's greedily evaluating the or, so for any x that passes the alist criterion (x[0]==0) Python won't bother evaluating the second disjunct (blist.append(x)). This is a case where it's fairly intuitive to read Python's "or" as meaning "or else do this second thing and use its output instead" (which is technically what it always means).

This also takes advantage of the fact that list.append always returns a false-like None value, so that in the case where some x fails the alist criterion and thus gets appended to blist instead, the or still evaluates as false, so that rejected x still will not be included in the alist comprehension.

Here are timeit results (on my reasonably powerful desktop), each dividing a list of a million random numbers ranging 0-1 into a small list of those less than 0.1 and a large list of those larger, each trial repeated 10 times.

0.80 seconds: Using append to create both (suggested by @Ignacio)
0.68 seconds: Separate list comprehensions (suggested by @Max)
0.68 seconds: Using append to create large, comprehend to create small (inverse of what I would suggest!)
0.52 seconds: Using append to create small, comprehend to create large (suggested by me)

These results confirm what you would probably expect: primarily, list comprehension builds big lists a lot faster than append does, and, secondarily, 1 traversal is faster than 2. If you can predict which list will be longer, better to build that large list by comprehension, while appending just a few rejects into the smaller list.

For anyone who wants to try this themselves, here's the timing code.

import random, timeit
L = [random.random() for i in range(1000000)]

def comprehend_both():  # suggested by @max
    small = [n for n in L if n <= 0.1]
    large = [n for n in L if n > 0.1]

def append_both():   # suggested by @ignacio
    lists = small, large = [[],[]]
    for n in L: lists[n > 0.1].append(n)  # exploits False==0, True==1

def append_small_comprehend_large():  # suggested by @JustinFisher
    small = []
    large = [n for n in L if n > 0.1 or small.append(n)]

def comprehend_small_append_large():  # worse since appending more is slower
    large = []
    small = [n for n in L if n <= 0.1 or large.append(n)]

print( "C/C:", timeit.timeit("comprehend_both()",globals=globals(),number=10)) # 0.68 seconds
print( "A/A:", timeit.timeit("append_both()",globals=globals(),number=10))     # 0.80 seconds
print( "A/C:", timeit.timeit("append_small_comprehend_large()",globals=globals(),number=10)) # returns 0.52
print( "C/A:", timeit.timeit("comprehend_small_append_large()",globals=globals(),number=10)) # returns 0.68

How to split a list on a condition?

6 Answers6

Linked