How to split a sequence according to a predicate?

Question

I very often run into the need to split a sequence into the two subsequences of elements that satisfy and don't satisfy a given predicate (preserving the original relative ordering).

This hypothetical "splitter" function would look something like this in action:

>>> data = map(str, range(14))
>>> pred = lambda i: int(i) % 3 == 2
>>> splitter(data, pred)
[('2', '5', '8', '11'), ('0', '1', '3', '4', '6', '7', '9', '10', '12', '13')]

My question is:

does Python already have a standard/built-in way to do this?

This functionality is certainly not difficult to code (see Addendum below), but for a number of reasons, I'd prefer to use a standard/built-in method than a self-rolled one.

Thanks!

Addendum:

The best standard function I've found so far for handling this task in Python is itertools.groupby. To use it for this particular task however, it is necessary to call the predicate function twice for each list member, which I find annoyingly silly:

>>> import itertools as it
>>> [tuple(v[1]) for v in it.groupby(sorted(data, key=pred), key=pred)]
[('0', '1', '3', '4', '6', '7', '9', '10', '12', '13'), ('2', '5', '8', '11')]

(The last output above differs from the desired one shown earlier in that the subsequence of elements that satisfy the predicate comes last rather than first, but this is very minor, and very easy to fix if needed.)

One can avoid the redundant calls to the predicate (by doing, basically, an "inline memoization"), but my best stab at this gets pretty elaborate, a far cry from the simplicity of splitter(data, pred):

>>> first = lambda t: t[0]
>>> [zip(*i[1])[1] for i in it.groupby(sorted(((pred(x), x) for x in data),
... key=first), key=first)]
[('0', '1', '3', '4', '6', '7', '9', '10', '12', '13'), ('2', '5', '8', '11')]

BTW, if you don't care about preserving the original ordering, sorted's default sort order gets the job done (so the key parameter may be omitted from the sorted call):

>>> [zip(*i[1])[1] for i in it.groupby(sorted(((pred(x), x) for x in data)),
... key=first)]
[('0', '1', '3', '4', '6', '7', '9', '10', '12', '13'), ('2', '5', '8', '11')]

Can you help us understand why you don't want to write a function? — Ned Batchelder, Jan 09 '12 at 19:31
possible duplicate of [Python: split a list based on a condition?](http://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition) — user, Sep 21 '14 at 03:03

score 37 · Answer 1 · answered Jan 09 '12 at 19:21

37

I know you said you didn't want to write your own function, but I can't imagine why. Your solutions involve writing your own code, you just aren't modularizing them into functions.

This does exactly what you want, is understandable, and only evaluates the predicate once per element:

def splitter(data, pred):
    yes, no = [], []
    for d in data:
        if pred(d):
            yes.append(d)
        else:
            no.append(d)
    return [yes, no]

If you want it to be more compact (for some reason):

def splitter(data, pred):
    yes, no = [], []
    for d in data:
        (yes if pred(d) else no).append(d)
    return [yes, no]

answered Jan 09 '12 at 19:21

Ned Batchelder

364,293
75
561
662

This would be the sane way to implement this. – Jeff Mercado Jan 09 '12 at 19:29
1

I like to set a default value `pred=bool`. – wap26 Feb 11 '17 at 09:12
1

I'd return result as a tuple as it should always be exactly 2 elements, but otherwise very nice solution IMO. – EdvardM Apr 25 '18 at 11:17

Jeff Mercado · Answer 2 · 2012-01-09T19:22:23.777

23

Partitioning is one of those itertools recipes that does just that. It uses tee() to make sure it's iterating the collection in one pass despite the multiple iterators, the builtin filter() function to grab items that satisfies the predicate as well as filterfalse() to get the opposite effect of the filter. This is as close as you're going to get at a standard/builtin method.

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

edited Jan 09 '12 at 19:22

answered Jan 09 '12 at 19:14

Jeff Mercado

129,526
32
251
272

16

Note: This wouldn't be the optimal solution to do this, the collection is effectively iterated over twice. This goes for a functional approach rather than an imperative one. – Jeff Mercado Jan 10 '12 at 03:19
4

Probably more importantly, it also calls the predicate twice on each element. – user2357112 Sep 14 '17 at 17:04

score 18 · Accepted Answer · edited Jul 25 '22 at 11:04

18

In more_itertools there is a function called partition, which does exactly what topicstarter asked for.

from more_itertools import partition

numbers = [1, 2, 3, 4, 5, 6, 7]
predicate = lambda x: x % 2 == 0

predicate_false, predicate_true = partition(predicate, numbers)

print(list(predicate_false), list(predicate_true))

The result is [1, 3, 5, 7] [2, 4, 6].

edited Jul 25 '22 at 11:04

Timmmm

88,195
71
364
509

answered Nov 17 '20 at 14:18

Pantone877

515
6
15

10

more_itertools is on PyPI, but it is not in the Python standard library – Aron Griffis Nov 17 '20 at 22:29
[wait...] oh, really. I thought it is, actually. Thanks for pointing out! – Pantone877 Nov 19 '20 at 07:52

score 3 · Answer 4 · answered Nov 09 '14 at 21:26

If you don't care about efficiency, I think groupby (or any "putting data into n bins" functions) has some nice correspondence,

by_bins_iter = itertools.groupby(sorted(data, key=pred), key=pred)
by_bins = dict((k, tuple(v)) for k, v in by_bins_iter)

You can then get to your solution by,

return by_bins.get(True, ()), by_bins.get(False, ())

score 0 · Answer 5 · answered May 14 '18 at 22:50

A slight variation of one of the OP's implementations and another commenter's implementation above using groupby:

groups = defaultdict(list, { k : list(ks) for k, ks in groupby(items, f) })

groups[True] == the matching items, or [] if none returned True
groups[False] == the non-matching items, or [] if none returned False

Sadly, as you point out, groupby requires that the items be sorted by the predicate first, so if that's not guaranteed, you need this:

groups = defaultdict(list, { k : list(ks) for k, ks in groupby(sorted(items, key=f), f) })

Quite a mouthful, but it is a single expression that partitions a list by a predicate using only built-in functions.

I don't think you can just use sorted without the key parameter, because groupby creates a new group when it hits a new value from the key function. So sorted will only work if the items sort naturally by the predicate provided.

score 0 · Answer 6 · answered Jun 30 '18 at 13:20

As a slightly more general solution to partitioning, consider grouping. Consider the following function, inspired by clojure's group-by function.

You give it a collection of items to group, and a function that will be used to group them. Here's the code:

def group_by(seq, f):

    groupings = {}

    for item in seq:
        res = f(item)
        if res in groupings:
            groupings[res].append(item)
        else:
            groupings[res] = [item]

    return groupings

For the OP's original case:

y = group_by(range(14), lambda i: int(i) % 3 == 2)
{False: [0, 1, 3, 4, 6, 7, 9, 10, 12, 13], True: [2, 5, 8, 11]}

A more general case of grouping elements in a sequence by string length:

x = group_by(["x","xx","yy","zzz","z","7654321"], len)
{1: ['x', 'z'], 2: ['xx', 'yy'], 3: ['zzz'], 7: ['7654321']}

This can be extended to many cases, and is a core functionality of functional languages. It works great with the dynamically typed python, as the keys in the resulting map can be any type. Enjoy!

score 0 · Answer 7 · answered Oct 12 '22 at 14:32

Reducing the iterable into the 2 partitions using functools.reduce you can get rid of the key function:

import functools

functools.reduce(
    lambda tf, x: (tf[0], [*tf[1], x]) if pred(x) else ([*tf[0], x], tf[1]),
    data,
    ([], []),
)
>>> (['0', '1', '3', '4', '6', '7', '9', '10', '12', '13'], ['2', '5', '8', '11'])

How to split a sequence according to a predicate?

7 Answers7

Linked

Related