What is a Pythonic and efficient way to divide a list at a matching element?

Question

This is very similar to Python: split a list based on a condition? and also https://nedbatchelder.com/blog/201306/filter_a_list_into_two_parts.html but instead of partitioning the individual elements into two lists based on a predicate, I want to divide the list into two parts at the first element that fails the predicate.

>>> divide_list(lambda x: x < 7, list(range(10)))
([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])

>>> divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5])
([1, 3, 5], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [7, 9, 5])
([], [7, 9, 5])

>>> divide_list(lambda x: x < 7, [1, 3, 5])
([1, 3, 5], [])

>>> divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}])
([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])

Things to note:

the input list may not be sorted
the input list may contain duplicate elements
ideally we don't want to evaluate the condition multiple times (for each element, if the value is duplicated then that's ok)
ideally it would accept an iterator as input (i.e. can only do a single pass over the input data)
returning iterators is acceptable

After 5 years ... there is another answer there ... – Claudio Sep 17 '22 at 20:58 — Claudio, Sep 17 '22 at 20:58

score 2 · Answer 1 · edited May 23 '17 at 12:10

I think that the naive implementation is probably best unless you actually need iterators as outputs. This could be useful if your input stream is an iterator and you don't have enough memory to materialize the whole thing at once, etc.

In that case, I think that itertools is great. My initial gut instinct was to do something like:

# broken  :-(
def divide_iter(pred, lst):
    i = iter(lst)
    yield itertools.takewhile(lst, pred)
    yield i

Unfortunately this doesn't work for a variety of reasons. Most notably, it drops an element. Even if it didn't, you could run into problems if you didn't consume the entire takewhile iterable before moving on to the next list. I think that this second problem is going to be an issue that we run into when working with iterators in general, so that's kind of a bummer, but it's the price we pay for processing things element-by-element rather than materializing entire lists at once.

Instead, let's think about grouping the items based on whether the predicate has returned true yet. Then groupby becomes a lot more appealing -- the only thing is that we need to keep track of whether the predicate has returned True. Stateful functions are not much fun so instead, we can use a class and pass a bound method as the key argument to groupby:

import itertools

class _FoundTracker(object):
    def __init__(self, predicate):
        self.predicate = predicate
        self._found = False

    def check_found(self, value):
        if self._found:
            return True
        else:
           self._found = self.predicate(value)
           return self._found

def split_iterable(iterable, predicate):
    tracker = _FoundTracker(predicate)
    for i, (k, group) in enumerate(itertools.groupby(iterable, key=tracker.check_found)):
        yield group
    if i == 0:
        yield iter(())

if __name__ == '__main__':
    for group in split_iterable(xrange(10), lambda x: x < 5):
        print(list(group))

This also has some possibly funky behavior... To demonstrate, consider:

g1, g2 = split_iterable(xrange(10), lambda x: x > 5)
print(list(g1))
print(list(g2))

You'll see that you get some really weird behavior :-). Alternatively:

g1, g2 = map(list, split_iterable(range(10), lambda x: x > 5))
print(g1)
print(g2)

should work fine.

@downvoter -- If you leave a comment about why you think this answer isn't useful, or what's wrong with it, I'm happy to try to fix any errors or clear up any misunderstandings. As it is, I'm not really sure what you think is wrong with it so I'm not really sure where to focus my improvement efforts. — mgilson, May 11 '17 at 00:15

score 1 · Answer 2 · answered May 10 '17 at 23:44

A naive implementation to get things rolling:

def divide_list(pred, lst):
    before, after = [], []
    found = False
    for item in lst:
        if not found:
            if pred(item):
                before.append(item)
            else:
                found = True
        if found:
            after.append(item)
    return before, after

Jacob G. · Answer 3 · 2017-05-11T01:10:19.107

0

Here's my relatively efficient attempt:

from collections import Hashable

def divide_list(pred, list):
    # The predicate may be expensive, so we can
    # store elements that have already been checked
    # in a set for fast verification.
    elements_checked = set()

    # Assuming that every element of the list is of
    # the same type and the list is nonempty, we can
    # store a flag to check if an element is hashable.
    hashable = isinstance(list[0], Hashable)

    for index, element in enumerate(list):
        if hashable and element in elements_checked:
            continue

        if not pred(element):
            return list[:index], list[index:]

        if hashable:
            elements_checked.add(element)

    return list, []

If you were to benchmark this against the other answers, I reckon this will be the fastest.

I love this question by the way!

edited May 11 '17 at 01:10

answered May 10 '17 at 23:51

Jacob G.

28,856
5
62
116

I think this is overly complicating it, I'm not too worried about checking the pred twice if an element appears twice in the list, it's more that i didn't want to check the pred twice for each element like lots of solutions of the filter/filterfalse variety do. And the set() means this won't work if the list elements aren't hashable, will it? – Tom May 11 '17 at 00:39
If the element isn't hashable, then it will check the predicate for duplicate elements instead of caching it, which is basically equivalent to removing the set. Ideally you want to keep the set, as it will make it efficient if there exist many duplicates with an expensive predicate. – Jacob G. May 11 '17 at 00:42
If you'd like, I can add a flag that checks for hashability and optimize it off of that to either use or not use the set. – Jacob G. May 11 '17 at 00:47
I mean that divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}]) raises a TypeError: unhashable type: 'dict' (added that to the example cases to clarify for others too) – Tom May 11 '17 at 01:00

score 0 · Answer 4 · answered May 11 '17 at 00:24

This is basically your naive attempt, but doesn't use a separate Boolean flag to determine when the predicate fails; it just uses a reference to first one list, then the other, to do the appending.

def divide_list(pred, lst):
     a, b = [], []
     curr = a
     for x in lst:
         if curr is a and not pred(x):
             curr = b
         curr.append(x)
     return a, b

score 0 · Answer 5 · answered Sep 17 '22 at 20:55

Why complicated if simple possible? Already mentioned but for in my eyes not understandable reasons dropped from consideration: usage of itertools takewhile.

The code below passes all assertion tests and the function itself needs three lines of code:

from itertools import takewhile
def divide_list(pred, lstL):
    header  = list(takewhile(pred, lstL))
    trailer = lstL[len(header):]
    return header, trailer


assert divide_list(lambda x: x < 7, list(range(10))) == ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])
assert divide_list(lambda x: x < 7, [1, 3, 5, 7, 9, 5]) == ([1, 3, 5], [7, 9, 5])
assert divide_list(lambda x: x < 7, [7, 9, 5]) == ([], [7, 9, 5])
assert divide_list(lambda x: x < 7, [1, 3, 5]) == ([1, 3, 5], [])
assert divide_list(lambda x: x['a'], [{'a': True, 'b': 1}, {'a': True}, {'a': False}]) == ([{'a': True, 'b': 1}, {'a': True}], [{'a': False}])

What is a Pythonic and efficient way to divide a list at a matching element?

5 Answers5

Linked