10

Given a list like:

[a, SEP, b, c, SEP, SEP, d]

how do I split it into a list of sublists:

[[a], [b, c], [], [d]]

Effectively, I need an equivalent of str.split() for lists. I can hack something together, but I can't come up with anything neat and/or Pythonic.

I get the input from an iterator, so a generator working on that is acceptable as well.

More examples:

[a, SEP, SEP, SEP] -> [[a], [], [], []]

[a, b, c] -> [[a, b, c]]

[SEP] -> [[], []]
Jani
  • `itertools.groupby` ? – Jean-François Fabre Jan 25 '19 at 20:19
  • do you actually want that empty list or no – d_kennetz Jan 25 '19 at 20:24
  • Huh, wonder how I failed to find the dupe question. But yeah, I want the empty lists too. – Jani Jan 25 '19 at 20:29
  • then you should have an empty list at each `sep`? Or only when `sep` occurs twice in a row? what if `sep` occurs 6 times in a row? could you clarify? – d_kennetz Jan 25 '19 at 20:31
  • I want it to work exactly like `str.split()`, but for lists. Which means sep gets removed, consecutive seps lead to consecutive empty lists in between. (And it doesn't seem trivial to me to get `itertools.groupby` to do this, IMHO, making this distinct enough not to be a dupe.) – Jani Jan 25 '19 at 20:36
  • @Jean-FrançoisFabre based on the updated examples, I agree with OP that this is not a dupe. I don't know if there's a simple `itertools.groupby` solution here. – pault Jan 25 '19 at 22:06
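
For what it's worth, a groupby-based version does seem possible if every separator in a run is counted individually rather than treating the run as one boundary. The following is only a sketch; the name split_groupby is made up for illustration and does not come from any answer here.

```python
from itertools import groupby

def split_groupby(iterable, sep):
    """Split an iterable on sep, keeping empty chunks like str.split(sep)."""
    result = [[]]
    # groupby yields alternating runs of separators and non-separators
    for is_sep, run in groupby(iterable, key=lambda x: x == sep):
        if is_sep:
            # Each separator in the run closes one chunk and opens a new empty one
            result.extend([] for _ in run)
        else:
            # A run of ordinary items belongs to the chunk that is currently open
            result[-1].extend(run)
    return result

print(split_groupby(["a", "SEP", "b", "c", "SEP", "SEP", "d"], "SEP"))
# [['a'], ['b', 'c'], [], ['d']]
```

It builds the whole result in memory, so it is less suited to streaming input than a plain generator.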

9 Answers

15

A simple generator will work for all of the cases in your question:

def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk
wim
  • This is brilliant. – pault Jan 25 '19 at 22:31
  • Beautiful, simple, easy to understand, pythonic, something I *knew* was possible, but I was unable to come up with myself. Seems to work as I expect. Thanks! – Jani Jan 26 '19 at 12:37
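
As a quick usage sketch of the generator above (repeated here so the snippet runs on its own), it can be fed a one-shot iterator as well as a list, which matches the "input from an iterator" requirement in the question. The sample tokens are just the question's examples written as strings.

```python
def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            # A separator closes the current chunk and starts a fresh one
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    # The last chunk is always emitted, even when it is empty
    yield chunk

# Works on any iterable, including an iterator that can only be consumed once
tokens = iter(["a", "SEP", "b", "c", "SEP", "SEP", "d"])
print(list(split(tokens, "SEP")))   # [['a'], ['b', 'c'], [], ['d']]
print(list(split(["SEP"], "SEP")))  # [[], []]
print(list(split([], "SEP")))       # [[]]
```

The empty-input result [[]] mirrors "".split("SEP"), which returns [''].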
2

My first ever Python program :)

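from pprint import pprint

my_array = ["a", "SEP", "SEP", "SEP"]
my_temp = []
my_final = []
for item in my_array:
    if item != "SEP":
        my_temp.append(item)
    else:
        # A separator closes the current chunk and starts a new one
        my_final.append(my_temp)
        my_temp = []
# Append the last chunk too; without this the trailing [] is lost
my_final.append(my_temp)
pprint(my_final)  # [['a'], [], [], []]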
Matthew Page
  • This seems to have the same elements as in the answer I accepted. It's the genericity and use of generator expression that tipped the scales to the other one. Thanks. – Jani Jan 26 '19 at 12:42
  • Good call , I gave that one an up vote as well, neat code – Matthew Page Jan 26 '19 at 12:44
0

I am not sure if there's an easy itertools.groupby solution, but here is an iterative approach that should work (it calls len(), so it needs a list or another sequence rather than a bare iterator):

def mySplit(iterable, sep):
    output = []
    sepcount = 0
    current_output = []
    for i, elem in enumerate(iterable):
        if elem != sep:
            sepcount = 0
            current_output.append(elem)
            if (i==(len(iterable)-1)):
                output.append(current_output)
        else:
            if current_output: 
                output.append(current_output)
                current_output = []

            sepcount+=1

            if (i==0) or (sepcount > 1):
                output.append([])
            if (i==(len(iterable)-1)):
                output.append([])

    return output

Testing on your examples:

testLists = [
    ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd'],
    ["a", "SEP", "SEP", "SEP"],
    ["SEP"],
    ["a", "b", "c"]
]

for tl in testLists:
    print(mySplit(tl, sep="SEP"))
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]

This is analogous to the result you would get if examples were actually strings and you used str.split(sep):

for tl in testLists:
    print("".join(tl).split("SEP"))
#['a', 'bc', '', 'd']
#['a', '', '', '']
#['', '']
#['abc']

By the way, if the elements in your lists were always guaranteed to be single-character strings (and joining neighbouring elements could never produce the separator by accident), you could simply do:

for tl in testLists:
    print([list(x) for x in "".join(tl).split("SEP")])
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]

But the mySplit() function is more general.

pault
  • Seems correct to me. The implementation is unnecessarily complicated, but I didn't downvote... – wim Jan 25 '19 at 23:00
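
The analogy with str.split() can also be checked mechanically. Below is a rough sketch, assuming single-character tokens so that joining them into a string is unambiguous; check_against_str_split is a made-up helper name, and the split generator is the one from the accepted answer, copied in so the snippet is self-contained.

```python
import random

def split(sequence, sep):
    # Generator from the accepted answer, used here as the function under test
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk

def check_against_str_split(split_fn, trials=200, seed=0):
    """Compare split_fn with str.split on random single-character token lists."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        tokens = [rng.choice("ab,") for _ in range(rng.randint(0, 8))]
        # Reference result: join into a string, split it, turn parts back into lists
        expected = [list(part) for part in "".join(tokens).split(",")]
        assert list(split_fn(tokens, ",")) == expected, tokens
    return True

print(check_against_str_split(split))  # True
```

Any of the other implementations can be passed in as split_fn to see whether they agree with str.split() on the same inputs, including the empty list.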
0

For list or tuple objects you can use the following:

def split(seq, sep):
    start, stop = 0, -1
    while start < len(seq):
        try:
            stop = seq.index(sep, start)
        except ValueError:
            yield seq[start:]
            break
        yield seq[start:stop]
        start = stop + 1
    else:
        if stop == len(seq) - 1:
            yield []

It won't work with a generator, but it's fast.

a_guest
  • AFAICT this does not produce the desired results. – Jani Jan 26 '19 at 12:31
  • @Jani You are right. I suppose you're referring to the case where a `SEP` is at the end of the list? It's not too difficult to account for that case, in form of a final if statement (hence no performance degradation). Please see my updated answer. – a_guest Jan 26 '19 at 18:43
  • Per quick testing, the updated answer does seem to produce the result I want. However, I still think @wim's answer is the more elegant one. Thanks. – Jani Jan 26 '19 at 18:48
  • @Jani Sure! You should select whichever solution suits you best. However I'd like to point out that, if you already start with a `list`, this approach can give you a significant speedup. Tested on my machine I got ~ 4x speedup compared to the accepted answer for both small and large as well as sparse and dense lists. – a_guest Jan 26 '19 at 22:00
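
For anyone who wants to check the speed claim on their own machine, here is a rough timing sketch. The names split_gen and split_index are made up to tell the two answers apart, and the test data is an arbitrary assumption; the numbers will vary with list size, separator density, and Python version.

```python
import timeit

def split_gen(sequence, sep):
    # Element-by-element generator (accepted answer)
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk

def split_index(seq, sep):
    # index()-and-slice version; needs a list or tuple
    start, stop = 0, -1
    while start < len(seq):
        try:
            stop = seq.index(sep, start)
        except ValueError:
            yield seq[start:]
            break
        yield seq[start:stop]
        start = stop + 1
    else:
        if stop == len(seq) - 1:
            yield []

# Arbitrary sample: 1000 chunks of nine items, each followed by a separator
data = (["x"] * 9 + ["SEP"]) * 1000

# Sanity check before timing: both versions must agree on the result
assert list(split_gen(data, "SEP")) == list(split_index(data, "SEP"))

for fn in (split_gen, split_index):
    seconds = timeit.timeit(lambda: list(fn(data, "SEP")), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```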
0

You can use itertools.takewhile:

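import itertools as it

def split(seq, sep):
    seq = iter(seq)
    hit_sep = True  # pretend a separator precedes the input, so [] gives [[]]

    def not_sep(x):
        # Record why takewhile stops: at a separator, or because seq ran out
        nonlocal hit_sep
        hit_sep = x == sep
        return not hit_sep

    while hit_sep:
        hit_sep = False
        yield list(it.takewhile(not_sep, seq))

The hit_sep flag is there to find out whether takewhile stopped at a separator or because seq was exhausted, so an input ending in a separator, such as [a, SEP], still gets its trailing empty list. Note that with this approach it's easy to yield iterators instead of lists if desired, as long as each one is consumed before the next is requested.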

a_guest
  • AFAICT this does not produce the desired results. – Jani Jan 26 '19 at 12:30
  • @Jani You are right. I suppose you're referring to the case where a `SEP` is at the end of the list? It's not too difficult to account for that case, in form of a final if statement (hence no performance degradation). Please see my updated answer. – a_guest Jan 26 '19 at 18:43
  • the best approach IMO, I simplified it: https://stackoverflow.com/a/64804147/1161025 (although returns on first empty subsequence) – maciek Nov 12 '20 at 12:45
0

If you prefer a list comprehension, then you can resort to filtering indices and slicing, using itertools.pairwise (available from Python 3.10):

import itertools

seq = ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd']
[seq[a + 1 : b]
 for (a, b) in itertools.pairwise(
     [-1] + [i for i in range(len(seq)) if seq[i] == 'SEP'] + [len(seq)])]

[['a'], ['b', 'c'], [], ['d']]
Erik Carstensen
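
On Python versions older than 3.10, where itertools.pairwise is not available, the same index-and-slice idea can be sketched with zip instead. The function name split_by_index and the cuts variable are made up for illustration.

```python
def split_by_index(seq, sep):
    """Slice seq between separator positions; seq must support len() and slicing."""
    # Virtual separators at -1 and len(seq) bracket the first and last chunk
    cuts = [-1] + [i for i, x in enumerate(seq) if x == sep] + [len(seq)]
    # zip(cuts, cuts[1:]) pairs each cut position with the next one
    return [seq[a + 1 : b] for a, b in zip(cuts, cuts[1:])]

print(split_by_index(["a", "SEP", "b", "c", "SEP", "SEP", "d"], "SEP"))
# [['a'], ['b', 'c'], [], ['d']]
```

Like the pairwise version, this needs the whole sequence up front, so it does not apply to a bare iterator.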
-1

I would define the following function to solve that problem.

l = ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd']

def sublist_with_words(word, search_list):
    res = []
    for i in range(search_list.count(word)):
        index = search_list.index(word)
        res.append(search_list[:index])
        search_list = search_list[index+1:]
    res.append(search_list)
    return res

When I try the cases you gave:

print(sublist_with_words(word = 'SEP', search_list=l))
print(sublist_with_words(word = 'SEP', search_list=['a', 'b', 'c']))
print(sublist_with_words(word = 'SEP', search_list=['SEP']))

The output is:

[['a'], ['b', 'c'], [], ['d']]
[['a', 'b', 'c']]
[[], []]
Samuel Nde
-1

A simplified version of @a_guest's itertools.takewhile approach (needs Python 3.8+ for the := operator):

def split(seq, sep):
    from itertools import takewhile
    iterator = iter(seq)
    while subseq := list(takewhile(lambda x: x != sep, iterator)):
        yield subseq

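Please note that it stops at the first empty subsequence, so leading or consecutive separators cut the output short and a trailing separator produces no final empty list.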

maciek
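
To make the caveat above concrete, here is a small demonstration of where the simplified version stops. The function is renamed split_until_empty purely for this illustration; the body is the same as in the answer, with the import moved to module level.

```python
from itertools import takewhile

def split_until_empty(seq, sep):
    iterator = iter(seq)
    # The loop ends as soon as takewhile produces an empty list
    while subseq := list(takewhile(lambda x: x != sep, iterator)):
        yield subseq

# Single separators between non-empty chunks behave as expected
print(list(split_until_empty(["a", "SEP", "b"], "SEP")))         # [['a'], ['b']]
# Two separators in a row create an empty chunk, which ends the generator early
print(list(split_until_empty(["a", "SEP", "SEP", "b"], "SEP")))  # [['a']]
# A leading separator means the very first chunk is empty, so nothing is yielded
print(list(split_until_empty(["SEP", "a"], "SEP")))              # []
```

So it is only a drop-in replacement when the input is known to contain no empty chunks.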
-3

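The following is a non-generic solution that only works on lists of non-negative ints. It also breaks if the separator's digits end another element (e.g. separator 3 with 13 in the list) or if the separator is the last element: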

import re

def split_list(nums, n):
    nums_str = str(nums)
    splits = nums_str.split(f"{n},")

    patc = re.compile(r"\d+")
    group = []
    for part in splits:
        group.append([int(v) for v in patc.findall(part)])

    return group

if __name__ == "__main__":
    l = [1, 2, 3, 4, 3, 6, 7, 3, 8, 9, 10]
    n = 3
    split_l = split_list(l, n)
    assert split_l == [[1, 2], [4], [6, 7], [8, 9, 10]]
yang5