Split List By Value and Keep Separators

Question

I have a list called list_of_strings that looks like this:

['a', 'b', 'c', 'a', 'd', 'c', 'e']

I want to split this list by a value (in this case 'c'). I also want to keep 'c' at the end of each resulting sublist.

So the expected result is:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Any easy way to do this?

@ScientiaEtVeritas Thanks for pinging me back. You're right, I just saw the main difference. I'll remove it. — idjaw, Jul 19 '17 at 12:05
You might want to look at this solution https://stackoverflow.com/questions/4322705/split-a-list-into-nested-lists-on-a-value — mikea, Jul 19 '17 at 12:08

pylang · Accepted Answer · 2022-12-08T14:34:15.680

6

You can use more_itertools⁺ to accomplish this simply and clearly:

from more_itertools import split_after


lst = ["a", "b", "c", "a", "d", "c", "e"]
list(split_after(lst, lambda x: x == "c"))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Another example: here we split words simply by changing the predicate:

lst = ["ant", "bat", "cat", "asp", "dog", "carp", "eel"]
list(split_after(lst, lambda x: x.startswith("c")))
# [['ant', 'bat', 'cat'], ['asp', 'dog', 'carp'], ['eel']]

⁺ A third-party library that implements the itertools recipes and more: `pip install more_itertools`
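If you'd rather not add a dependency, the same behaviour can be sketched in plain Python. This is only an illustration of what split_after does for this use case (no `maxsplit` support), not the library's actual source; `split_after_simple` is a hypothetical name.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_after_simple(iterable: Iterable[T], pred: Callable[[T], bool]) -> Iterator[List[T]]:
    """Yield chunks of `iterable`, closing each chunk right after an item where pred(item) is true.

    Hypothetical helper: a plain-Python sketch of split_after-style splitting.
    """
    buf: List[T] = []
    for item in iterable:
        buf.append(item)
        if pred(item):
            # The separator stays at the end of the chunk it closes.
            yield buf
            buf = []
    if buf:
        # Items after the last separator; no empty chunk is emitted.
        yield buf


lst = ["a", "b", "c", "a", "d", "c", "e"]
print(list(split_after_simple(lst, lambda x: x == "c")))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```

Because it never indexes into the input, it also works on generators and other one-shot iterables.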

edited Dec 08 '22 at 14:34

answered Jul 19 '17 at 12:04

pylang


This only works for the special case of my example. It does not split on the value `c`; it chunks the list into equal-size pieces. – ScientiaEtVeritas Jul 19 '17 at 12:08
@ScientiaEtVeritas I see your requirements now. This answer has been fixed using the same library. – pylang Jul 19 '17 at 12:14
You shouldn't use `is` to compare strings. – vaultah Jul 19 '17 at 12:15
Thanks for the feedback. – pylang Jul 19 '17 at 12:18
`pip install more-itertools` – smoquet Dec 08 '22 at 13:44

j4nw · Answer 2 · 2017-07-20T11:09:25.153

6

stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

You can find the indices of 'c' like this, adding 1 because you'll be splitting after each 'c', not at its index:

indices = [i + 1 for i, x in enumerate(stuff) if x == 'c']

Then extract slices like this:

split_stuff = [stuff[i:j] for i, j in zip([0] + indices, indices + [None])]

The zip pairs up consecutive boundaries, analogous to (indices[i], indices[i + 1]); the prepended [0] lets you extract the first part, and the appended [None] extracts the last slice (stuff[i:]).
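One caveat with this slicing approach: if the list ends with 'c', the last boundary equals len(stuff), so the final slice is an empty list. A sketch that wraps the same idea in a function and drops that empty tail (the name `split_keep` and the choice to drop the empty chunk are assumptions, not part of the answer):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_keep(seq: Sequence[T], sep: T) -> List[Sequence[T]]:
    """Split `seq` after every occurrence of `sep`, keeping `sep` in each chunk.

    Hypothetical helper built on the index-and-slice approach above.
    """
    # Boundaries sit one past each separator.
    indices = [i + 1 for i, x in enumerate(seq) if x == sep]
    chunks = [seq[i:j] for i, j in zip([0] + indices, indices + [None])]
    # If seq ends with sep (or is empty), the last slice is empty; drop it.
    if chunks and not chunks[-1]:
        chunks.pop()
    return chunks


print(split_keep(['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c'))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
print(split_keep(['a', 'b', 'c'], 'c'))
# [['a', 'b', 'c']]
```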

edited Jul 20 '17 at 11:09

answered Jul 19 '17 at 12:19

j4nw


Please provide some context and explanation for your answer. Code alone doesn't make for a great answer. – Ben Visness Jul 19 '17 at 16:43
I understand; I've added an explanation. – j4nw Jul 20 '17 at 05:45

score 3 · Answer 3 · answered Jul 19 '17 at 12:09


You could try something like the following:

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

output = [[]]

for x in list_of_strings:
    output[-1].append(x)
    if x == 'c':
        output.append([])

Note, though, that this appends an empty list to the output if the input's last element is 'c'.

answered Jul 19 '17 at 12:09

asongtoruin


Just move the appending of the empty list to the top of the `for` body, using a flag. I have just posted a [similar answer](https://stackoverflow.com/a/52591659/2749397) to a similar question. – gboffi Oct 01 '18 at 13:24
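One way to read gboffi's suggestion, as a sketch: instead of opening a new sublist right after each 'c', set a flag and open it at the top of the next iteration, so no empty list is left behind when the input ends with 'c'. The flag name `start_new` is illustrative.

```python
list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e', 'c']

output = []
start_new = True  # open a new sublist before the next element
for x in list_of_strings:
    if start_new:
        output.append([])
        start_new = False
    output[-1].append(x)
    if x == 'c':
        # Defer opening the next sublist until another element actually arrives.
        start_new = True

print(output)
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e', 'c']]
```

With an empty input the loop body never runs, so `output` stays `[]`.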

Artem Kryvonis · Answer 4 · 2017-07-19T12:15:14.330

1

def spliter(value, array):
    res = []
    while value in array:
        index = array.index(value)
        res.append(array[:index + 1])
        array = array[index + 1:]
    if array:
        # Append last elements
        res.append(array)
    return res

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(spliter('b',a))
# [['a', 'b'], ['c', 'a', 'd', 'c', 'e']]
print(spliter('c',a))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

edited Jul 19 '17 at 12:15

answered Jul 19 '17 at 12:09

Artem Kryvonis


Now it looks better. You must have edited something; I don't remember anymore. I did not downvote, anyway. – Ma0 Jul 19 '17 at 12:19
Thanks :) After trying it out, this solution seems kind of slow, but it works as expected. – ScientiaEtVeritas Jul 19 '17 at 12:38
It will be slow. The `value in array` bit essentially makes it quadratic time. Instead you could just call `array.index` on progressively smaller portions of the list (e.g. `array.index(value, start)` with a moving start) and make it linear. – Paul Rooney Jul 20 '17 at 09:24

@PaulRooney yes, you can improve this part of the code using `try`/`except`/`else`, as someone describes here: https://stackoverflow.com/questions/7571635/fastest-way-to-check-if-a-value-exist-in-a-list But is it really necessary? – Artem Kryvonis Jul 20 '17 at 09:39
On my dataset this code takes 5 minutes, versus 2 seconds for the accepted answer. – ScientiaEtVeritas Jul 20 '17 at 09:43
@ScientiaEtVeritas you are right; I didn't write this snippet for big data sets :) And I agree with you that the accepted answer is a better solution. – Artem Kryvonis Jul 20 '17 at 09:45
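Following Paul Rooney's suggestion, here is a sketch that keeps spliter's interface but avoids rescanning and re-slicing the remainder: `list.index` accepts a start position, so each element is examined once. `spliter_linear` is a hypothetical name.

```python
def spliter_linear(value, array):
    """Split `array` after each occurrence of `value`, keeping `value` in the chunks."""
    res = []
    start = 0
    while True:
        try:
            # Search only from `start`; earlier elements are never rescanned.
            index = array.index(value, start)
        except ValueError:
            break
        res.append(array[start:index + 1])
        start = index + 1
    if start < len(array):
        # Append the elements after the last separator
        res.append(array[start:])
    return res


a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(spliter_linear('c', a))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```

It returns the same results as spliter, including `[['a', 'b'], ['c', 'a', 'd', 'c', 'e']]` for `spliter_linear('b', a)`.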

score 1 · Answer 5 · answered Jul 20 '17 at 14:30

What about this? It should iterate over the input only once, and part of that work happens inside the `index` method, which runs as native code.

def splitkeep(v, c):
    curr = 0
    try:
        nex = v.index(c)
        while True:
            yield v[curr:nex + 1]   # chunk ends with the separator
            curr = nex + 1
            nex = v.index(c, curr)  # resume the search at curr, without re-slicing

    except ValueError:
        # No more separators: emit the remainder only if it is non-empty
        if v[curr:]:
            yield v[curr:]

print(list(splitkeep( ['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c')))

Result:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

I wasn't sure whether you wanted an empty list at the end of the result when the final element is the value you split on. I assumed you wouldn't, so the remainder is yielded only when it is non-empty.

As a consequence, the input [] produces [], when arguably it might produce [[]].
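If you do want the other behaviour ([] giving [[]], and a trailing empty list when the input ends with the separator), a variant can always yield the final chunk. This is a sketch; `splitkeep_all` is a hypothetical name, and like the function above it resumes each search with `list.index(c, start)`.

```python
def splitkeep_all(v, c):
    """Like splitkeep, but always yields the final chunk, even when it is empty."""
    curr = 0
    while True:
        try:
            nex = v.index(c, curr)
        except ValueError:
            # Final chunk, possibly empty: [] -> [[]], [..., c] -> [..., []]
            yield v[curr:]
            return
        yield v[curr:nex + 1]
        curr = nex + 1


print(list(splitkeep_all([], 'c')))          # [[]]
print(list(splitkeep_all(['a', 'c'], 'c')))  # [['a', 'c'], []]
```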

Ma0 · Answer 6 · 2017-07-19T12:13:47.390

0

How about this rather playful script:

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

b = ''.join(a).split('c')  # ['ab', 'ad', 'e']

c = [x + 'c' if i < len(b)-1 else x for i, x in enumerate(b)]  # ['abc', 'adc', 'e']

d = [list(x) for x in c if x]
print(d)  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

It also handles lists that begin or end with "c":

a = ['c', 'a', 'b', 'c', 'a', 'd', 'c', 'e', 'c']
# d -> [['c'], ['a', 'b', 'c'], ['a', 'd', 'c'], ['e', 'c']]

edited Jul 19 '17 at 12:13

answered Jul 19 '17 at 12:07

Ma0


Thanks :) The problem with this solution is that, because it joins the strings, it only works on single characters, not on strings in general. – ScientiaEtVeritas Jul 19 '17 at 12:32
Can you provide an example that would cause it to fail? – Ma0 Jul 19 '17 at 12:33

`['a', 'cb', 'c', 'ca', 'aad', 'c', 'ccc', 'e']` is an example of what I mean. – ScientiaEtVeritas Jul 19 '17 at 12:34
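To see the failure concretely: joining merges the items into one string, so a 'c' inside 'cb' or 'ccc' also acts as a separator, and `list(x)` then breaks the pieces back into single characters. A sketch comparing the script (wrapped in a function; `join_split` and `loop_split` are hypothetical names) with an element-wise loop:

```python
def join_split(a, sep='c'):
    # The join-and-split script from the answer above, wrapped in a function.
    b = ''.join(a).split(sep)
    c = [x + sep if i < len(b) - 1 else x for i, x in enumerate(b)]
    return [list(x) for x in c if x]


def loop_split(a, sep='c'):
    # Element-wise loop: compares whole list items, so multi-character strings are fine.
    out, cur = [], []
    for x in a:
        cur.append(x)
        if x == sep:
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out


a = ['a', 'cb', 'c', 'ca', 'aad', 'c', 'ccc', 'e']
print(loop_split(a))
# [['a', 'cb', 'c'], ['ca', 'aad', 'c'], ['ccc', 'e']]
print(join_split(a) == loop_split(a))
# False
```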

score 0 · Answer 7 · answered Jul 19 '17 at 12:16

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

value = 'c'
new_list = []
temp_list = []
for item in list_of_strings:
    if item == value:  # compare strings with == (equality), not `is` (identity)
        temp_list.append(item)
        new_list.append(temp_list[:])
        temp_list.clear()
    else:
        temp_list.append(item)

if temp_list:
    new_list.append(temp_list)

print(new_list)

score 0 · Answer 8 · answered Jul 19 '17 at 14:23


You can try the snippet below, which uses more_itertools:

>>> l = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
>>> from more_itertools import sliced
>>> list(sliced(l,l.index('c')+1))

Output is:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
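This only works because the 'c's here happen to be evenly spaced: sliced(seq, n) cuts the sequence into consecutive pieces of length n (the last may be shorter), so only the first 'c' determines the cut points. A stdlib-only sketch of that fixed-size chunking (the helper name `equal_chunks` is hypothetical) on input where the 'c's are not evenly spaced:

```python
def equal_chunks(seq, n):
    # Fixed-size chunking, as sliced() does for a list: pieces of length n, last may be shorter.
    return [seq[i:i + n] for i in range(0, len(seq), n)]


lst = ['a', 'c', 'b', 'b', 'c', 'd']
print(equal_chunks(lst, lst.index('c') + 1))
# [['a', 'c'], ['b', 'b'], ['c', 'd']]
```

The desired result would be `[['a', 'c'], ['b', 'b', 'c'], ['d']]`; the second 'c' ends up at the start of a chunk instead of the end.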

answered Jul 19 '17 at 14:23

Ajay2588

