8

I have a list called list_of_strings that looks like this:

['a', 'b', 'c', 'a', 'd', 'c', 'e']

I want to split this list by a value (in this case c). I also want to keep c in the resulting split.

So the expected result is:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Any easy way to do this?

ScientiaEtVeritas
  • @ScientiaEtVeritas Thanks for pinging me back. You're right, I just saw the main difference. I'll remove it. – idjaw Jul 19 '17 at 12:05
  • You might want to look at this solution https://stackoverflow.com/questions/4322705/split-a-list-into-nested-lists-on-a-value – mikea Jul 19 '17 at 12:08

8 Answers

6

You can use more_itertools+ to accomplish this simply and clearly:

from more_itertools import split_after


lst = ["a", "b", "c", "a", "d", "c", "e"]
list(split_after(lst, lambda x: x == "c"))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

Here is another example, which splits words simply by changing the predicate:

lst = ["ant", "bat", "cat", "asp", "dog", "carp", "eel"]
list(split_after(lst, lambda x: x.startswith("c")))
# [['ant', 'bat', 'cat'], ['asp', 'dog', 'carp'], ['eel']]

+ A third-party library that implements itertools recipes and more. Install it with: pip install more_itertools
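
If adding a dependency is not an option, the same behavior can be approximated with a small generator from plain Python. This is only a minimal sketch of the idea, not the library's implementation, and the name split_after_pred is hypothetical:

```python
def split_after_pred(iterable, pred):
    """Yield lists, closing each one right after an item for which pred is true."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if pred(item):
            yield chunk
            chunk = []
    if chunk:  # leftover items after the last match
        yield chunk


lst = ["a", "b", "c", "a", "d", "c", "e"]
result = list(split_after_pred(lst, lambda x: x == "c"))
print(result)  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```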

pylang
6
stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

You can find the indices of 'c' like this, adding 1 to each because you will be splitting after the match, not at its index:

indices = [i + 1 for i, x in enumerate(stuff) if x == 'c']

Then extract slices like this:

split_stuff = [stuff[i:j] for i, j in zip([0] + indices, indices + [None])]

The zip yields pairs analogous to (indices[i], indices[i + 1]); the prepended [0] lets you extract the first part, and the appended [None] extracts the last slice (stuff[i:]).
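
The two steps can be wrapped in a small helper. The function name split_after_value is made up for illustration. When the input ends with the separator, the last slice is empty, so this sketch filters empty slices out, which is an assumption about the desired behavior:

```python
def split_after_value(items, value):
    # Positions just past each occurrence of value.
    indices = [i + 1 for i, x in enumerate(items) if x == value]
    # Pair each start with the next end; None means "to the end of the list".
    slices = (items[i:j] for i, j in zip([0] + indices, indices + [None]))
    # Assumption: drop the empty slice produced when items ends with value.
    return [s for s in slices if s]


stuff = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(split_after_value(stuff, 'c'))  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```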

j4nw
3

You could try something like the following:

list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

output = [[]]

for x in list_of_strings:
    output[-1].append(x)
    if x == 'c':
        output.append([])

Note, though, that this leaves an empty list at the end of the output if the input's last element is 'c'.
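
One way to avoid the trailing empty list is to open a new sublist lazily, only when the next element actually arrives. A minimal sketch along those lines (the flag name start_new is hypothetical):

```python
list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c']  # note: ends with 'c'

output = []
start_new = True  # open a new sublist before appending the next element
for x in list_of_strings:
    if start_new:
        output.append([])
        start_new = False
    output[-1].append(x)
    if x == 'c':
        start_new = True

print(output)  # [['a', 'b', 'c'], ['a', 'd', 'c']]
```

With this ordering, an empty input produces an empty output list rather than [[]].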

asongtoruin
  • Just move the appending of the empty list at the top of the `for` body, using a flag. I have just posted a [similar answer](https://stackoverflow.com/a/52591659/2749397) to a similar question. – gboffi Oct 01 '18 at 13:24
1
def spliter(value, array):
    res = []
    while value in array:
        index = array.index(value)
        res.append(array[:index + 1])
        array = array[index + 1:]
    if array:
        # Append last elements
        res.append(array)
    return res

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
print(spliter('b',a))
# [['a', 'b'], ['c', 'a', 'd', 'c', 'e']]
print(spliter('c',a))
# [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
Artem Kryvonis
  • now it looks better. You must have edited something. Dont remember any more. I did not dv anyway – Ma0 Jul 19 '17 at 12:19
  • Thanks :) After trying out this solution seems kind of slow, but it works as expected. – ScientiaEtVeritas Jul 19 '17 at 12:38
  • It will be slow. The `value in array` bit essentially makes it quadratic time. Instead you could just call `index` on progressively smaller slices and make it linear. – Paul Rooney Jul 20 '17 at 09:24
  • @PaulRooney yes, you can improve this part of the code using `try except else`, as described here https://stackoverflow.com/questions/7571635/fastest-way-to-check-if-a-value-exist-in-a-list But is it really necessary? – Artem Kryvonis Jul 20 '17 at 09:39
  • On my dataset this code takes 5 minutes versus the approved answer which takes 2 seconds. – ScientiaEtVeritas Jul 20 '17 at 09:43
  • @ScientiaEtVeritas you are right, I wrote this snippet, not for big data sets :) And I agree with you that approved answer is a better solution. – Artem Kryvonis Jul 20 '17 at 09:45
1

What about this? It should iterate over the input only once, and part of that happens inside the index method, which runs as native code.

def splitkeep(v, c):

    curr = 0
    try:
        nex = v.index(c)
        while True:
            yield v[curr: (nex + 1)]
            curr = nex + 1
            nex += v[curr:].index(c) + 1

    except ValueError:
        if v[curr:]: yield v[curr:]

print(list(splitkeep( ['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c')))

result

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

I wasn't sure whether you wanted to keep an empty list at the end of the result when the final value is the one you are splitting on. I assumed you wouldn't, so I added a condition that skips the final chunk if it is empty.

This has the result that the input [] results in only [] when arguably it might result in [[]].
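
A variant of the same idea passes a start position to list.index instead of slicing before each search, which avoids copying the tail of the list every time. This is a sketch rather than a benchmarked replacement, and the name splitkeep_from is hypothetical:

```python
def splitkeep_from(v, c):
    curr = 0
    while True:
        try:
            # Search from position curr onward without building a slice.
            nex = v.index(c, curr)
        except ValueError:
            break
        yield v[curr:nex + 1]
        curr = nex + 1
    if curr < len(v):  # remaining items after the last match
        yield v[curr:]


result = list(splitkeep_from(['a', 'b', 'c', 'a', 'd', 'c', 'e'], 'c'))
print(result)  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
```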

Paul Rooney
0

How about this rather playful script (it relies on every element being a single character):

a = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

b = ''.join(a).split('c')  # ['ab', 'ad', 'e']

c = [x + 'c' if i < len(b)-1 else x for i, x in enumerate(b)]  # ['abc', 'adc', 'e']

d = [list(x) for x in c if x]
print(d)  # [['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]

It can also handle lists that begin or end with a "c":

a = ['c', 'a', 'b', 'c', 'a', 'd', 'c', 'e', 'c']
# d -> [['c'], ['a', 'b', 'c'], ['a', 'd', 'c'], ['e', 'c']]
Ma0
0
list_of_strings = ['a', 'b', 'c', 'a', 'd', 'c', 'e']

value = 'c'
new_list = []
temp_list = []
for item in list_of_strings:
    if item == value:  # compare by equality; `is` tests object identity
        temp_list.append(item)
        new_list.append(temp_list[:])
        temp_list.clear()
    else:
        temp_list.append(item)

if temp_list:
    new_list.append(temp_list)

print(new_list)
YuryChu
0

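You can try the snippet below, which uses more_itertools. Note that sliced cuts the list into fixed-size chunks, so this only works when 'c' recurs at a regular interval, as it does in this input.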

>>> l = ['a', 'b', 'c', 'a', 'd', 'c', 'e']
>>> from more_itertools import sliced
>>> list(sliced(l,l.index('c')+1))

Output is:

[['a', 'b', 'c'], ['a', 'd', 'c'], ['e']]
Ajay2588