30

I have a list containing various string values. I want to split the list whenever I see WORD. The result should be a list of lists (sublists of the original list), each containing at most one instance of WORD. I can do this using a loop, but is there a more Pythonic way to achieve this?

Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']

result = [['A'], ['WORD','B','C'],['WORD','D']]

This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than the one it should be in:

def split_excel_cells(delimiter, cell_data):

    result = []

    temp = []

    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)

    return result
Georgy
Cemre Mengü

4 Answers

41
import itertools

lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'

spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]

This creates a split list without the delimiters, which looks more logical to me:

[['A'], ['B', 'C'], ['D']]

If you insist on including the delimiters, this should do the trick:

spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x: spl.append([])
    spl[-1].extend(y)
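
To make the two variants easy to compare, here is a self-contained sketch that wraps each one in a function. The names split_without_delimiters and split_keeping_delimiters are made up for this illustration. One caveat: groupby merges adjacent equal keys, so a run of consecutive delimiters ends up in a single sublist, and an input that starts with the delimiter produces a leading empty list in the second variant.

```python
import itertools


def split_without_delimiters(items, delimiter):
    """Split items at each delimiter and drop the delimiters."""
    return [list(group)
            for is_delim, group in itertools.groupby(items, lambda item: item == delimiter)
            if not is_delim]


def split_keeping_delimiters(items, delimiter):
    """Split items so that each delimiter run starts a new sublist."""
    result = [[]]
    for is_delim, group in itertools.groupby(items, lambda item: item == delimiter):
        if is_delim:
            result.append([])  # a delimiter run opens a new sublist
        result[-1].extend(group)
    return result


lst = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
print(split_without_delimiters(lst, 'WORD'))  # [['A'], ['B', 'C'], ['D']]
print(split_keeping_delimiters(lst, 'WORD'))  # [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```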
Drake Guan
georg
23

I would use a generator:

def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)

This prints

[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

The code accepts any iterable, and produces an iterable (which you don't have to flatten into a list if you don't want to).
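
As a small illustration of that last point, the sketch below feeds group a generator instead of a list and consumes the result one chunk at a time. The group function is repeated so the snippet runs on its own; the tokens generator and its data are invented for the example.

```python
def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g


def tokens():
    """Hypothetical source that produces items lazily."""
    for item in "A WORD B C WORD D".split():
        yield item


chunks = group(tokens(), 'WORD')
first = next(chunks)   # only 'A' and the first 'WORD' have been read so far
print(first)           # ['A']

for chunk in chunks:   # the remaining chunks are produced on demand
    print(chunk)
```

Nothing is materialised as a full list of lists here, which may matter when the input is large or streamed.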

NPE
  • 2
    Note that if you want to exclude the delimiter from the results, you can add a continue statement inside the if statement in the `group` function. – tjysdsg Aug 06 '19 at 13:35
  • Note that if you exclude the stop-word, you will yield an empty list when the stop-word is at the end of your input. – norok2 Dec 06 '19 at 12:15
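
The two comments above can be combined into one sketch: a variant of group that drops the separator with continue and, by default, skips empty chunks. The name group_without_sep and the keep_empty flag are hypothetical additions for this illustration, not part of the original answer.

```python
def group_without_sep(seq, sep, keep_empty=False):
    """Yield the runs of items between separators, dropping the separators.

    keep_empty is a hypothetical flag added for this sketch; when it is
    False, empty chunks (e.g. after a trailing separator) are skipped.
    """
    g = []
    for el in seq:
        if el == sep:
            if g or keep_empty:
                yield g
            g = []
            continue  # do not append the separator itself
        g.append(el)
    if g or keep_empty:
        yield g


data = ['A', 'WORD', 'B', 'C', 'WORD']
print(list(group_without_sep(data, 'WORD')))
# [['A'], ['B', 'C']]
print(list(group_without_sep(data, 'WORD', keep_empty=True)))
# [['A'], ['B', 'C'], []]
```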
4
  • @NPE's solution looks very Pythonic to me. Here is another one using itertools:
  • The original version used izip, which exists only in Python 2. In Python 3 the built-in zip is already lazy, so it is used below.
from itertools import chain

example = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
# Positions of every delimiter in the list
indices = [i for i, x in enumerate(example) if x == "WORD"]
# Pair each start index with the next start index (None means "to the end")
pairs = zip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]
Trenton McKinney
A. Rodas
3

Given

import more_itertools as mit


iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"

Code

list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

more_itertools is a third-party library, installable via `pip install more_itertools`.

See also split_at and split_after.

pylang