13

Are there concise and elegant ways of splitting a list in Python into a list of sub-lists by a delimiting element, such that ['a', 'delim', 'b'] -> [['a'], ['b']]?

Here is the example:

ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = []   # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]

Working examples that seem overly complex

I have surveyed the documentation and related questions on Stack Overflow (many referenced below), none of which answered my question. My research is summarized below: several approaches that do generate the desired output but are verbose and intricate. What is happening (splitting a list) is not immediately apparent; you really have to squint.

Are there better ways? I am primarily interested in readability for beginners (e.g. teaching), canonical / 'Pythonic' approaches, and secondarily in the most efficient approaches (e.g. timeit speed). Ideally answers would address both Python 2.7 and 3.x.

with conditional .append()

Loop through the list and either append to the last output list or add a new output list. Based on an example that includes the delimiter, but altered to exclude it. I'm not sure how to make it a one-liner, or if that is even desirable.

lspl = [[]]
for i in ldat:
    if i==dlim:
        lspl.append([])
    else:
        lspl[-1].append(i)
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
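
If a one-line call site is the goal, the loop above can be wrapped in a function. This is only a minimal sketch: split_list is a hypothetical helper name, not a built-in. Like str.split, it keeps empty sublists for leading, trailing, or adjacent delimiters.

```python
def split_list(items, delimiter):
    """Split items into sublists at each delimiter (hypothetical helper)."""
    result = [[]]
    for item in items:
        if item == delimiter:
            result.append([])        # delimiter found: start a new sublist
        else:
            result[-1].append(item)  # otherwise extend the current sublist
    return result

ldat = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b']
print(split_list(ldat, 'c'))             # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
print(split_list(['c', 'a', 'c'], 'c'))  # prints: [[], ['a'], []]
```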

with itertools.groupby

Combine itertools.groupby with a list comprehension. Many answers include the delimiters; this is based on those that exclude them.

import itertools
lspl = [list(y) for x, y in itertools.groupby(ldat, lambda z: z == dlim) if not x]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
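
One behavior worth knowing: because groupby merges runs of equal keys, this approach never yields empty sublists. Leading, trailing, and repeated delimiters are silently collapsed. A small sketch, using a hypothetical wrapper name split_groupby:

```python
import itertools

def split_groupby(items, delimiter):
    """Split on delimiter via groupby; delimiter runs are dropped (hypothetical helper)."""
    grouped = itertools.groupby(items, lambda x: x == delimiter)
    # Keep only the groups whose key is False, i.e. the non-delimiter runs.
    return [list(group) for is_delim, group in grouped if not is_delim]

print(split_groupby(['a', 'b', 'c', 'a', 'b'], 'c'))       # prints: [['a', 'b'], ['a', 'b']]
print(split_groupby(['c', 'a', 'c', 'c', 'b', 'c'], 'c'))  # prints: [['a'], ['b']]
```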

with slicing on indices

Some related questions have discussed how to use slicing after using .index(); however, answers usually focus on finding the first index only. One can extend this approach by first finding a list of all delimiter indices and then looping through a self-zipped list to slice out the ranges between them.

indices = [i for i, x in enumerate(ldat) if x == dlim]
lspl = [ldat[s+1:e] for s, e in zip([-1] + indices, indices + [len(ldat)])]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]

However, like all the approaches I have found, this seems like a very complex way of enacting a simple split-on-delimiter operation.
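
The index-and-slice approach can also be hidden behind a function so that the call itself is one line. A sketch under the assumption that a helper is acceptable; split_by_slicing is a hypothetical name. Padding the index list with -1 and len(items) makes every sublist, including the first and last, a slice between two boundaries.

```python
def split_by_slicing(items, delimiter):
    """Split items by slicing between delimiter positions (hypothetical helper)."""
    indices = [i for i, x in enumerate(items) if x == delimiter]
    starts = [-1] + indices           # virtual delimiter before the list
    ends = indices + [len(items)]     # virtual delimiter after the list
    return [items[s + 1:e] for s, e in zip(starts, ends)]

print(split_by_slicing(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'], 'c'))
# prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
print(split_by_slicing(['c', 'a', 'c'], 'c'))  # prints: [[], ['a'], []]
```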

Comparison to string splitting

By comparison and as a model only, here is a working, concise, and elegant way of splitting a string into a list of sub-strings by a delimiter.

sdat = 'abcabcab'
dlim = 'c'
sspl = sdat.split(dlim)
print(sspl) # prints: ['ab', 'ab', 'ab']

NOTE: I understand there is no split method on lists in Python, and I am not asking about splitting a string. I am also not asking about splitting element-strings into new elements.

JeremyDouglass
  • The slicing on indices method is what comes to mind, although that is two lines. Wrap it in a function :) then it's one line – Cory Kramer Dec 05 '17 at 20:34
  • "I'm not sure how to make it a one-liner, or if that is even desirable." No, it isn't in and of itself. You can't get more canonical than a for-loop, really. In fact, the biggest problem I see with how you wrote your first example is by putting the `if` and `else` bodies on one line - use indentation. – juanpa.arrivillaga Dec 05 '17 at 20:35
  • Expected behavior if delimiter is at start or end? – timgeb Dec 05 '17 at 20:35
  • Really any behavior is fine. So `['delim', 'a', 'b', 'delim']` could become `[['a'], ['b']]`, or `[[], ['a'], ['b'], []]`, or even `[['a'], ['b'], []]` or `[[], ['a'], ['b']]`. – JeremyDouglass Dec 05 '17 at 20:38
  • @juanpa.arrivillaga I have indented the if/else example for legibility -- good point. – JeremyDouglass Dec 05 '17 at 20:40
  • @JeremyDouglass right, that solution is *perfectly* Pythonic. It is very readable, the logic is straight-forward and expressed in typical python idioms, e.g. "append to the last sublist" => `lspl[-1].append(i)`. It is also *performant*. – juanpa.arrivillaga Dec 05 '17 at 20:43
  • @juanpa.arrivillaga Thank you, I appreciate your point on its virtues. I am still hoping for even clearer alternatives. I am working on code for use by first-time programmers, and there is nothing *intuitive* about a conditional negative append = splitting. – JeremyDouglass Dec 05 '17 at 20:51

1 Answer

-4

or this:

ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = []   # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]

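import ast
# Turn each delimiter into a sublist boundary; assumes no delimiter at the start or end.
s = str(ldat).replace(", '%s', " % dlim, "],[")
# Wrap in brackets to get a list (not a tuple); literal_eval avoids running arbitrary code.
result = ast.literal_eval("[" + s + "]")
print(result) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]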
user508402