Is it possible to split a string on multiple delimiters in order?

Question

I know how to split a string on multiple separators using re, as in this question: "Split Strings with Multiple Delimiters?". But I'm wondering how to split a string following the order given in a list of delimiters, where each delimiter in the list triggers at most one split.

multiple_sep_split("hello^goo^dbye:cat@dog", ['^',':','@'])
>>> ['hello', 'goo^dbye', 'cat', 'dog']  # (note the extra caret)
multiple_sep_split("my_cat:my_dog:my:bird_my_python",[':',':','_'])
>>> ['my_cat','my_dog','my:bird','my_python']

One approach could be to match not on the delimiters but on the text between them, and return those pieces as groups. Is there another way?

text_re = re.compile(r'(.+?)\^(.+?):(.+?)@(.+)')  # '^' must be escaped; get each group from here
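
One possible way to generalize the group-matching idea to an arbitrary list of delimiters is sketched below. The helper name multiple_sep_split_re, and the choice to return the input unsplit when the delimiters do not all occur in order, are assumptions made for illustration and are not part of the question.

```python
import re

def multiple_sep_split_re(s, seps):
    # Hypothetical helper: one non-greedy group per delimiter, so each
    # group stops at the first occurrence of the delimiter that follows it.
    parts = ["(.*?)" + re.escape(sep) for sep in seps]
    # The final group is greedy and takes whatever is left.
    pattern = "".join(parts) + "(.*)"
    match = re.fullmatch(pattern, s, flags=re.DOTALL)
    # Assumption: if the delimiters do not all appear in order,
    # return the input as a single piece.
    return list(match.groups()) if match else [s]

print(multiple_sep_split_re("hello^goo^dbye:cat@dog", ['^', ':', '@']))
print(multiple_sep_split_re("my_cat:my_dog:my:bird_my_python", [':', ':', '_']))
```

re.escape matters here because delimiters such as '^' have a special meaning inside a pattern.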

Are you asking for a better method than the regex you gave? Is there something wrong with it? — Gabe, Dec 05 '13 at 03:31
It's unclear what you're asking for. Putting "in order" in italics doesn't actually *explain* what it means to you ;-) The example at the start *might* be helpful if you specified the set (list?) of separators you had in mind - as-is, we can only guess. — Tim Peters, Dec 05 '13 at 03:40
@TimPeters Thanks for your efforts in trying to understand the question! I've updated it so that it hopefully makes more sense :). — ishikun, Dec 05 '13 at 05:04

score 2 · Accepted Answer · answered Dec 05 '13 at 03:47

If I understand what you're asking, you just want a series of string partition operations: partition on the first separator, then partition the remainder on the second, and so on to the end of the list.

Here's a recursive method (which doesn't use re):

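def splits(s, seps):
    # Cut at the first occurrence of the first separator only.
    left, _, right = s.partition(seps[0])
    if len(seps) == 1:
        return [left, right]
    # Recurse on the remainder with the remaining separators.
    return [left] + splits(right, seps[1:])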

demo:

a = 'hello^goo^dbye:cat@dog'

splits(a,['^',':','@'])
Out[7]: ['hello', 'goo^dbye', 'cat', 'dog']
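
The same chain of partitions can also be written as a loop, which avoids recursion when the separator list is long. This is only a sketch: the name splits_iter is made up here, and it is meant to behave like the recursive version above, not to replace it.

```python
def splits_iter(s, seps):
    # Hypothetical loop-based variant of the recursive partition approach.
    parts = []
    rest = s
    for sep in seps:
        # partition cuts at the first occurrence of sep only.
        left, _, rest = rest.partition(sep)
        parts.append(left)
    # Whatever follows the last separator is the final piece.
    parts.append(rest)
    return parts

print(splits_iter("my_cat:my_dog:my:bird_my_python", [':', ':', '_']))
# ['my_cat', 'my_dog', 'my:bird', 'my_python']
```

Note what happens when a separator is missing: partition then returns the whole remainder as the left part, so splits_iter("a:b", ["^", ":"]) gives ['a:b', '', ''], and the later separators only ever see an empty string.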

score 2 · Answer 2 · answered Dec 05 '13 at 03:49

I believe your question is severely under-specified, but at least this gives the result you want in the example you gave:

def split_at_most_once_each_and_in_order(s, seps):
    result = []
    start = 0
    for sep in seps:
        i = s.find(sep, start)
        if i >= 0:
            result.append(s[start: i])
            start = i + len(sep)  # skip the whole separator, even if it is longer than one character
    if start < len(s):
        result.append(s[start:])
    return result

print(split_at_most_once_each_and_in_order(
    "hello^goo^dbye:cat@dog", "^:@"))

That returns ['hello', 'goo^dbye', 'cat', 'dog']. If you absolutely want to "be clever", keep looking ;-)
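
The two answers differ on edge cases: the loop above skips a separator that is not found and drops an empty final piece, while a plain chain of partitions does neither. The sketch below is a hypothetical hybrid (the name split_once_each is invented here) that skips missing separators but always keeps the final piece. Which behavior is wanted depends on details the question leaves open.

```python
def split_once_each(s, seps):
    # Hypothetical hybrid: partition-based, but a missing separator is skipped.
    parts = []
    rest = s
    for sep in seps:
        head, found, tail = rest.partition(sep)
        if found:
            # Separator present: cut here and continue with what follows.
            parts.append(head)
            rest = tail
        # Separator absent: leave rest untouched and try the next one.
    # Always keep the final piece, even when it is empty.
    parts.append(rest)
    return parts

print(split_once_each("hello^goo^dbye:cat@dog", "^:@"))  # ['hello', 'goo^dbye', 'cat', 'dog']
print(split_once_each("a:b", ["^", ":"]))                # ['a', 'b']
print(split_once_each("a:b@", [":", "@"]))               # ['a', 'b', '']
```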

Tim Peters

It heartens me to see that we independently arrived at the same variable names. Though your function name is obviously superior :) – roippi Dec 05 '13 at 03:55
Obviously: `splits` can only be applied to bananas ;-) – Tim Peters Dec 05 '13 at 03:57
