I know how to split a string on multiple separators using re, as in the question "Split Strings with Multiple Delimiters?". But I'm wondering how to split a string following the order given in a list of delimiters, where each delimiter in the list triggers at most one split.

multiple_sep_split("hello^goo^dbye:cat@dog", ['^',':','@'])
>>> ['hello', 'goo^dbye', 'cat', 'dog']  #(note the extra caret)
multiple_sep_split("my_cat:my_dog:my:bird_my_python",[':',':','_'])
>>> ['my_cat','my_dog','my:bird','my_python']

One approach could be to match not on the delimiters but on the text between them and return those pieces as groups, but is there another way?

text_re = re.compile(r'(.+?)\^(.+?):(.+?)@(.+)') # ^ escaped, groups lazy; get each group from here
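The group-matching idea can be generalized so the pattern is built from the delimiter list instead of being written by hand. This is only a sketch: `regex_split` is a hypothetical name, and falling back to the unsplit string when the delimiters do not all occur in order is an assumption, not something the question specifies.

```python
import re

def regex_split(s, seps):
    # Hypothetical helper: build '(.*?)sep1(.*?)sep2...(.*)' from the list.
    # re.escape keeps characters such as '^' from acting as regex syntax,
    # and the lazy groups stop at the first usable occurrence of each delimiter.
    pattern = "".join("(.*?)" + re.escape(sep) for sep in seps) + "(.*)"
    match = re.fullmatch(pattern, s, flags=re.DOTALL)
    # Assumption: return the string unsplit if the delimiters are not all found in order.
    return list(match.groups()) if match else [s]

print(regex_split("hello^goo^dbye:cat@dog", ['^', ':', '@']))
print(regex_split("my_cat:my_dog:my:bird_my_python", [':', ':', '_']))
```

Because a regex can backtrack, this version may pick a later occurrence of a delimiter if that is the only way to fit the remaining ones, so it is not guaranteed to behave identically to a step-by-step split on every input.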
ishikun
  • Are you asking for a better method than the regex you gave? Is there something wrong with it? – Gabe Dec 05 '13 at 03:31
  • It's unclear what you're asking for. Putting "in order" in italics doesn't actually *explain* what it means to you ;-) The example at the start *might* be helpful if you specified the set (list?) of separators you had in mind - as-is, we can only guess. – Tim Peters Dec 05 '13 at 03:40
  • I'll update it now! Had to go to lunch...! – ishikun Dec 05 '13 at 04:33
  • @TimPeters Thanks for your efforts in trying to understand the question! I've updated it so that it hopefully makes more sense :). – ishikun Dec 05 '13 at 05:04

2 Answers

If I understand what you're asking, you just want a series of string partition operations: partition on the first separator, then on the second, and so on to the end.

Here's a recursive method (which doesn't use re):

def splits(s, seps):
    # Partition on the first separator only, then recurse on the remainder.
    left, _, right = s.partition(seps[0])
    if len(seps) == 1:
        return [left, right]
    return [left] + splits(right, seps[1:])

demo:

a = 'hello^goo^dbye:cat@dog'

splits(a,['^',':','@'])
Out[7]: ['hello', 'goo^dbye', 'cat', 'dog']
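For long separator lists, the same chain of partitions can be written as a loop, which avoids recursion depth limits and the repeated list slicing. A minimal sketch, where `splits_iter` is a hypothetical name and the behavior for an empty separator list (returning the string unsplit) is an assumption:

```python
def splits_iter(s, seps):
    parts = []
    rest = s
    for sep in seps:
        # str.partition splits at the first occurrence of sep only;
        # if sep is absent, left is the whole remainder and rest becomes ''.
        left, _, rest = rest.partition(sep)
        parts.append(left)
    # Whatever follows the last separator is the final piece.
    parts.append(rest)
    return parts

print(splits_iter('hello^goo^dbye:cat@dog', ['^', ':', '@']))
```

As with the recursive version, a separator that is missing from the remaining text produces empty trailing pieces rather than an error.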
roippi

I believe your question is severely under-specified, but at least this gives the result you want in the example you gave:

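def split_at_most_once_each_and_in_order(s, seps):
    result = []
    start = 0
    for sep in seps:
        # Look for this separator only after the previous split point.
        i = s.find(sep, start)
        if i >= 0:
            result.append(s[start: i])
            # Skip the whole separator, so multi-character separators work too.
            start = i + len(sep)
    # Keep whatever is left after the last split, unless it is empty.
    if start < len(s):
        result.append(s[start:])
    return result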

print(split_at_most_once_each_and_in_order(
    "hello^goo^dbye:cat@dog", "^:@"))

That returns ['hello', 'goo^dbye', 'cat', 'dog']. If you absolutely want to "be clever", keep looking ;-)

Tim Peters