
Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?

Given a string:

'20_231_myString_234'

It should be split into (with the delimiter being '_', after its second occurrence):

['20_231', 'myString_234']

Or is the only way to accomplish this to count, split and join?

jamylak
cherrun

9 Answers

>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')

This seems like the most readable way; the alternative is a regex.
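Wrapped as a reusable function, the split-and-join idea above might look like the sketch below. The name `split_after_nth` is hypothetical, and returning an empty second part when there are fewer than n delimiters is simply what the slicing happens to do, not a documented convention.

```python
def split_after_nth(text, delim, n):
    """Split text in two at the nth occurrence of delim (hypothetical helper)."""
    groups = text.split(delim)
    # Rejoin the first n fields and the remaining fields separately.
    # With fewer than n delimiters, groups[n:] is empty, so the second part is ''.
    return delim.join(groups[:n]), delim.join(groups[n:])

print(split_after_nth('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
```

Because it only relies on `str.split` and `str.join`, the same function works unchanged for multi-character delimiters such as `'--'`.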

jamylak
  • IMO this looks a lot clearer than the regex way, although that isn't bad either. Thanks. – cherrun Jun 12 '13 at 08:17
  • Is there a way to use this answer in a dataframe format? Where Column1 = String, output to Column2 and 3? – Arthur D. Howland Feb 05 '19 at 15:37
  • I do suggest reading my answer. I think it is more readable, and it's not a regex. – Yuval Jun 27 '21 at 14:54
  • Although this is still the best answer for an arbitrary occurrence, there's a better solution when we want either the first or the last occurrence. For the first occurrence, we should use _partition_; for the last, _rpartition_. This is not only more readable but also more performant, since it avoids unnecessary splits and joins. I hope Python adds an optional "occurrence=1" parameter to partition in the future, so it will fit all cases. – Leandro 86 Dec 22 '22 at 18:42
  • @Leandro86 as you mentioned partition only splits on the 1st occurrence, so if you mean adding an optional parameter to `str.split` then that already exists as `maxsplit` https://docs.python.org/3/library/stdtypes.html#str.rsplit – jamylak Dec 25 '22 at 11:36
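For the first- and last-occurrence cases raised in these comments, the built-in `str.partition` and `str.rpartition` already do the job, and `maxsplit` covers the tail after the nth occurrence from the left. A short illustration on the question's string:

```python
s = '20_231_myString_234'

# First occurrence: partition returns (head, separator, tail).
head, sep, tail = s.partition('_')
print(head, tail)  # 20 231_myString_234

# Last occurrence: rpartition searches from the right instead.
head, sep, tail = s.rpartition('_')
print(head, tail)  # 20_231_myString 234

# Tail after the nth occurrence: maxsplit=n stops after n cuts,
# so the last element is everything after the nth delimiter.
n = 2
tail = s.split('_', n)[n]
print(tail)  # myString_234
```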

Using re with a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*), where n is a variable:

import re

n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n-1), s)
if m: print(m.groups())

or have a nice function:

import re
def nthofchar(s, c, n):
    regex=r'^((?:[^%c]*%c){%d}[^%c]*)%c(.*)' % (c,c,n-1,c,c)
    l = ()
    m = re.match(regex, s)
    if m: l = m.groups()
    return l

s='20_231_myString_234'
print(nthofchar(s, '_', 2))
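One caveat with building the pattern by plain string formatting: a delimiter such as '.', ']' or '\\' changes the meaning of the regex. A sketch of the same pattern with `re.escape`, still assuming a single-character delimiter (the name `nthofchar_escaped` is hypothetical):

```python
import re

def nthofchar_escaped(s, c, n):
    """Split s after the nth occurrence of the single character c; return () if there is none."""
    e = re.escape(c)  # makes characters like '.', ']' or '\\' safe to embed in the pattern
    # Builds ^((?:[^c]*c){n-1}[^c]*)c(.*): group 1 stops just before the nth c.
    pattern = r'^((?:[^{0}]*{0}){{{1}}}[^{0}]*){0}(.*)'.format(e, n - 1)
    m = re.match(pattern, s, re.DOTALL)  # DOTALL lets (.*) run across newlines
    return m.groups() if m else ()

print(nthofchar_escaped('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
```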

Or without regexes, using iterative find:

def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        # str.index raises ValueError if delim occurs fewer than n times
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + len(delim):]

s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)
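Since `str.index` raises `ValueError` when there are fewer than n delimiters, a variant built on `str.find` can return the whole string instead. This is a sketch; the name `nth_split_find` and the `(s, '')` fallback are assumptions, not part of the answer above.

```python
def nth_split_find(s, delim, n):
    """Like nth_split, but returns (s, '') when delim occurs fewer than n times."""
    p = -len(delim)  # so the first search starts at index 0
    for _ in range(n):
        p = s.find(delim, p + len(delim))  # skip past the previous match
        if p == -1:
            return s, ''  # assumed fallback: no nth delimiter
    return s[:p], s[p + len(delim):]

print(nth_split_find('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
```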
perreal

I like this solution because it works without any actual regex pattern and can easily be adapted to another "nth" or delimiter.

import re

string = "20_231_myString_234"
occur = 2  # the occurrence at which you want to split

indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]

print (part1, ' ', part2)
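If you want to drop the `re` import entirely, the same index list can be built with `enumerate`. This sketch only works for a single-character delimiter, and `cut` is an illustrative name:

```python
string = "20_231_myString_234"
occur = 2  # split at this occurrence (1-based)

# Positions of every '_' found by a plain character scan instead of re.finditer.
indices = [i for i, ch in enumerate(string) if ch == "_"]
cut = indices[occur - 1]  # raises IndexError if there are fewer than `occur` underscores
part1, part2 = string[:cut], string[cut + 1:]
print(part1, part2)  # 20_231 myString_234
```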
pypat

I thought I would contribute my two cents. The second parameter to split(), maxsplit, limits how many splits are made, so everything after the nth delimiter stays in one piece:

def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r

On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.

It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:

def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]

Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.

I put up my testing script online. You are welcome to review and comment.
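The testing script itself is not reproduced here. A minimal `timeit` harness along the same lines might look like the following; the function names, test string and call count are arbitrary choices, and the timings it prints will vary by machine and Python version.

```python
import timeit

def split_join(s, delim, n):
    groups = s.split(delim)
    return delim.join(groups[:n]), delim.join(groups[n:])

def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r

s = '20_231_myString_234'
# Sanity check: both approaches must agree before timing them.
assert split_join(s, '_', 2) == split_at(s, '_', 2)

for func in (split_join, split_at):
    t = timeit.timeit(lambda: func(s, '_', 2), number=100_000)
    print(f'{func.__name__}: {t:.3f} s for 100,000 calls')
```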

Yuval
>>> import re
>>> s = '20_231_myString_234'
>>> occurrences = [m.start() for m in re.finditer('_', s)]  # positions of every '_'
>>> occurrences
[2, 6, 15]
>>> result = [s[:occurrences[1]], s[occurrences[1]+1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']
Kousik

As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter maxsplit to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:

s = '20_231_myString_234'
first_part = s.rsplit('_', 2)[0]  # Gives '20_231' (rsplit counts from the right, so this relies on s having three '_')
second_part = s.split('_', 2)[2]  # Gives 'myString_234'

This is not only simple, but also avoids performance hits of regex solutions and other solutions using join to undo unnecessary splits.
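To get the part before the nth delimiter regardless of how many delimiters follow it, the `maxsplit` for `rsplit` can be derived from `str.count`. A sketch under that idea; the helper name and the `(s, '')` fallback are hypothetical, and it assumes the delimiter cannot overlap itself (always true for a single character):

```python
def split_nth_maxsplit(s, delim, n):
    """Split s at the nth occurrence of delim using only maxsplit-limited splits."""
    total = s.count(delim)  # number of non-overlapping occurrences
    if total < n:
        return s, ''  # assumed fallback when there is no nth delimiter
    head = s.rsplit(delim, total - n + 1)[0]  # cut at the nth delimiter from the left
    tail = s.split(delim, n)[n]
    return head, tail

print(split_nth_maxsplit('a_b_c_d_e', '_', 2))  # ('a_b', 'c_d_e')
```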

Leandro 86

It depends on the pattern you are splitting on. If, for example, the first two elements are always numbers, you can build a regular expression and use the re module, which can also split your string.
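For instance, assuming (as in the example) that the first two fields are always digits, the split point can be described by its content rather than by counting underscores. Both the pattern and the "digits, then text" assumption are illustrative:

```python
import re

s = '20_231_myString_234'

# Match two digit runs joined by '_', then capture everything after the next '_'.
m = re.match(r'(\d+_\d+)_(.*)', s)
if m:
    print(m.groups())  # ('20_231', 'myString_234')

# Or split once at the first '_' that sits between a digit and a non-digit.
parts = re.split(r'(?<=\d)_(?=\D)', s, maxsplit=1)
print(parts)  # ['20_231', 'myString_234']
```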

Michał Fita

I had a larger string to split at every nth occurrence of a separator, and ended up with the following code:

# Split at every 6th space
n = 6
sep = ' '
n_split_groups = []

groups = err_str.split(sep)  # err_str is the long input string
while groups:
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]

print(n_split_groups)

Thanks @perreal!

AllBlackt

In function form, @AllBlackt's solution becomes:

def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while groups:
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups

s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print (split_nth(s, " ", 2))

['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
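The while loop re-slices the remaining list on every pass. An equivalent sketch that steps through the fields n at a time with `range` avoids that; the name `split_every_nth` is hypothetical:

```python
def split_every_nth(s, sep, n):
    """Join every n consecutive sep-separated fields back together."""
    groups = s.split(sep)
    # range(0, len(groups), n) yields the start index of each chunk.
    return [sep.join(groups[i:i + n]) for i in range(0, len(groups), n)]

s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_every_nth(s, " ", 2))  # ['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
```

On Python 3.12 and later, `itertools.batched(groups, n)` yields the same chunks as tuples.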
BBSysDyn