Split string at nth occurrence of a given character

Question

Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?

Given a string:

'20_231_myString_234'

It should be split into (with the delimiter being '_', after its second occurrence):

['20_231', 'myString_234']

Or is the only way to accomplish this to count, split and join?

So do you mean you want to split a string at, say, the second underscore in the string from the beginning? — Jude Osborn, Jun 12 '13 at 07:40

jamylak · Accepted Answer · 2015-11-19T23:06:18.547

77

>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')

This seems like the most readable way; the alternative is a regex.
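
The split-and-join approach could be wrapped in a small helper. This is only a sketch: the name `split_at_nth` and the decision to raise `ValueError` when the delimiter occurs fewer than `n` times are assumptions, not part of the original answer.

```python
def split_at_nth(text, delim, n):
    """Split text at the nth occurrence of delim (hypothetical helper)."""
    groups = text.split(delim)
    # With fewer than n delimiters there is nothing to split on.
    if n < 1 or len(groups) <= n:
        raise ValueError("delimiter occurs fewer than n times")
    return delim.join(groups[:n]), delim.join(groups[n:])


print(split_at_nth('20_231_myString_234', '_', 2))
# ('20_231', 'myString_234')
```

Raising keeps a missing delimiter from silently producing an empty second part; returning the whole string unchanged would be an equally reasonable choice.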

edited Nov 19 '15 at 23:06

answered Jun 12 '13 at 07:57


IMO this looks much clearer than the regex way, although that one is not bad either. Thanks. – cherrun Jun 12 '13 at 08:17
Is there a way to use this answer in a dataframe format? Where Column1 = String, output to Column2 and 3? – Arthur D. Howland Feb 05 '19 at 15:37
I do suggest reading my answer. I think it is more readable, and it's not a regex. – Yuval Jun 27 '21 at 14:54
Although this is still the best answer for an unknown number of occurrences, there's a better solution when we want only the first or the last occurrence. For the first occurrence, we should use _partition_. For the last occurrence, we should use _rpartition_. This is not only more readable but also more performant, since we avoid unnecessary splits and joins. I hope Python adds an optional "occurrence=1" parameter to partition in the future, so it will fit all cases. – Leandro 86 Dec 22 '22 at 18:42
@Leandro86 as you mentioned partition only splits on the 1st occurrence, so if you mean adding an optional parameter to `str.split` then that already exists as `maxsplit` https://docs.python.org/3/library/stdtypes.html#str.rsplit – jamylak Dec 25 '22 at 11:36
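
For reference, the first/last-occurrence cases discussed in the comments above look roughly like this. It is a small illustration using only the built-in `str.partition`, `str.rpartition`, `str.split` and `str.rsplit`; the variable names are arbitrary.

```python
text = '20_231_myString_234'

# First occurrence: partition returns (head, separator, tail).
head, sep, tail = text.partition('_')
print(head, tail)    # 20 231_myString_234

# Last occurrence: rpartition searches from the right.
head, sep, tail = text.rpartition('_')
print(head, tail)    # 20_231_myString 234

# maxsplit stops splitting after the given number of delimiters.
print(text.split('_', 1))   # ['20', '231_myString_234']
print(text.rsplit('_', 1))  # ['20_231_myString', '234']
```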

perreal · Answer 2 · 2014-06-02T07:24:22.807

9

Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:

import re

n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n - 1), s)
if m:
    print(m.groups())

or have a nice function:

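import re

def nthofchar(s, c, n):
    # Escape c so that regex metacharacters such as '.' are matched literally.
    c = re.escape(c)
    regex = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (c, c, n - 1, c, c)
    m = re.match(regex, s)
    # Return an empty tuple when c occurs fewer than n times.
    return m.groups() if m else ()

s = '20_231_myString_234'
print(nthofchar(s, '_', 2))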

Or without regexes, using iterative find:

def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        # str.index raises ValueError if delim occurs fewer than n times
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]

s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)
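
If an exception is not wanted when the delimiter occurs fewer than n times, a variant built on `str.find` is one option. This is a sketch, not the answer's code: the name `nth_split_or_none`, the `None` return value and the support for multi-character delimiters are all assumptions made for illustration, and the delimiter is assumed to be non-empty.

```python
def nth_split_or_none(s, delim, n):
    """Split s at the nth non-overlapping occurrence of delim, or return None."""
    if n < 1:
        raise ValueError("n must be at least 1")
    pos = -len(delim)
    for _ in range(n):
        # str.find returns -1 when there is no further occurrence.
        pos = s.find(delim, pos + len(delim))
        if pos == -1:
            return None
    return s[:pos], s[pos + len(delim):]


print(nth_split_or_none('20_231_myString_234', '_', 2))
# ('20_231', 'myString_234')
print(nth_split_or_none('20_231_myString_234', '_', 4))
# None
```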

edited Jun 02 '14 at 07:24

answered Jun 12 '13 at 07:40


Using your code, the output is this: `('20_231_', 'myString_234')`. The delimiter is included as well. – cherrun Jun 12 '13 at 07:56
@cherrun insert the delimiter before `(.*)` to the regexp – jamylak Jun 12 '13 at 08:01
@perreal feel free to use old string formatting in this specific case so you don't need all the `{{}}` – jamylak Jun 12 '13 at 08:02

@perreal this actually looks nice now – jamylak Jun 12 '13 at 08:08

score 6 · Answer 3 · answered Jun 12 '13 at 07:54

I like this solution because it works without any actual regex and can easily be adapted to another "nth" or delimiter.

import re

string = "20_231_myString_234"
occur = 2  # on which occurrence you want to split

indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]

print (part1, ' ', part2)
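
To adapt this to another delimiter safely, the delimiter probably needs escaping, because `re.finditer` treats its first argument as a pattern. The sketch below assumes a hypothetical function name, `split_at_occurrence`, and uses `itertools.islice` so that only the needed match is consumed rather than every index being collected.

```python
import re
from itertools import islice


def split_at_occurrence(string, delim, occur):
    # re.escape makes delimiters such as '.' or '|' match literally.
    matches = re.finditer(re.escape(delim), string)
    # Take only the occur-th match instead of collecting every index.
    match = next(islice(matches, occur - 1, None), None)
    if match is None:
        raise ValueError("delimiter occurs fewer than %d times" % occur)
    return string[:match.start()], string[match.end():]


print(split_at_occurrence("20.231.myString.234", ".", 2))
# ('20.231', 'myString.234')
```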

Yuval · Answer 4 · 2017-05-12T17:54:22.233

I thought I would contribute my two cents. The second parameter to split(), maxsplit, lets you stop splitting after a given number of delimiters:

def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r

On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.

It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:

def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]

Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.

I put up my testing script online. You are welcome to review and comment.
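
The original testing script is not reproduced here. As a rough sketch of how such a comparison could be run with the standard `timeit` module, the snippet below times `split_at` against the iterative-find `nth_split`; the iteration count is an arbitrary assumption, and the timings will differ from machine to machine.

```python
import timeit


def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r


def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]


s = '20_231_myString_234'
# Both functions should agree before their timings are compared.
assert split_at(s, '_', 2) == nth_split(s, '_', 2)

for func in (split_at, nth_split):
    seconds = timeit.timeit(lambda: func(s, '_', 2), number=100000)
    print(func.__name__, round(seconds, 4))
```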

score 1 · Answer 5 · answered Jun 12 '13 at 08:23

>>> import re
>>> s = '20_231_myString_234'

>>> occurrences = [m.start() for m in re.finditer('_', s)]  # positions of every '_'
>>> occurrences
[2, 6, 15]
>>> result = [s[:occurrences[1]], s[occurrences[1] + 1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']

score 1 · Answer 6 · answered Dec 26 '22 at 14:59

As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter maxsplit to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:

text = '20_231_myString_234'
first_part = text.rsplit('_', 2)[0] # Gives '20_231'
second_part = text.split('_', 2)[2] # Gives 'myString_234'

This is not only simple, but also avoids performance hits of regex solutions and other solutions using join to undo unnecessary splits.
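
One caveat worth checking, shown here as a small sketch: `rsplit` counts delimiters from the right, so the `first_part` line yields '20_231' only because the sample string contains exactly three underscores. The longer input below is made up for illustration. A single left-to-right `split` with `maxsplit` appears to give both parts for any number of delimiters.

```python
text = '20_231_myString_234_extra'  # hypothetical input with four underscores
n = 2

# rsplit counts from the right, so this is no longer the first two fields.
print(text.rsplit('_', n)[0])   # 20_231_myString

# Splitting from the left at most n times keeps the tail intact.
parts = text.split('_', n)
first_part, second_part = '_'.join(parts[:n]), parts[n]
print(first_part, second_part)  # 20_231 myString_234_extra
```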

score 0 · Answer 7 · answered Jun 12 '13 at 07:41


It depends on what your pattern for this split is. If the first two elements are always numbers, for example, you can build a regular expression and use the re module, which is able to split your string as well.


Michał Fita


score 0 · Answer 8 · answered Apr 23 '15 at 11:07

I had a larger string to split at every nth occurrence of a character and ended up with the following code:

# Split every 6 spaces
n = 6
sep = ' '
n_split_groups = []

# err_str is the (larger) input string, defined elsewhere
groups = err_str.split(sep)
while len(groups):
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]

print(n_split_groups)

Thanks @perreal!

score 0 · Answer 9 · answered Aug 26 '21 at 07:32

A function form of @AllBlackt's solution:

def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while groups:
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups

s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_nth(s, " ", 2))

['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
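
The same grouping can also be written as a list comprehension that steps through the fields n at a time, which avoids re-slicing the remaining list on every pass. This is an alternative sketch, and `split_every_nth` is a made-up name.

```python
def split_every_nth(s, sep, n):
    groups = s.split(sep)
    # Step through the fields n at a time and re-join each chunk.
    return [sep.join(groups[i:i + n]) for i in range(0, len(groups), n)]


s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_every_nth(s, " ", 2))
# ['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
```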
