5

Trying to split the string at number 7 and I want 7 to be included in the second part of the split string.

Code:

a = 'cats can jump up to 7 times their tail length'

words = a.split("7")

print(words)

Output:

['cats can jump up to ', ' times their tail length']

String got split but second part doesn't include 7.

I want to know how I can include 7.

note: not a duplicate of Python split() without removing the delimiter because the separator has to be part of the second string.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
Samyak Jain
  • 53
  • 1
  • 7

6 Answers6

8

A simple and naive way to do this is just to find the index of what you want to split on and slice it:

>>> a = 'cats can jump up to 7 times their tail length'
>>> ind = a.index('7')
>>> a[:ind], a[ind:]
('cats can jump up to ', '7 times their tail length')
Rafael Barros
  • 2,738
  • 1
  • 21
  • 28
  • 1
    Nice trick, but if the delimiter repeats in the main string, only the first occurrence will be considered – Sreeram TP Feb 23 '18 at 15:15
  • 3
    @SreeramTP If the delimiter repeats then that is a requirement the OP should mention in the question. – Graipher Feb 23 '18 at 15:16
5

Another way is to use str.partition:

a = 'cats can jump up to 7 times their tail length'
print(a.partition('7'))
# ('cats can jump up to ', '7', ' times their tail length')

To join the number again with the latter part you can use str.join:

x, *y = a.partition('7')
y = ''.join(y)
print((x, y))
# ('cats can jump up to ', '7 times their tail length')

Or do it manually:

sep = '7'
x = a.split(sep)
x[1] = sep + x[1]
print(tuple(x))
# ('cats can jump up to ', '7 times their tail length')
Graipher
  • 6,891
  • 27
  • 47
5

in one line, using re.split with the rest of the string, and filter the last, empty string that re.split leaves:

import re
a = 'cats can jump up to 7 times their tail length'
print([x for x in re.split("(7.*)",a) if x])

result:

['cats can jump up to ', '7 times their tail length']

using () in split regex tells re.split not to discard the separator. A (7) regex would have worked but would have created a 3-item list like str.partition does, and would have required some post processing, so no one-liner.

now if the number isn't known, regex is (again) the best way to do it. Just change the code to:

[x for x in re.split("(\d.*)",a) if x]
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
1

re can be used to capture globally as well:

>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> sep = '7'
>>> 
>>> [i for i in re.split(f'({sep}[^{sep}]*)', s) if i]
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

If the f-string is hard to read, note that it just evaluates to (7[^7]*).
(To the same end as the listcomp one can use list(filter(bool, ...)), but it's comparatively quite ugly)


In Python 3.7 and onward, re.split() allows splitting on zero-width patterns. This means a lookahead regex, namely f'(?={sep})', can be used instead of the group shown above.

What's strange about this is the timings: if using re.split() (i.e. without a compiled pattern object), the group solution consistently runs about 1.5x faster than the lookahead. However, when compiled, the lookahead beats the other hands-down:

In [4]: r_lookahead = re.compile('f(?={sep})')

In [5]: r_group = re.compile(f'({sep}[^{sep}]*)')

In [6]: %timeit [i for i in r_lookahead.split(s) if i]
2.76 µs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit [i for i in r_group.split(s) if i]
5.74 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit [i for i in r_lookahead.split(s * 512) if i]
137 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit [i for i in r_group.split(s * 512) if i]
1.88 ms ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A recursive solution also works fine, although more slowly than splitting on a compiled regex (but faster than a straight re.split(...)):

def splitkeep(s, sep, prefix=''):
    start, delim, end = s.partition(sep)
    return [prefix + start, *(end and splitkeep(end, sep, delim))]
>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> 
>>> splitkeep(s, '7')
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']
0

Using enumerate, This only works if the string doesnt start with the seperator

s = 'The quick 7 the brown foxes jumped 7 times over 7 lazy dogs'

separator = '7'
splitted = s.split(separator)

res = [((separator if i > 0 else '') + item).strip() for i, item in enumerate(splitted)]

print(res)
['The quick', '7 the brown foxes jumped', '7 times over', '7 lazy dogs']

[Program finished]
Subham
  • 397
  • 1
  • 6
  • 14
0

There's also the possibility to do all of it using split and list comprehension, without the need to import any library. This will, however, make your code slightly "less pretty":

a = 'cats can jump up to 7 times their tail length'
sep = '7'
splitString = a.split(sep)
splitString = list(splitString[0]) + [sep+x for x in splitString[1:]]

And with that, splitString will carry the value:

['cats can jump up to ', '7 times their tail length']
Pedro Martins de Souza
  • 1,406
  • 1
  • 13
  • 35