Using split function in python3.5

Question

Trying to split the string at number 7 and I want 7 to be included in the second part of the split string.

Code:

a = 'cats can jump up to 7 times their tail length'

words = a.split("7")

print(words)

Output:

['cats can jump up to ', ' times their tail length']

String got split but second part doesn't include 7.

I want to know how I can include 7.

note: not a duplicate of Python split() without removing the delimiter because the separator has to be part of the second string.

score 8 · Answer 1 · answered Feb 23 '18 at 15:10

8

A simple and naive way to do this is just to find the index of what you want to split on and slice it:

>>> a = 'cats can jump up to 7 times their tail length'
>>> ind = a.index('7')
>>> a[:ind], a[ind:]
('cats can jump up to ', '7 times their tail length')

answered Feb 23 '18 at 15:10

Rafael Barros

2,738
1
21
28

1

Nice trick, but if the delimiter repeats in the main string, only the first occurrence will be considered – Sreeram TP Feb 23 '18 at 15:15
3

@SreeramTP If the delimiter repeats then that is a requirement the OP should mention in the question. – Graipher Feb 23 '18 at 15:16

score 5 · Answer 2 · answered Feb 23 '18 at 15:13

Another way is to use str.partition:

a = 'cats can jump up to 7 times their tail length'
print(a.partition('7'))
# ('cats can jump up to ', '7', ' times their tail length')

To join the number again with the latter part you can use str.join:

x, *y = a.partition('7')
y = ''.join(y)
print((x, y))
# ('cats can jump up to ', '7 times their tail length')

Or do it manually:

sep = '7'
x = a.split(sep)
x[1] = sep + x[1]
print(tuple(x))
# ('cats can jump up to ', '7 times their tail length')

Jean-François Fabre · Accepted Answer · 2018-02-23T16:54:38.740

5

in one line, using re.split with the rest of the string, and filter the last, empty string that re.split leaves:

import re
a = 'cats can jump up to 7 times their tail length'
print([x for x in re.split("(7.*)",a) if x])

result:

['cats can jump up to ', '7 times their tail length']

using () in split regex tells re.split not to discard the separator. A (7) regex would have worked but would have created a 3-item list like str.partition does, and would have required some post processing, so no one-liner.

now if the number isn't known, regex is (again) the best way to do it. Just change the code to:

[x for x in re.split("(\d.*)",a) if x]

edited Feb 23 '18 at 16:54

answered Feb 23 '18 at 15:23

Jean-François Fabre

137,073
23
153
219

What if I have a list of strings with a number in them and I have to split each string from the number. Thanks in advance :) – Samyak Jain Feb 23 '18 at 16:52
edited, nothing simpler with regex. Other solutions will need more rework! – Jean-François Fabre Feb 23 '18 at 16:54

score 1 · Answer 4 · 2019-02-02T20:50:41.003

re can be used to capture globally as well:

>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> sep = '7'
>>> 
>>> [i for i in re.split(f'({sep}[^{sep}]*)', s) if i]
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

If the f-string is hard to read, note that it just evaluates to (7[^7]*).
(To the same end as the listcomp one can use list(filter(bool, ...)), but it's comparatively quite ugly)

In Python 3.7 and onward, re.split() allows splitting on zero-width patterns. This means a lookahead regex, namely f'(?={sep})', can be used instead of the group shown above.

What's strange about this is the timings: if using re.split() (i.e. without a compiled pattern object), the group solution consistently runs about 1.5x faster than the lookahead. However, when compiled, the lookahead beats the other hands-down:

In [4]: r_lookahead = re.compile('f(?={sep})')

In [5]: r_group = re.compile(f'({sep}[^{sep}]*)')

In [6]: %timeit [i for i in r_lookahead.split(s) if i]
2.76 µs ± 207 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit [i for i in r_group.split(s) if i]
5.74 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit [i for i in r_lookahead.split(s * 512) if i]
137 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit [i for i in r_group.split(s * 512) if i]
1.88 ms ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A recursive solution also works fine, although more slowly than splitting on a compiled regex (but faster than a straight re.split(...)):

def splitkeep(s, sep, prefix=''):
    start, delim, end = s.partition(sep)
    return [prefix + start, *(end and splitkeep(end, sep, delim))]

>>> s = 'The 7 quick brown foxes jumped 7 times over 7 lazy dogs'
>>> 
>>> splitkeep(s, '7')
['The ', '7 quick brown foxes jumped ', '7 times over ', '7 lazy dogs']

Subham · Answer 5 · 2021-03-31T14:18:16.687

0

Using enumerate, This only works if the string doesnt start with the seperator

s = 'The quick 7 the brown foxes jumped 7 times over 7 lazy dogs'

separator = '7'
splitted = s.split(separator)

res = [((separator if i > 0 else '') + item).strip() for i, item in enumerate(splitted)]

print(res)

['The quick', '7 the brown foxes jumped', '7 times over', '7 lazy dogs']

[Program finished]

edited Mar 31 '21 at 14:18

answered Mar 31 '21 at 13:50

Subham

397
1
6
14

score 0 · Answer 6 · answered Mar 31 '21 at 14:10

There's also the possibility to do all of it using split and list comprehension, without the need to import any library. This will, however, make your code slightly "less pretty":

a = 'cats can jump up to 7 times their tail length'
sep = '7'
splitString = a.split(sep)
splitString = list(splitString[0]) + [sep+x for x in splitString[1:]]

And with that, splitString will carry the value:

['cats can jump up to ', '7 times their tail length']

Using split function in python3.5

6 Answers6