By using re.split():
>>> re.split(r'(this|into|ones)', "Let's split this string into many small ones")
["Let's split ", 'this', ' string ', 'into', ' many small ', 'ones', '']
By putting the words to split on in a capturing group, the output includes the words we split on.
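To make the effect of the capturing group concrete, here is a small comparison of the same pattern with a capturing and a non-capturing group. The variable names are only illustrative:

```python
import re

sentence = "Let's split this string into many small ones"

# Capturing group: the matched separators are kept in the result.
with_group = re.split(r'(this|into|ones)', sentence)

# Non-capturing group: the matched separators are dropped.
without_group = re.split(r'(?:this|into|ones)', sentence)

print(with_group)
print(without_group)
```

Both results end with an empty string, because the sentence ends with one of the separator words.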
If you need the spaces removed, use map(str.strip, result) on the re.split() output:
>>> list(map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones")))
["Let's split", 'this', 'string', 'into', 'many small', 'ones', '']
You can also use filter(None, result) to remove any empty strings if need be:
>>> list(filter(None, map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones"))))
["Let's split", 'this', 'string', 'into', 'many small', 'ones']
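If you do this often, the split, strip and filter steps could be wrapped in one function. This is only a sketch: split_keeping_words is a hypothetical helper name, and building the pattern from a word list with re.escape() is my assumption, not something the question requires.

```python
import re

def split_keeping_words(text, words):
    """Split text on any of the given words, keeping the words themselves."""
    # Hypothetical helper: escape each word so it is matched literally,
    # then wrap the alternation in a capturing group to keep the separators.
    pattern = '({})'.format('|'.join(re.escape(w) for w in words))
    # Strip surrounding whitespace from every piece, then drop empty pieces.
    stripped = (part.strip() for part in re.split(pattern, text))
    return [part for part in stripped if part]

result = split_keeping_words("Let's split this string into many small ones",
                             ['this', 'into', 'ones'])
print(result)
```

The list comprehension does the same job as filter(None, ...) and returns a list on both Python 2 and Python 3.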
To split on words but keep them attached to the following group, you need to use a lookahead assertion instead:
>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
Now we are really splitting on whitespace, but only on whitespace that is followed by a whole word, one in the set of this, into and ones.
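The lookahead version can be generalised to an arbitrary word list in the same way. Again a sketch under assumptions: split_before_words is a hypothetical name, and the words are assumed to end in a word character so that the trailing \b behaves as described.

```python
import re

def split_before_words(text, words):
    """Split on whitespace that is directly followed by one of the words."""
    # Hypothetical helper: the lookahead checks for a whole word without
    # consuming it, so the word stays attached to the following piece.
    alternatives = '|'.join(re.escape(w) for w in words)
    pattern = r'\s(?=(?:{})\b)'.format(alternatives)
    return re.split(pattern, text)

first = split_before_words("Let's split this string into many small ones",
                           ['this', 'into', 'ones'])
# The \b stops "onesie" from counting as the word "ones".
second = split_before_words("small onesie ones", ['ones'])

print(first)
print(second)
```

The second call shows why the \b matters: the whitespace before onesie is left alone, and only the whitespace before the whole word ones is used as a split point.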