Split string by first substring found

Question

I wish to split a sentence by certain words, at the first occurrence of those words. Let me illustrate:

message = 'I wish to check my python code for errors to run the program properly with fluency'

I wish to split the above message at the first occurrence of for/to/with, so the result for the above message would be: check my python code for errors to run the program properly with fluency

I also wish to include the word that I split the sentence with, so my final result would be: to check my python code for errors to run the program properly with fluency

My code doesn't work:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
result = message.split(r"for|to|with",1)[1]
print(result)

What could I do?

score 1 · Accepted Answer · answered Jul 06 '19 at 19:05

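str.split does not take a regex as a parameter (perhaps you're thinking of Perl). It looks for the literal text "for|to|with", which never occurs in the message, so the returned list has only one element and [1] raises an IndexError.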

The following does what you want:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
result = re.search(r'\b(for|to|with)\b', message)
if result:  # re.search returns None when none of the words occur
    print(message[result.start(1):])

This uses no substitution, rejoining, or loop: it simply searches for the first of the required words and slices the string from the position where that match starts.
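As an aside to the point about split: the standard library's re.split (unlike str.split) does accept a pattern, and with a capturing group and maxsplit=1 it keeps the matched word in its result. A minimal sketch of that variant, closest to what the question attempted; the helper name split_from_first is hypothetical, and it assumes returning None is acceptable when no word matches:

```python
import re

def split_from_first(message, words):
    # Hypothetical helper: build r'\b(for|to|with)\b' from the word list.
    # re.escape guards against regex metacharacters in the words, and the
    # capturing group makes re.split keep the matched word in its output.
    pattern = r'\b(' + '|'.join(map(re.escape, words)) + r')\b'
    parts = re.split(pattern, message, maxsplit=1)
    if len(parts) == 1:
        return None  # none of the words occur in message
    # parts == [text before, matched word, text after]
    return parts[1] + parts[2]

message = 'I wish to check my python code for errors to run the program properly with fluency'
print(split_from_first(message, ['for', 'to', 'with']))
```

Passing maxsplit as a keyword matters: positional maxsplit is deprecated in recent Python versions.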

Emma · Answer 2 · 2019-07-06T19:42:37.543

My guess is that this simple expression might do that:

.*?(\b(?:to|for|with)\b.*)

and that re.match might be the fastest of the five methods below:

Test with `re.findall`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
print(re.findall(regex, test_str))

Test with `re.sub`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
subst = "\\1"

result = re.sub(regex, subst, test_str)

if result:
    print(result)

Test with `re.finditer`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"

test_str = "I wish to check my python code for errors to run the program properly with fluency"

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):

    # Full match
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Capturing groups are numbered from 1
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

Test with `re.match`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.match(regex, test_str).group(1))

Test with `re.search`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.search(regex, test_str).group(1))

The expression is explained in the top-right panel of this demo if you wish to explore or modify it, and this link shows how it matches against some sample inputs.
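Since the demo links are not reproduced here, the same pattern can be annotated inline with re.VERBOSE, where whitespace is ignored and # starts a comment. A sketch; the comments are my reading of the pattern, not the author's:

```python
import re

# Same pattern as above, written verbosely so each part can be commented.
pattern = re.compile(r"""
    .*?                     # lazily consume as little as possible before a delimiter
    (                       # group 1: the part we keep
        \b(?:to|for|with)\b # one of the delimiter words, as a whole word
        .*                  # greedily take everything after it
    )
""", re.VERBOSE)

message = 'I wish to check my python code for errors to run the program properly with fluency'
match = pattern.match(message)
print(match.group(1) if match else None)
```

The lazy .*? is what makes the earliest delimiter win: the engine tries the shortest prefix first and stops at the first position where a delimiter word fits.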

Substituting the entire string like that is inefficient: on my computer it looks to be about three times slower than finding the first match with search(). (comment by M Somerville, Jul 06 '19 at 19:18)
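Both speed claims (re.match being fastest, re.sub being slower than search()) are easy to check locally. A rough timeit sketch, assuming the pattern and message from this thread; "re.search + slice" is the accepted answer's approach. Timings depend on machine and Python version, so no expected numbers are given:

```python
import re
import timeit

# Pattern from the answer above, compiled once so the variants share it.
regex = re.compile(r".*?(\b(?:to|for|with)\b.*)")
message = 'I wish to check my python code for errors to run the program properly with fluency'

# Each callable returns the text from the first delimiter onwards.
candidates = {
    "re.sub": lambda: regex.sub(r"\1", message),
    "re.match": lambda: regex.match(message).group(1),
    "re.search": lambda: regex.search(message).group(1),
    "re.search + slice": lambda: message[re.search(r"\b(for|to|with)\b", message).start(1):],
}

# Sanity check: all variants must agree before comparing their speed.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{name:20} {seconds:.4f} s for 10,000 runs")
```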

score 0 · Answer 3 · answered Jul 06 '19 at 19:04

message = 'I wish to check my python code for errors to run the program properly with fluency'
words = message.split(' ')
delimiters = ('to', 'for', 'with')
number = 0
# find the index of the first word that is one of the delimiters
for i, word in enumerate(words):
    if word in delimiters:
        number = i
        break
# rejoin everything from that word onwards (no trailing space)
message_new = ' '.join(words[number:])
print(message_new)

Output:

to check my python code for errors to run the program properly with fluency

score 0 · Answer 4 · answered Jul 06 '19 at 20:06

That question was already answered in "How to remove all characters before a specific character in Python", but that answer only works for one specific delimiter. For multiple delimiters you first have to find out which one occurs first, which is covered in "How can I find the first occurrence of a substring in a Python string". Start with a first guess (I don't have much imagination, so let's call it bestDelimiter = firstDelimiter), find the position of its first occurrence, and save it as bestPosition. Then find the positions of the rest of the delimiters; each time you find a delimiter that occurs before the current bestPosition, update both bestDelimiter and bestPosition. At the end, bestDelimiter is the one that occurs first, and you can apply the operation you need using it.
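The procedure just described can be sketched as follows. Assumptions: the helper name split_at_first_delimiter is hypothetical, and instead of seeding with the first delimiter it starts from "nothing found yet", so a first delimiter that is absent (str.find returning -1) cannot win by accident. Note that str.find matches substrings, so "to" would also be found inside a word like "potato":

```python
def split_at_first_delimiter(message, delimiters):
    # Hypothetical helper: returns (bestDelimiter, text from it onwards).
    best_delimiter = None
    best_position = -1
    for delimiter in delimiters:
        position = message.find(delimiter)  # -1 if the delimiter is absent
        if position == -1:
            continue
        # update both variables when this delimiter occurs earlier
        if best_position == -1 or position < best_position:
            best_delimiter = delimiter
            best_position = position
    if best_delimiter is None:
        return None, message  # no delimiter found: keep the whole string
    return best_delimiter, message[best_position:]

message = 'I wish to check my python code for errors to run the program properly with fluency'
delimiter, rest = split_at_first_delimiter(message, ['for', 'to', 'with'])
print(delimiter)
print(rest)
```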

score -1 · Answer 5 · answered Jul 06 '19 at 19:04

You can first find all instances of for, to, and with, split on the desired values, and then splice and rejoin:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
delimiters = r"\bfor\b|\bto\b|\bwith\b"
vals = re.findall(delimiters, message)
_, *s = re.split(delimiters, message)
result = ''.join('{} {}'.format(a, re.sub(r"^\s+", "", b)) for a, b in zip(vals, s))

Output:

'to check my python code for errors to run the program properly with fluency'
