I approached this by building a 'grouped' version of the desired catch pattern that matches everything from the start of the string up to and including the nth occurrence, then applying the sub to that occurrence only.
The parent function is regex_n_sub, and it collects the same inputs as the re.sub() method, plus the occurrence number n.
The catch pattern is passed to get_nsubcatch_catch_pattern() with the instance number. Inside, a list comprehension generates n copies of the pattern '.*?' (match any character, 0 or more repetitions, non-greedy). This pattern represents the text between the occurrences of the catch_pattern that come before the nth one.
Next, the input catch_pattern is placed between each copy of the 'space pattern', and the result is wrapped in parentheses to form the first group.
The second group is just the catch_pattern wrapped in parentheses, so when the two groups are combined, a pattern for 'all of the text up to the nth occurrence of the catch pattern, followed by that occurrence' is created. This 'new_catch_pattern' has two groups built in, so the second group, containing the nth occurrence of the catch_pattern, can be substituted.
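To make the construction concrete, here is a minimal sketch of the pattern that gets built for the catch pattern 'I' and n = 4. It uses the standard-library re module rather than regex (an assumption on my part), and the variable names are only for illustration.

```python
import re

catch_pattern = 'I'
n = 4

# n copies of the non-greedy "space" pattern, joined by the catch pattern
first_group = '(' + catch_pattern.join(['.*?'] * n) + ')'
second_group = '(' + catch_pattern + ')'
new_catch_pattern = first_group + second_group
print(new_catch_pattern)  # (.*?I.*?I.*?I.*?)(I)

text = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
match = re.match(new_catch_pattern, text)
print(repr(match.group(1)))  # 'When I go to the park and I am alone I think the ducks laugh at '
print(repr(match.group(2)))  # 'I'
```

Group 1 holds everything before the 4th "I", and group 2 holds the 4th "I" itself.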
The replace pattern is passed to get_nsubcatch_replace_pattern() and combined with the prefix r'\g<1>', forming the pattern \g<1> + replace_pattern. The \g<1> part of this pattern re-inserts the text captured by group 1 of the catch pattern unchanged, and the replace pattern that follows it takes the place of group 2, the nth occurrence.
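As a rough illustration of the replacement step on its own, the sketch below hard-codes the catch pattern for the 4th 'I' and applies the \g<1>-prefixed replacement once. It assumes the standard-library re module; the variable names are illustrative.

```python
import re

new_catch_pattern = '(.*?I.*?I.*?I.*?)(I)'   # group 1: text before the 4th 'I'; group 2: the 4th 'I'
new_replace_pattern = r'\g<1>' + 'me'        # keep group 1 as-is, then write the new text

text = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
result = re.sub(new_catch_pattern, new_replace_pattern, text, count=1)
print(result)
# When I go to the park and I am alone I think the ducks laugh at me but I'm not sure.
```

Using \g<1> rather than \1 avoids ambiguity when the replacement text starts with a digit: \1 followed by '0' would be read as a reference to group 10.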
The code below is verbose only for a clearer understanding of the process flow; it can be reduced as desired.
--
The example below should run stand-alone. It corrects the 4th instance of "I" to "me", replacing:
"When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
with
"When I go to the park and I am alone I think the ducks laugh at me but I'm not sure."
import regex as re  # third-party 'regex' package; the standard-library 're' works here too

def regex_n_sub(catch_pattern, replace_pattern, input_string, n, flags=0):
    # Substitute only the nth occurrence of catch_pattern in input_string.
    new_catch_pattern, new_replace_pattern = generate_n_sub_patterns(catch_pattern, replace_pattern, n)
    return_string = re.sub(new_catch_pattern, new_replace_pattern, input_string, count=1, flags=flags)
    return return_string

def generate_n_sub_patterns(catch_pattern, replace_pattern, n):
    new_catch_pattern = get_nsubcatch_catch_pattern(catch_pattern, n)
    new_replace_pattern = get_nsubcatch_replace_pattern(replace_pattern, n)
    return new_catch_pattern, new_replace_pattern

def get_nsubcatch_catch_pattern(catch_pattern, n):
    space_string = '.*?'
    space_list = [space_string for i in range(n)]
    # For 'I' and n=4 this gives '.*?I.*?I.*?I.*?': everything before the nth occurrence
    first_group = catch_pattern.join(space_list)
    first_group = first_group.join('()')      # wrap in parentheses -> group 1
    second_group = catch_pattern.join('()')   # the nth occurrence itself -> group 2
    new_catch_pattern = first_group + second_group
    return new_catch_pattern

def get_nsubcatch_replace_pattern(replace_pattern, n):
    # n is not used here
    new_replace_pattern = r'\g<1>' + replace_pattern
    return new_replace_pattern

### use test ###
catch_pattern = 'I'
replace_pattern = 'me'
test_string = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
print(regex_n_sub(catch_pattern, replace_pattern, test_string, 4))
This code can be copied directly into a workflow; the regex_n_sub() function call returns the string with the replacement applied.
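For reference, one way the verbose version could be reduced is sketched below. It is not the code above verbatim: nth_sub is a hypothetical name, it uses the standard-library re module, and it wraps the catch pattern in a non-capturing group (my addition) so that patterns containing alternation such as a|b stay intact. It also shows two behaviours worth knowing: the string comes back unchanged when there are fewer than n matches, and re.DOTALL is needed if the occurrences are spread over several lines, since '.' does not match a newline by default.

```python
import re

def nth_sub(pattern, repl, string, n, flags=0):
    """Replace only the n-th (1-based) match of `pattern`; hypothetical compact variant."""
    unit = '(?:' + pattern + ')'      # assumption: guard alternation like 'a|b'
    prefix = unit.join(['.*?'] * n)   # text up to, but excluding, the n-th match
    return re.sub('(' + prefix + ')(' + unit + ')', r'\g<1>' + repl,
                  string, count=1, flags=flags)

text = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
print(nth_sub('I', 'me', text, 4))
# When I go to the park and I am alone I think the ducks laugh at me but I'm not sure.

# Fewer than 4 matches: nothing is replaced
print(nth_sub('I', 'me', "I think I can", 4))   # I think I can

# Occurrences on separate lines need re.DOTALL so '.*?' can cross newlines
multi = "I came\nI saw\nI left\nthey thanked I"
print(nth_sub('I', 'me', multi, 4) == multi)    # True
print(nth_sub('I', 'me', multi, 4, flags=re.DOTALL).splitlines()[-1])  # they thanked me
```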
Please let me know if the implementation fails!
Thanks!