How to find and replace nth occurrence of word in a sentence using python regular expression?

Question

Using python regular expression only, how to find and replace nth occurrence of word in a sentence? For example:

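import re

str = 'cat goose  mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', str)           # replaces every cat: 'Bull goose  mouse horse pig Bull cow'
new_str = re.sub(r'cat', r'Bull', str, count=1)  # replaces only the 1st: 'Bull goose  mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', str, count=2)  # replaces the first two, i.e. both again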

In the sentence above, the word 'cat' appears twice. I want the 2nd occurrence of 'cat' to be changed to 'Bull', leaving the 1st 'cat' untouched, so the final sentence would look like: "cat goose  mouse horse pig Bull cow". I tried three different calls in my code above but could not get what I wanted.

I think it's better if you split the string, count for occurrences of `cat` and return a modified list with the `nth` one replaced. Might be a little slower but that might not matter and will definitely be more readable than a hairy regexp. — Noufal Ibrahim, Dec 21 '14 at 12:33

Avinash Raj · Accepted Answer · 2018-12-01T15:30:17.363

17

Use a negative lookahead, as below.

>>> import re
>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'


^ asserts that we are at the start of the string.
(?:(?!cat).)* matches any character that does not begin the substring cat, zero or more times.
cat matches the first cat substring.
(?:(?!cat).)* again matches any character that does not begin cat, zero or more times.
All of the above is enclosed in a capturing group, ((?:(?!cat).)*cat(?:(?!cat).)*), so that the captured characters can be referred to later as \1.
cat then matches the following, second cat substring.
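The same negative-lookahead idea extends to any n by repeating the "run without cat, then cat" unit n-1 times. Below is a minimal sketch, assuming you want to match the word literally (hence re.escape) and count occurrences left to right without overlap; replace_nth_lookahead is a hypothetical name, not part of any library.

```python
import re

def replace_nth_lookahead(text, word, replacement, n):
    """Replace the n-th (1-based) occurrence of the literal `word` in `text`."""
    w = re.escape(word)            # treat `word` literally, not as a regex
    not_w = r'(?:(?!%s).)*' % w    # any run of characters that never starts `word`
    # Group 1 = (n-1) x "<run>word" followed by one more run,
    # i.e. everything before the n-th occurrence.
    pattern = r'^((?:%s%s){%d}%s)%s' % (not_w, w, n - 1, not_w, w)
    # A function as repl inserts `replacement` literally (no backslash parsing);
    # re.DOTALL (an added assumption) lets `.` cross newlines too.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text,
                  count=1, flags=re.DOTALL)

print(replace_nth_lookahead("cat goose  mouse horse pig cat cow", "cat", "Bull", 2))
# cat goose  mouse horse pig Bull cow
```

If there are fewer than n occurrences the pattern does not match, so the text is returned unchanged.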

OR

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

The number inside the {} is n - 1: change it to replace the first, second, or nth occurrence of the string cat.

For example, to replace the third occurrence of the string cat, put 2 inside the curly braces:

>>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose  mouse horse pig cat foo cat cow")
'cat goose  mouse horse pig cat foo Bull cow'
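The {n-1} trick can likewise be wrapped in a small function. This is a sketch under the question's assumptions (a literal word, 1-based n); replace_nth_lazy is a hypothetical name, and the repeated group is made non-capturing so that group 1 is always the whole prefix.

```python
import re

def replace_nth_lazy(text, word, replacement, n):
    """Replace the n-th (1-based) occurrence of the literal `word` in `text`."""
    w = re.escape(word)
    # Lazy prefix, then n-1 lazy "word..." units; the final `word` is the n-th one.
    pattern = r'^(.*?(?:%s.*?){%d})%s' % (w, n - 1, w)
    return re.sub(pattern, lambda m: m.group(1) + replacement, text,
                  count=1, flags=re.DOTALL)

print(replace_nth_lazy("cat goose  mouse horse pig cat foo cat cow", "cat", "Bull", 3))
# cat goose  mouse horse pig cat foo Bull cow
```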


edited Dec 01 '18 at 15:30

answered Dec 21 '14 at 12:29

Avinash Raj


Hi, what is the advantage of this over using `r'(cat.*?)cat'`? – Pierre Dec 21 '14 at 12:31
So how does it deserve a downvote? It's not a wrong answer, though. – Avinash Raj Dec 21 '14 at 12:57
@Pierre: Scratch my comment above. Since both of you are using `.`, there should be no difference as far as I can tell. – nhahtdh Dec 21 '14 at 12:57

@AvinashRaj: People can downvote because it is an overly complex answer. (The downvote is not mine, btw). – nhahtdh Dec 21 '14 at 12:58

This solution doesn't work if n = 1 and there are characters ahead of the first 'cat': https://regex101.com/r/wP7pR2/32 – ForeverWintr Jun 14 '17 at 02:17

score 8 · Answer 2 · edited Dec 17 '21 at 09:28

I use a simple function that lists all occurrences, picks the position of the nth one, and uses it to split the original string into two substrings. It then replaces the first occurrence in the second substring and joins the substrings back into a new string:

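import re

def replacenth(string, sub, wanted, n):
    # Start offsets of all matches; pick the nth one (n is 1-based).
    # Note: re.finditer treats `sub` as a regex, while str.replace below treats it literally.
    # Raises IndexError if there are fewer than n matches.
    where = [m.start() for m in re.finditer(sub, string)][n-1]
    before = string[:where]
    after = string[where:]
    # Replace only the first occurrence in the tail, i.e. the nth one overall
    newString = before + after.replace(sub, wanted, 1)
    print(newString)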

For these variables:

string = 'ababababababababab'
sub = 'ab'
wanted = 'CD'
n = 5
replacenth(string, sub, wanted, n)

outputs:

ababababCDabababab

Notes:

The list comprehension collects the positions of all matches, and [n-1] picks the nth one into where. List indices start at 0, not 1, hence the n-1: n is the ordinal of the occurrence you want, so my example (n = 5) finds the 5th one. If you index with n directly, you need n = 4 to get the 5th occurrence. Which convention to use depends on whatever generates your n.
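To make the n-1 indexing concrete, here is what the list of positions looks like for the example above (a small illustration, not part of the original function):

```python
import re

string = 'ababababababababab'
# Start offsets of every (non-overlapping) match of 'ab'
positions = [m.start() for m in re.finditer('ab', string)]
print(positions)         # [0, 2, 4, 6, 8, 10, 12, 14, 16]

n = 5                    # 1-based: "the 5th occurrence"
print(positions[n - 1])  # 8, the offset where the 5th 'ab' starts
```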

This should be the simplest way, but it isn't regex-only, as you originally wanted.

Sources and some links in addition:

where construction: How to find all occurrences of a substring?

string splitting: https://www.daniweb.com/programming/software-development/threads/452362/replace-nth-occurrence-of-any-sub-string-in-a-string

similar question: Find the nth occurrence of substring in a string

Thanks! I think you would need to reassign as follows: `after=after.replace(sub, wanted, 1)`. I don't believe it is changed in place. (also a colon after the function definition) — campo, Feb 19 '21 at 22:00

score 4 · Answer 3 · answered Dec 21 '14 at 12:40

Here's a way to do it without a regex:

def replaceNth(s, source, target, n):
    inds = [i for i in range(len(s) - len(source)+1) if s[i:i+len(source)]==source]
    if len(inds) < n:
        return  # or maybe raise an error
    s = list(s)  # can't assign to string slices. So, let's listify
    s[inds[n-1]:inds[n-1]+len(source)] = target  # do n-1 because we start from the first occurrence of the string, not the 0-th
    return ''.join(s)

Usage:

In [278]: s
Out[278]: 'cat goose  mouse horse pig cat cow'

In [279]: replaceNth(s, 'cat', 'Bull', 2)
Out[279]: 'cat goose  mouse horse pig Bull cow'

In [280]: print(replaceNth(s, 'cat', 'Bull', 3))
None
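One behavioural difference from the regex-based answers: this index scan tests every start position, so it counts overlapping occurrences, whereas re.finditer resumes after each match. For instance, replaceNth('aaaa', 'aa', 'X', 2) gives 'aXa', while a finditer-based version would replace at offset 2 and give 'aaX'. The snippet below just compares the two position lists:

```python
import re

s = 'aaaa'
source = 'aa'
# The scan used by replaceNth checks every start index, so matches may overlap
scan = [i for i in range(len(s) - len(source) + 1) if s[i:i + len(source)] == source]
# re.finditer continues after the end of each match, so matches never overlap
found = [m.start() for m in re.finditer(re.escape(source), s)]
print(scan)   # [0, 1, 2]
print(found)  # [0, 2]
```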

This is the only answer that worked for my case. – WalksB Oct 29 '21 at 23:27

SomethingSomething · Answer 4 · 2015-04-30T09:12:54.997

I would define a function that works with any regex:

import re

def replace_ith_instance(string, pattern, new_str, i=None, pattern_flags=0):
    # Replace the i-th (1-based) match of `pattern`; if i is None, replace the last one.
    # new_str is inserted literally (group references such as \1 are not expanded).
    matches = list(re.finditer(pattern, string, flags=pattern_flags))
    if i is None:
        i = len(matches)
    if len(matches) == 0 or len(matches) < i:
        return string
    match = matches[i - 1]

    return string[:match.start()] + new_str + string[match.end():]

A working example:

s = 'cat goose  mouse horse pig cat cow'
ns = replace_ith_instance(s, 'cat', 'Bull', 2)
print(ns)

The output:

cat goose  mouse horse pig Bull cow

Another example:

str2 = 'abc abc def abc abc'
ns = replace_ith_instance(str2, r'abc\s*abc', '666')
print(ns)

The output:

abc abc def 666
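replace_ith_instance builds the full list of matches, which it needs for the "last occurrence" default. If you always pass an explicit 1-based i, one option is to stop scanning after the ith match with itertools.islice. This is a sketch; replace_ith_instance_lazy is a hypothetical name, and i is assumed to be at least 1 (islice rejects a negative start).

```python
import re
from itertools import islice

def replace_ith_instance_lazy(string, pattern, new_str, i, pattern_flags=0):
    """Replace the i-th (1-based, i >= 1) match of `pattern`; new_str is inserted literally."""
    matches = re.finditer(pattern, string, flags=pattern_flags)  # lazy iterator
    # Skip the first i-1 matches and take at most one more; None if there are fewer than i
    match = next(islice(matches, i - 1, i), None)
    if match is None:
        return string
    return string[:match.start()] + new_str + string[match.end():]

print(replace_ith_instance_lazy('cat goose  mouse horse pig cat cow', 'cat', 'Bull', 2))
# cat goose  mouse horse pig Bull cow
```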

score 1 · Answer 5 · edited Mar 26 '18 at 05:43


How to replace the nth needle with word:

s.replace(needle,'$$$',n-1).replace(needle,word,1).replace('$$$',needle)
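The placeholder trick relies on '$$$' never appearing in s: for example, with s = 'a $$$ cat cat', needle = 'cat', word = 'Bull' and n = 2, the one-liner returns 'a cat cat Bull'. A slightly more defensive sketch, assuming a single Unicode private-use character ('\uE000', an arbitrary choice) as the placeholder and refusing to run if it occurs in any input, could look like this; replace_nth_placeholder is a hypothetical name.

```python
PLACEHOLDER = '\uE000'  # Unicode private-use character, assumed absent from normal text

def replace_nth_placeholder(s, needle, word, n):
    """Replace the n-th (1-based, n >= 1) occurrence of `needle` in `s` with `word`."""
    # A single character that appears nowhere in the inputs cannot be mistaken
    # for real text or overlap an occurrence of `needle`.
    if PLACEHOLDER in s or PLACEHOLDER in needle or PLACEHOLDER in word:
        raise ValueError('placeholder character occurs in the input')
    # n must be >= 1: a count of -1 would hide every occurrence
    return (s.replace(needle, PLACEHOLDER, n - 1)  # hide the first n-1 occurrences
             .replace(needle, word, 1)             # the next one is the n-th
             .replace(PLACEHOLDER, needle))        # put the hidden ones back

print(replace_nth_placeholder('a $$$ cat cat', 'cat', 'Bull', 2))  # a $$$ cat Bull
```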

edited Mar 26 '18 at 05:43

vindev


answered Jul 10 '16 at 05:10

chvsanchez


The question (from 2014) specifically requests the use of a python regular expression, and has an answer which the user accepted - this does not improve upon that answer – Jake Jul 10 '16 at 05:37

score 1 · Answer 6 · answered Nov 22 '21 at 23:25

Since none of the current answers fit what I needed, here is one based on aleskva's:

import re

def replacenth(string, pattern, replacement, n):
    assert n != 0
    matches = list(re.finditer(pattern, string))
    if len(matches) < abs(n):
        return string
    m = matches[n - 1 if n > 0 else len(matches) + n]
    return string[:m.start()] + replacement + string[m.end():]

It accepts negative match numbers (n = -1 replaces the last match) and any regex pattern, and it is efficient (a single pass with re.finditer). If there are fewer than abs(n) matches, the original string is returned.

This is ideal! I was about to post a similar function before I noticed your answer. The only thing I'd change is to follow the standard from the `re` module. e.g. `def sub_nth(pattern, repl, string, n):` — Bryan Roach, Jun 27 '23 at 07:27
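Following that suggestion, here is a sketch with an re.sub-style signature. It is a hypothetical helper, not part of the re module: it keeps the negative-index behaviour from the answer and, as an added assumption, treats a string repl as a template (so \1 and \g<name> work, via Match.expand) and a callable repl as a function of the match, mirroring re.sub.

```python
import re

def sub_nth(pattern, repl, string, n, flags=0):
    """Replace only the n-th match of `pattern` in `string` (hypothetical helper).

    n is 1-based; negative n counts from the end (-1 is the last match).
    """
    if n == 0:
        raise ValueError('n must be non-zero')
    matches = list(re.finditer(pattern, string, flags=flags))
    if len(matches) < abs(n):
        return string  # fewer than abs(n) matches: leave the string unchanged
    m = matches[n - 1] if n > 0 else matches[n]
    # As in re.sub: a callable receives the match, a string is expanded as a template
    new = repl(m) if callable(repl) else m.expand(repl)
    return string[:m.start()] + new + string[m.end():]

print(sub_nth(r'(\w+)@(\w+)', r'\2@\1', 'a@b c@d', 2))  # a@b d@c
```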

Pierre · Answer 7 · 2014-12-21T13:04:53.553

0

You can match the two occurrences of "cat", keep everything before the second occurrence (\1) and add "Bull":

new_str = re.sub(r'(cat.*?)cat', r'\1Bull', str, 1)

We do only one substitution to avoid replacing the fourth, sixth, etc. occurrence of "cat" (when there are at least four occurrences), as pointed out in Avinash Raj's comment.

If you want to replace the n-th occurrence and not the second, use:

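n = 2
# The outer group keeps the whole prefix; a repeated group alone would keep only its last repetition
new_str = re.sub('((?:cat.*?){%d})cat' % (n - 1), r'\1Bull', str, count=1)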
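A detail that matters once n is 3 or more: a capturing group under a quantifier such as {2} only keeps its last repetition, so \1 must refer to a group that wraps the entire repetition rather than to the repeated group itself. The snippet below demonstrates this with the standard re module:

```python
import re

s = 'cat goose  mouse horse pig cat foo cat cow'

# A quantified capturing group only remembers its *last* repetition
m = re.search(r'(cat.*?){2}cat', s)
print(repr(m.group(1)))  # 'cat foo '

# So \1 drops everything from the 1st occurrence up to the 2nd:
print(re.sub(r'(cat.*?){2}cat', r'\1Bull', s, count=1))      # cat foo Bull cow

# An outer group around the whole repetition keeps the full prefix
print(re.sub(r'((?:cat.*?){2})cat', r'\1Bull', s, count=1))  # cat goose  mouse horse pig cat foo Bull cow
```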

BTW, you should not use str as a variable name: it is not a reserved keyword, but it shadows Python's built-in str type.

edited Dec 21 '14 at 13:04

answered Dec 21 '14 at 12:28

Pierre



note that op wants to change the second one. Yours would fail if the input is `cat cat cat goose mouse cat` – Avinash Raj Dec 21 '14 at 12:34
Then why did you use `str` as a variable name? – Avinash Raj Dec 21 '14 at 13:10
@Avinash Raj: I used (but did not assign) the variable from the question. – Pierre Dec 21 '14 at 13:12

score 0 · Answer 8 · answered Dec 08 '15 at 04:54

Create a repl function to pass into re.sub(). Except... the trick is to make it a class so you can track the call count.

import re

class ReplWrapper(object):
    def __init__(self, replacement, occurrence):
        self.count = 0                  # number of matches seen so far
        self.replacement = replacement
        self.occurrence = occurrence    # 1-based; 0 means "replace every match"
    def repl(self, match):
        self.count += 1
        if self.occurrence == 0 or self.occurrence == self.count:
            return match.expand(self.replacement)  # honours \1-style templates
        else:
            return match.group(0)       # leave this match unchanged

Then use it like this:

myrepl = ReplWrapper(r'Bull', 0) # replaces all instances in a string
new_str = re.sub(r'cat', myrepl.repl, str)

myrepl = ReplWrapper(r'Bull', 1) # replaces 1st instance in a string
new_str = re.sub(r'cat', myrepl.repl, str)

myrepl = ReplWrapper(r'Bull', 2) # replaces 2nd instance in a string
new_str = re.sub(r'cat', myrepl.repl, str)

I'm sure there is a cleverer way that avoids a class, but this seemed straightforward enough to explain. Also, be sure to return match.expand(), since returning the raw replacement value is not correct if someone decides to use \1-style templates.
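One way to avoid the class, sketched here as an assumption rather than the answerer's method, is a closure that draws call numbers from itertools.count; nth_repl is a hypothetical name. As with ReplWrapper, create a fresh callable for each re.sub call, because the counter keeps advancing across calls.

```python
import re
from itertools import count

def nth_repl(replacement, occurrence):
    """Return a repl callable for re.sub (hypothetical helper).

    occurrence is 1-based; 0 means replace every match, as in ReplWrapper.
    """
    counter = count(1)  # yields 1, 2, 3, ... one number per match

    def repl(match):
        i = next(counter)
        if occurrence == 0 or i == occurrence:
            return match.expand(replacement)  # honours \1-style templates
        return match.group(0)  # leave every other match unchanged

    return repl

s = 'cat goose  mouse horse pig cat cow'
print(re.sub(r'cat', nth_repl(r'Bull', 2), s))  # cat goose  mouse horse pig Bull cow
```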

jameshollisandrew · Answer 9 · 2021-05-26T23:57:39.507

I approached this by generating a 'grouped' version of the desired catch pattern relative to the entire string, then applying the sub directly to that instance.

The parent function is regex_n_sub, which takes the same inputs as re.sub() plus the instance number n.

The catch pattern is passed to get_nsubcatch_catch_pattern() with the instance number. Inside, a list comprehension generates n copies of the pattern '.*?' (match any character, 0 or more repetitions, non-greedy). This pattern represents the text between the occurrences of catch_pattern that precede the nth one.

Next, the input catch_pattern is joined between consecutive copies of this 'space pattern', and the result is wrapped in parentheses to form the first group.

The second group is just the catch_pattern wrapped in parentheses. Combined, the two groups form a pattern for 'all of the text up to the nth occurrence of the catch pattern, followed by that occurrence'. Because this new_catch_pattern has two groups built in, the second group, which holds the nth occurrence of the catch_pattern, can be substituted.

The replace pattern is passed to get_nsubcatch_replace_pattern() and prefixed with r'\g<1>', forming \g<1> + replace_pattern. Since re.sub() replaces the whole match, \g<1> writes the text of group 1 back unchanged, and replace_pattern takes the place of group 2, the nth occurrence.
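To see what this produces, the snippet below rebuilds the catch pattern for n = 4 in compact form (nth_catch_pattern is a hypothetical condensed equivalent of get_nsubcatch_catch_pattern) and applies it with the standard re module. Note that catch_pattern is inserted without its own group, so a pattern containing | would change the meaning of the whole expression; wrapping it as (?:...) first is one possible safeguard.

```python
import re

def nth_catch_pattern(catch_pattern, n):
    # n copies of the lazy '.*?' joined by catch_pattern: the n-1 earlier occurrences go in group 1
    first_group = '(' + catch_pattern.join(['.*?'] * n) + ')'
    # group 2 is the n-th occurrence itself
    return first_group + '(' + catch_pattern + ')'

pattern = nth_catch_pattern('I', 4)
print(pattern)  # (.*?I.*?I.*?I.*?)(I)

text = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."
# \g<1> keeps the prefix; 'me' replaces group 2
print(re.sub(pattern, r'\g<1>me', text, count=1))
# When I go to the park and I am alone I think the ducks laugh at me but I'm not sure.
```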

The code below is verbose only for a clearer understanding of the process flow; it can be reduced as desired.

--

The example below should run stand-alone, and changes the 4th instance of "I" to "me", turning:

"When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."

into

"When I go to the park and I am alone I think the ducks laugh at me but I'm not sure."

import re  # the original used the third-party regex module (import regex as re); the standard re module is enough here

def regex_n_sub(catch_pattern, replace_pattern, input_string, n, flags=0):
    new_catch_pattern, new_replace_pattern = generate_n_sub_patterns(catch_pattern, replace_pattern, n)
    return_string = re.sub(new_catch_pattern, new_replace_pattern, input_string, count=1, flags=flags)
    return return_string

def generate_n_sub_patterns(catch_pattern, replace_pattern, n):
    new_catch_pattern = get_nsubcatch_catch_pattern(catch_pattern, n)
    new_replace_pattern = get_nsubcatch_replace_pattern(replace_pattern, n)
    return new_catch_pattern, new_replace_pattern

def get_nsubcatch_catch_pattern(catch_pattern, n):
    space_string = '.*?'
    space_list = [space_string for i in range(n)]
    first_group = catch_pattern.join(space_list)
    first_group = first_group.join('()')
    second_group = catch_pattern.join('()')
    new_catch_pattern = first_group + second_group
    return new_catch_pattern

def get_nsubcatch_replace_pattern(replace_pattern, n):
    new_replace_pattern = r'\g<1>' + replace_pattern
    return new_replace_pattern


### use test ###
catch_pattern = 'I'
replace_pattern = 'me'
test_string = "When I go to the park and I am alone I think the ducks laugh at I but I'm not sure."

print(regex_n_sub(catch_pattern, replace_pattern, test_string, 4))

This code can be copied directly into a workflow; the regex_n_sub() call returns the modified string.

Please let me know if implementation fails!

Thanks!
