When using a function in re.sub:

import re
def custom_replace(match):
    # how to get the match number here? i.e. 0, 1, 2
    return 'a'
print(re.sub(r'o', custom_replace, "oh hello wow"))

How can I get the match number inside custom_replace?

i.e. 0, 1, 2 for the three "o" of the example input string.

NB: I don't want to use a global variable for this, because several such operations may run concurrently in different threads.

Basj

3 Answers

Based on @Barmar's answer, I tried this:

import re

def custom_replace(match, matchcount):
    result = 'a' + str(matchcount.i)
    matchcount.i += 1
    return result

def any_request():
    matchcount = lambda: None  # an empty "object", see https://stackoverflow.com/questions/19476816/creating-an-empty-object-in-python/37540574#37540574
    matchcount.i = 0           # benefit: it's a local variable that we pass to custom_replace "by reference"
    print(re.sub(r'o', lambda match: custom_replace(match, matchcount), "oh hello wow"))
    # a0h hella1 wa2w

any_request()

and it seems to work.

Reason: I was reluctant to use a global variable for this because the code runs inside a web framework, in a route function (called any_request() here).
If many requests are handled in parallel (in threads), I don't want a global counter to be shared between different calls, since reading and incrementing it is probably not atomic.
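
A minimal sketch of the same idea with a local counter from the standard library, assuming itertools.count is acceptable here. The helper name number_matches and its prefix parameter are hypothetical, chosen only for illustration:

```python
import itertools
import re

def number_matches(pattern, text, prefix="a"):
    # Hypothetical helper: a fresh counter is created on every call,
    # so concurrent calls never share state.
    counter = itertools.count()
    # next(counter) yields 0, 1, 2, ... once per match, in match order.
    return re.sub(pattern, lambda match: prefix + str(next(counter)), text)

print(number_matches(r"o", "oh hello wow"))  # a0h hella1 wa2w
```

Because the counter lives inside the function call, nothing has to be reset between calls.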

Basj

There doesn't seem to be a built-in way. You can use a global variable as a counter.

import re

def custom_replace(match):
    global match_num
    result = 'a' + str(match_num)
    match_num += 1
    return result

match_num = 0
print(re.sub(r'o', custom_replace, "oh hello wow"))

Output is

a0h hella1 wa2w

Don't forget to reset match_num to 0 before each call to re.sub() with this function.
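
If resetting by hand is a concern, one possible variation is to keep the counter in an enclosing function scope instead of at module level. This is only a sketch; make_counting_replacer is a hypothetical name, not something provided by the re module:

```python
import re

def make_counting_replacer(prefix="a"):
    # Hypothetical factory: match_num lives in this call's scope,
    # so each replacer returned starts again from 0.
    match_num = 0

    def custom_replace(match):
        nonlocal match_num
        result = prefix + str(match_num)
        match_num += 1
        return result

    return custom_replace

print(re.sub(r"o", make_counting_replacer(), "oh hello wow"))  # a0h hella1 wa2w
```

Calling make_counting_replacer() again for the next re.sub() gives a new counter, so there is no shared variable to reset.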

Barmar
  • Or set it to `0` before the call, as I showed above. – Barmar Apr 07 '20 at 18:37
  • Thank you for your answer @Barmar. I was a bit reluctant to use a global variable for this, because I'm using this inside a web framework, in a route request. Let's say there are many requests in parallel (in threads?), I don't want this global variable to be "mixed" between different calls (which are not atomic?). – Basj Apr 07 '20 at 18:40
  • What do you think about this: https://stackoverflow.com/questions/61086537/getting-the-match-number-when-passing-a-function-in-re-sub/61087415#61087415? It's a shame there's no built-in attribute inside `match` to give the match number :) – Basj Apr 07 '20 at 18:58

You can use re.search with re.sub, replacing one match at a time.

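import re

def count_sub(pattern, text, repl=''):
    # Replace one match at a time, numbering from 1.
    # Assumes the replacement text never matches the pattern itself;
    # otherwise the loop would not terminate.
    count = 1
    while re.search(pattern, text):
        text = re.sub(pattern, repl + str(count), text, count=1)
        count += 1
    return text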

Output:

count_sub(r'o', 'oh hello world')
# '1h hell2 w3rld'

count_sub(r'o', 'oh hello world','a')
# 'a1h hella2 wa3rld'

Alternative:

def count_sub1(pattern, text, repl=''):
    # Number the matches found in the original text, then replace the
    # first remaining match once per number.
    for count, _ in enumerate(re.finditer(pattern, text), 1):
        text = re.sub(pattern, repl + str(count), text, count=1)
    return text

Output:

count_sub1(r'o','oh hello world')
# '1h hell2 w3rld'

count_sub1(r'o','oh hello world','a')
# 'a1h hella2 wa3rld'
Ch3steR
  • Thanks for your answer. Isn't it a problem to modify `text` while looping on `re.search(pattern, text)`? Isn't this dangerous? – Basj Apr 07 '20 at 19:09
  • @Basj I have no idea TBH. But added another alternative. Not expert in regex. How can it be dangerous? If there's any link or resource point me to it. – Ch3steR Apr 07 '20 at 19:19
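
On the question raised in the comments: re-scanning the modified text becomes a problem when the replacement can itself match the pattern, for example repl='o' with the pattern r'o', because each pass then finds the text that was just inserted. A sketch of a single-pass variant that only ever scans the original string, with count_sub_single_pass as a hypothetical name:

```python
import re

def count_sub_single_pass(pattern, text, repl=""):
    # Hypothetical helper: walk the matches of the ORIGINAL text once and
    # rebuild the result, so inserted text is never scanned again.
    pieces = []
    last_end = 0
    for count, match in enumerate(re.finditer(pattern, text), 1):
        pieces.append(text[last_end:match.start()])  # text before this match
        pieces.append(repl + str(count))             # numbered replacement
        last_end = match.end()
    pieces.append(text[last_end:])                   # text after the last match
    return "".join(pieces)

print(count_sub_single_pass(r"o", "oh hello world", "a"))  # a1h hella2 wa3rld
print(count_sub_single_pass(r"o", "oh hello world", "o"))  # o1h hello2 wo3rld
```

The second call shows a replacement that contains the pattern and still finishes with three numbered matches.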