
I have a string in which I'd like to identify more than 10 possible patterns and replace them without using a for loop.

Example string:

text = "congrats! first recharge of USD 661 is successful & your service is valid till 2019-10-19. dial 0123456789 or click bit.ly/vf_asdqweerw in 46 hours to avail your reward."

Expected outcome:

congrats! first recharge of USD <Amount> is successful & your service is valid till <Date>. dial <PhoneNumber> or click <Link> in 46 hours to avail your reward.

I have a dictionary of regex patterns for each value:

dct = {
      r"((http(s?)://)|(bit\.l)|(www\.)).+?(?=[, ]|$)": "<Link>",
      r"(\d{2}[-/.])(\w{1,3}|\d{2})[-/.](\d{2,4})\b"  : "<Date>",
      .....
}

I tried the approach from "How can I do multiple substitutions using regex in python?", but with no success.

My current solution uses:

for k, v in dct.items():
    text = re.sub(k, v, text)

I need something more scalable.

Talis
  • if you find my answer useful, could you please mark the question as answered (the gray tick on the left of the answer)? – sophros Jan 05 '21 at 08:42

1 Answer


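The approach that would best fit your need is to use finite-state transducers (FSTs). Each re.sub call behaves like a single rewrite rule, which is a building block of an FST; with an FST you can combine all the rules and apply them to the text together instead of one after another.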

pynini is a Python library that provides an interface to OpenFst, a well-known C++ FST implementation. It is quite difficult to use and takes some study to understand the underlying concepts. A relatively good introduction is this one.

The approach would be roughly something like:

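import pynini

# pynini matches literal strings or FSTs, not Python regex syntax, so dct must
# hold literal keys here. cross/string were named transducer/stringify before pynini 2.1.
mappings = [pynini.cross(k, v) for k, v in dct.items()]
kvmap = pynini.union(*mappings)

def multi_substitute(in_str):
    # Rewrites only a whole string equal to one key; matching inside longer
    # text needs a context-dependent rewrite such as pynini.cdrewrite.
    return pynini.shortestpath(pynini.compose(in_str, kvmap)).string()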
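
For comparison, the "all at once" idea can also be sketched with the standard re module alone, which may be enough if a full FST toolkit is more than you need. This is only a sketch under assumptions: the patterns in PATTERNS are simplified, hypothetical stand-ins (not the full dictionary from the question), and the group names double as the replacement labels. All patterns are joined into one alternation, and a callback picks the label from the named group that matched.

```python
import re

# Hypothetical, simplified patterns for illustration only; they are not the
# asker's full dictionary. Order matters: earlier alternatives are tried first.
PATTERNS = {
    "Link": r"(?:https?://|www\.|bit\.ly/)\S+",
    "Date": r"\d{4}-\d{2}-\d{2}",
    "PhoneNumber": r"\b\d{10}\b",
    "Amount": r"(?<=USD )\d+(?:\.\d+)?",
}

# One alternation with a named group per label, compiled once.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items())
)


def multi_substitute(text: str) -> str:
    # m.lastgroup is the name of the named group that produced the match.
    return COMBINED.sub(lambda m: f"<{m.lastgroup}>", text)


text = (
    "congrats! first recharge of USD 661 is successful & your service is "
    "valid till 2019-10-19. dial 0123456789 or click bit.ly/vf_asdqweerw "
    "in 46 hours to avail your reward."
)
print(multi_substitute(text))
```

The text is scanned once regardless of how many patterns there are, but inner groups should be non-capturing so that lastgroup always names the outer label.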
sophros