
What I would like to do is make specific substitutions in a given text. For example, '<' should be changed to '[', '>' to ']', and so forth. It is similar to the solution given here: How can I do multiple substitutions using regex in python?, which is

import re 

def multiple_replace(dict, text):
  # Create a regular expression from the dictionary keys
  regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

  # For each match, look-up corresponding value in dictionary
  return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) 

Now, the problem is that I would also like to replace regex-matched patterns. For example, I want to replace 'fo.+' with 'foo' and 'ba[rz]*' with 'bar'.

Removing the map(re.escape, ...) call from the code helps, so that the regex actually matches, but I then receive key errors because, for example, 'barzzzzzz' would be a match and something I want to replace, but 'barzzzzzz' isn't a key in the dictionary; the literal string 'ba[rz]*' is. How can I modify this function to work?
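
For concreteness, this is roughly the failing variant I mean (the same function with the escaping removed; the sample text is just made up):

import re

def multiple_replace(d, text):
  # Without re.escape, the dictionary keys are treated as regex patterns
  regex = re.compile("(%s)" % "|".join(d.keys()))
  # The matched text, not the pattern, is used as the dictionary key
  return regex.sub(lambda mo: d[mo.string[mo.start():mo.end()]], text)

d = {'fo.+': 'foo', 'ba[rz]*': 'bar'}
print(multiple_replace(d, 'this barzzzzzz fails'))
# KeyError: 'barzzzzzz' -- the match is looked up, but only 'ba[rz]*' is a key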

(On an unrelated note, where do these 'foo' and 'bar' things come from?)

Firnagzen
  • Replacing literals is easy to keep straight, but adding regex as match patterns definitely adds some potential ambiguity (esp. since you may have a couple of dozen of them to keep track of). Take care if you have regex that may overlap, that you test and replace them in the intended order. – PaulMcG Jul 11 '13 at 03:14
  • See also: https://stackoverflow.com/questions/66270091/multiple-regex-substitutions-using-a-dict-with-regex-expressions-as-keys – mouwsy Nov 06 '21 at 20:33

2 Answers


Just do multiple sub calls.
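
For example, roughly (a sketch using the patterns from the question; as the comment on the question points out, the order of the calls can matter, since each sub sees the output of the previous one):

import re

def multiple_replace(d, text):
  # Apply each pattern in turn; later substitutions see the output of earlier ones
  for pattern, repl in d.items():
    text = re.sub(pattern, repl, text)
  return text

d = {r'ba[rz]*': 'bar', '<': '['}
print(multiple_replace(d, 'barzzzzzz <'))   # bar [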

On an unrelated note, Jargon File to the rescue: Metasyntactic variables, foo.

Amadan
  • Multiple sub calls is an option, but not when I've got a few dozen different replacements to make throughout a text. Thanks for the link, though! – Firnagzen Jul 11 '13 at 02:40
  • Just to put things in perspective - you don't like doing a few dozen `sub` calls, but you're okay with doing a few thousand or million `search` calls? @perreal's solution answers your literal question very well, but I still think if you benchmark it, you might find that just doing multiple `sub` calls is both simpler and faster. – Amadan Jul 11 '13 at 02:46
  • ... Huh. I'll try benchmarking it, but I did initially think that this way would be faster. – Firnagzen Jul 11 '13 at 02:58
  • Well, I've benchmarked it. As it turns out, the multiple_replace function is significantly slower than multiple `re.sub`s, and it seems to scale exponentially with the number of substitutions. – Firnagzen Jul 11 '13 at 06:21
import re

def multiple_replace(dict, text):
  # Create a regular expression from the dictionary keys
  regex = re.compile(r'(%s)' % "|".join(dict.keys()))
  # For each match, find which pattern produced it and use that key's replacement
  return regex.sub(lambda mo: dict[
      [k for k in dict if
       re.search(k, mo.string[mo.start():mo.end()])
       ][0]], text)

d = { r'ba[rz]*' : 'bar', '<' : '[' }
s = 'barzzzzzz <'

print multiple_replace(d, s)

Gives:

bar [
perreal