There are some nice ways to handle simultaneous multi-string replacement in Python. However, I am having trouble writing an efficient function that does this while also supporting backreferences.
What I would like is to use a dictionary of expression / replacement pairs, where the replacement terms may contain backreferences to something matched by the expression.
For example (note the \1):
repdict = {'&&': 'and', '||': 'or', '!([a-zA-Z_])': r'not \1'}
I put the SO answer mentioned at the outset into the function below, which works fine for expression / replacement pairs that don't contain backreferences:
import re

def replaceAll(repdict, text):
    # Escape every key so that it is matched literally
    repdict = dict((re.escape(k), v) for k, v in repdict.items())
    pattern = re.compile("|".join(repdict.keys()))
    return pattern.sub(lambda m: repdict[re.escape(m.group(0))], text)
However, it doesn't work for the key that contains a backreference:
>>> replaceAll(repdict, "!newData.exists() || newData.val().length == 1")
'!newData.exists() or newData.val().length == 1'
If I do it manually, it works fine, e.g.:
pattern = re.compile("!([a-zA-Z_])")
pattern.sub(r'not \1', '!newData.exists()')
Works as expected:
'not newData.exists()'
In the combined function, re.escape seems to be what breaks the key that uses the backreference: it turns the regex metacharacters in that key into literals, so the key never matches the intended text.
I eventually came up with the version below. Note, however, that it does not solve the problem of supporting backreferences in the input parameters; I'm just handling that case manually in the replacer function:
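The effect of the escaping can be checked in isolation. The snippet below is only an illustration; the variable names `key` and `escaped` are not part of the original function.

```python
import re

key = '!([a-zA-Z_])'
escaped = re.escape(key)

# After escaping, the parentheses and brackets are literals, so the
# pattern only matches the text of the key itself.
print(re.search(escaped, '!newData.exists()'))        # None
print(re.search(escaped, key) is not None)            # True

# The unescaped key behaves as a regex and matches "!" plus one letter.
print(re.search(key, '!newData.exists()').group(0))   # !n
```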
import re

def replaceAll(repPat, text):
    def replacer(obj):
        match = obj.group(0)
        # manually deal with exclamation mark match..
        if match[:1] == "!":
            return 'not ' + match[1:]
        # here we naively escape the matched pattern into
        # the format of our dictionary key
        else:
            return repPat[naive_escaper(match)]
    pattern = re.compile("|".join(repPat.keys()))
    return pattern.sub(replacer, text)

def naive_escaper(string):
    if '=' in string:
        return string.replace('=', r'\=')
    elif '|' in string:
        return string.replace('|', r'\|')
    else:
        return string

# manually escaping | and = works fine
repPat = {'!([a-zA-Z_])': '', '&&': 'and', r'\|\|': 'or', r'\=\=\=': '=='}
replaceAll(repPat, "(!this && !that) || !this && foo === bar")
Returns:
'(not this and not that) or not this and foo == bar'
So if anyone has an idea how to make a multi-string replacement function that supports backreferences and accepts the replacement terms as input, I'd appreciate your feedback very much.
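One possible direction, offered only as a sketch under assumptions: wrap each key in its own outer capturing group, record where that group sits in the combined pattern, and shift the backreference numbers in the replacement template by that offset before calling `Match.expand`. The name `replace_all` and its helper variables are hypothetical. The sketch assumes every key is already a valid regular expression (literal keys would need escaping by the caller), that keys do not themselves contain backreferences or named groups, and that templates only use the numeric `\N` form.

```python
import re

def replace_all(rep_pairs, text):
    """Apply all pattern -> template pairs in a single pass over text."""
    parts, offsets, templates = [], [], []
    next_group = 1
    for pattern, template in rep_pairs.items():
        offsets.append(next_group)           # index of this key's outer group
        parts.append("(" + pattern + ")")
        templates.append(template)
        # Skip past the outer group and any groups inside the key.
        next_group += 1 + re.compile(pattern).groups
    combined = re.compile("|".join(parts))

    def replacer(m):
        for outer, template in zip(offsets, templates):
            if m.group(outer) is not None:   # this alternative matched
                # Renumber \N so it points into the combined pattern.
                shifted = re.sub(
                    r"\\(\d+)",
                    lambda b: "\\g<%d>" % (int(b.group(1)) + outer),
                    template,
                )
                return m.expand(shifted)
        return m.group(0)

    return combined.sub(replacer, text)

rep_pairs = {'&&': 'and', r'\|\|': 'or', '===': '==', r'!([a-zA-Z_])': r'not \1'}
print(replace_all(rep_pairs, "(!this && !that) || !this && foo === bar"))
# (not this and not that) or not this and foo == bar
```

Because the alternatives are tried in dictionary order at each position, an earlier key wins when two keys could match at the same place, so the order of the pairs may matter.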