I need to chain two user-supplied patterns: after the first pattern matches, its captured groups are substituted into the second pattern, which is then matched against a second string.
pattern1 -> match data1 -> substitute captured groups into pattern2 -> match data2
To do this safely, I need to escape the captured groups so they don't interfere with the second pattern.
For example:
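import re

def process(pattern1, pattern2, data1=r'A[1]', data2=r'B[1]'):
    result = re.sub(pattern1, pattern2, data1)  # fill pattern2 with the groups captured from data1
    return re.match(result, data2)  # then use the filled-in string as a regex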
In this case, process(r'A\[(\d+)]', r'B\[\1]') will work, while process(r'A(.+)', r'B\1') will not, because the result will be B[1] and the [] will be treated as part of the regex. I don't think I can escape data1 first, because I don't know what pattern1 will be like.
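To make the failure concrete, here is a minimal reproduction of the second call with the default data values. It is only an illustration of the problem described above; the variable names are just for this snippet.

```python
import re

data1, data2 = 'A[1]', 'B[1]'

# The captured text '[1]' is pasted into the second pattern unescaped.
result = re.sub(r'A(.+)', r'B\1', data1)
print(result)

# As a regex, 'B[1]' means "B followed by one character from the class [1]",
# so it matches 'B1' but not the literal text 'B[1]'.
print(re.match(result, data2))
print(re.match(result, 'B1'))
```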
To make it work, the captured match needs to be escaped first (after pattern1 has matched data1) before it is substituted into pattern2. This way, result is B\[1], which can then match B[1] exactly.
Please note that both inputs to process are assumed to be valid expressions. For example, process(r'([A-Z]+)(.*)', r'\1_\2', 'A[1]', 'A_[1]').
I looked at using a function as the second argument to re.sub, but it is expected to return a single string, so I am not sure how to deal with the captured groups.
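For what it's worth, the function passed to re.sub receives the match object, so every captured group is available inside it even though it returns a single string. One possible sketch along those lines is below. The names process_escaped and _GROUP_REF are made up for illustration, and it assumes that every \N or \g<name> in pattern2 refers to a group of pattern1; it does not handle an escaped backslash directly before a digit (such as \\1) in pattern2.

```python
import re

# Hypothetical helper regex: finds \1, \2, ... and \g<name> references in pattern2.
_GROUP_REF = re.compile(r'\\(\d+)|\\g<(\w+)>')

def process_escaped(pattern1, pattern2, data1=r'A[1]', data2=r'B[1]'):
    def repl(m):
        # m is the match of pattern1 against data1.
        def fill(ref):
            key = ref.group(1) or ref.group(2)
            if key.isdigit():
                key = int(key)
            # Escape the captured text so it is matched literally later on.
            # A group that did not participate in the match becomes ''.
            return re.escape(m.group(key) or '')
        # A function's return value is inserted as-is, with no further
        # template processing, so the escapes survive.
        return _GROUP_REF.sub(fill, pattern2)

    result = re.sub(pattern1, repl, data1)
    return re.match(result, data2)
```

With this, process_escaped(r'A(.+)', r'B\1') builds a pattern in which the brackets are escaped, so it matches the literal B[1]. As with the original re.sub call, any part of data1 that pattern1 does not match is carried into result unescaped.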
Any suggestions?
EDIT: Just to be clear, this is not a question about escaping per se; obviously re.escape can do that. The question is how and where you would do the escaping in the sequence described above.
EDIT2: The string that needs to be escaped lives in the captured match group. How would you escape that?
EDIT3: Added raw string for prettiness.