I need to chain two user-supplied patterns: after the first pattern matches, its captured groups are substituted into the second pattern, which is then used for a second match.

pattern 1 -> match data 1 -> replace captured matches in pattern 2 -> match data2

To do this safely, I need to escape the captured matches so they don't interfere with the second pattern.

For example:

import re

def process(pattern1, pattern2, data1=r'A[1]', data2=r'B[1]'):
    # Substitute pattern1's captured groups into pattern2, then match data2.
    result = re.sub(pattern1, pattern2, data1)
    return re.match(result, data2)

In this case, process(r'A\[(\d+)]', r'B\[\1]') will work, while process(r'A(.+)', r'B\1') will not, because the result will be B[1], and [1] will be treated as a character class rather than as literal text. I don't think I can escape data1 first, because I don't know what pattern1 will look like.
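The difference between the two calls can be reproduced with a small sketch. The helper name process_debug is hypothetical (it is not part of the original code); it only returns the intermediate pattern alongside the match so both can be inspected.

```python
import re

def process_debug(pattern1, pattern2, data1=r'A[1]', data2=r'B[1]'):
    # Hypothetical variant of process() that also returns the built pattern.
    result = re.sub(pattern1, pattern2, data1)
    return result, re.match(result, data2)

# The brackets are written as literals in both patterns, so the built
# pattern is B\[1] and the second match succeeds.
print(process_debug(r'A\[(\d+)]', r'B\[\1]'))

# The brackets come from the captured group, so the built pattern is B[1].
# As a regex, that means "B followed by the character 1", so matching
# against 'B[1]' returns None.
print(process_debug(r'A(.+)', r'B\1'))
```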

To make it work, the captured match needs to be escaped first (after pattern1 has matched data1) before substituting in pattern2. This way, result is B\[1], which can then match B[1] exactly.

Please note that both inputs to process are assumed to be valid expressions. For example, process(r'([A-Z]+)(.*)', r'\1_\2', 'A[1]', 'A_[1]').

I looked at using a function as the second parameter of re.sub, but that function is expected to return a single string, so I am not sure how to deal with the captured groups.
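One possible way around the single-string limitation, as a minimal sketch rather than a definitive answer: let the replacement function build the whole second pattern itself, filling each backreference with the re.escape()d group. The names _BACKREF, expand_escaped and process_escaped are hypothetical. The sketch assumes that \1 to \99 and \g<name> in pattern2 always refer to groups of pattern1, and that every other backslash sequence (such as \[ or \d) is regex syntax to be left untouched.

```python
import re

# One backslash sequence in pattern2: \N, \g<name>, or any other escaped char.
_BACKREF = re.compile(r'\\(?:([1-9]\d?)|g<(\w+)>|(.))', re.DOTALL)

def expand_escaped(match, template):
    """Fill backreferences in template with escaped groups of match."""
    def fill(ref):
        if ref.group(3) is not None:
            return ref.group(0)  # not a backreference (e.g. \[ or \d): keep it
        key = ref.group(1) or ref.group(2)
        group = match.group(int(key) if key.isdigit() else key)
        return re.escape(group or '')  # assumption: unmatched groups become ''
    return _BACKREF.sub(fill, template)

def process_escaped(pattern1, pattern2, data1=r'A[1]', data2=r'B[1]'):
    # The callable's return value is inserted verbatim by re.sub,
    # so the escaping done in expand_escaped is preserved.
    result = re.sub(pattern1, lambda m: expand_escaped(m, pattern2), data1)
    return re.match(result, data2)

# The captured '[1]' is escaped before it reaches the second pattern.
print(process_escaped(r'A(.+)', r'B\1'))
```

Because the template is expanded by hand instead of by re.sub, regex escapes such as \d in pattern2 are passed through rather than rejected as bad replacement escapes. Any text in data1 that pattern1 does not match is still copied into the result unescaped, as in the original process.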

Any suggestions?

EDIT: Just to make it clear, this is not a question about escaping per se; obviously re.escape can do that. The question is how and where to do the escaping in the sequence described above.

EDIT2: The string that needs to be escaped lives in the captured match group. How would you escape that?

EDIT3: Added raw string for prettiness.

David R.
  • Don't forget to use raw strings when you're creating regular expressions in Python. See https://stackoverflow.com/questions/12871066/what-exactly-is-a-raw-string-regex-and-how-can-you-use-it – Barmar Jan 27 '23 at 20:27
  • Use `re.escape()` to escape all special regexp characters in a string, so they'll be treated literally. – Barmar Jan 27 '23 at 20:30
  • This is not about re.escape. It is about where and how to escape the string when it lives in a captured match group. Whoever is voting to close the question, please read it carefully first. Thank you! – David R. Jan 27 '23 at 20:47
  • I think you may need to use a function for the replacement, instead of a string, so you can use `re.escape(match.group(1))` when substituting. But I'm having trouble understanding the overall use case. Can you show a specific example of the desired inputs and result? – Barmar Jan 27 '23 at 20:54

0 Answers