
In the following content, I want to replace the text between --START-- and --END-- with a string, filelist, that contains both:

  • backslash (\) characters
  • newlines (\n)

This code nearly works:

```python
import re

content = """A
--START--
tobereplaced
--END--
C"""

filelist = """c:\\test\\l1.txt
c:\\test\\l2.txt"""

print(re.sub(r'(--START--\n).*?(\n--END--)', r'\1' + re.escape(filelist) + r'\2',
             content, flags=re.MULTILINE | re.DOTALL))
```

but:

  • without re.escape(...), it fails with a "bad escape \l" error, because every \ in filelist is parsed as the start of an escape sequence. One workaround would be to manually replace every \ with '\\\\' (i.e. r'\\'), but that's not very elegant (in my real code, filelist is read from a file produced by another tool).

  • with re.escape(...), every newline in the output is preceded by a stray \ and every . becomes \., which I don't want:

    A
    --START--
    c:\test\l1\.txt\
    c:\test\l2\.txt
    --END--
    C
    
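A minimal reproduction of both failure modes, as a sketch assuming Python 3.7 or later (where re.escape() only escapes regex metacharacters and whitespace). The one-character pattern `X` is a hypothetical placeholder used only to trigger a single substitution:

```python
import re

# Same string as in the question: single backslashes, one real newline.
filelist = "c:\\test\\l1.txt\nc:\\test\\l2.txt"

# 1) Passed raw as the replacement, filelist is parsed as a *template*:
#    "\t" is read as a TAB escape and "\l" is not a valid escape at all.
raw_failed = False
try:
    re.sub(r"X", filelist, "X")  # "X" is just a placeholder pattern
except re.error as exc:
    raw_failed = True
    print("re.error:", exc)

# 2) re.escape() is meant for *patterns*: it escapes "\", "." and "\n".
escaped = re.escape(filelist)
print(repr(escaped))

# In a template, "\\" collapses back to "\", but "\." and a backslash
# followed by a newline are kept verbatim, hence the stray backslashes.
result = re.sub(r"X", escaped, "X")
print(result)
```

In other words, re.escape() solves the wrong problem here: it protects characters that are special in a pattern, while the replacement template only treats the backslash specially.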

How can I fix this? And how can I make re.sub(..., r'\1' + repl + r'\2', ...) treat repl as a plain string rather than as a regex pattern?

Desired output:

```
A
--START--
c:\test\l1.txt
c:\test\l2.txt
--END--
C
```
Basj
  • You need to replace `r'\1' + re.escape(filelist) + r'\2'` with `r'\1' + filelist.replace('\\', '\\\\') + r'\2'` – Wiktor Stribiżew Jan 07 '21 at 13:47
  • @WiktorStribiżew There could also be other characters in `filelist` that risk being interpreted as regex syntax; how can I avoid this? – Basj Jan 07 '21 at 13:49
  • No, the only one is a ``\``. – Wiktor Stribiżew Jan 07 '21 at 13:49
  • How can you be sure, @WiktorStribiżew? `filelist` can contain URLs with `&`, `?`, `:`, `/`, and many more characters that could be interpreted as regex special characters, don't you think? – Basj Jan 07 '21 at 13:50
  • Very well, let them contain anything. The only special character in a substitution pattern is a backslash. Only the backslash must be doubled. And you are good to go. – Wiktor Stribiżew Jan 07 '21 at 13:51
  • Thanks a lot, I upvoted your answer on the duplicate too; it was useful! PS: you could add your last comment to your answer on the duplicate question: *"The only special character in a substitution pattern is a backslash. Only the backslash must be doubled. And you are good to go."* – Basj Jan 07 '21 at 13:54
  • @WiktorStribiżew Yes this TL;DR would help to get the general idea for future readers :) – Basj Jan 07 '21 at 13:55
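Putting the comments' advice into code, here is a minimal sketch (assuming Python 3; the `url` value is a made-up example) that compares the backslash-doubling fix with a callable replacement. The callable is a different technique from the one in the comments: when repl is a function, re.sub() uses its return value directly as the replacement string, so no template escapes are processed.

```python
import re

content = """A
--START--
tobereplaced
--END--
C"""

filelist = "c:\\test\\l1.txt\nc:\\test\\l2.txt"

# re.MULTILINE is dropped: the pattern uses no ^ or $, so only DOTALL matters.
pattern = r"(--START--\n).*?(\n--END--)"

# Fix 1 (from the comments): double every backslash; the template engine
# turns each "\\" back into one literal "\". Nothing else needs escaping.
safe = filelist.replace("\\", "\\\\")
out1 = re.sub(pattern, r"\1" + safe + r"\2", content, flags=re.DOTALL)

# Fix 2 (alternative): pass a function as repl. Its return value is
# inserted as-is, with no backslash or group-reference processing.
out2 = re.sub(pattern, lambda m: m.group(1) + filelist + m.group(2),
              content, flags=re.DOTALL)

print(out1)

# Only "\" is special in a template: URL characters pass through untouched.
url = "https://example.com/a?b=1&c=2"
print(re.sub(r"X", url, "X"))
```

If `filelist` could ever start with a digit, `r"\1"` would fuse with it (e.g. `\12`, an invalid group reference here), so `r"\g<1>"` is the safer spelling for fix 1; the callable in fix 2 avoids the issue entirely.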
