5

Using the re module, it's possible to escape the search pattern, e.g.:

def my_replace(string, src, dst):
    import re
    return re.sub(re.escape(src), dst, string)

While this works for the most part, the dst string may itself contain a backslash sequence, "\\9" for example.

This causes two issues:

  • Literal \\1, \\2, etc. in dst are interpreted as group references.
  • Using re.escape(dst) causes . to be changed to \. in the result.

Is there a way to escape the destination without introducing redundant character escaping?


Example usage:

>>> my_replace("My Foo", "Foo", "Bar")
'My Bar'

So far, so good.


>>> my_replace("My Foo", "Foo", "Bar\\Baz")
...
re.error: bad escape \B at position 3

This tries to interpret \B as having a special meaning.


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz"))
'My Bar\\Baz'

Works!


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz."))
'My Bar\\Baz\\.'

The . gets escaped when we don't want that.


While str.replace can be used in this case, the question of how to escape the destination string remains useful, since there may be times we want to use other features of re.sub, such as the ability to ignore case.

ideasman42
  • I'm not sure I understand the issue - could you give an example string, src, dst which demonstrates it? – wim Oct 09 '19 at 03:39
  • Looks like what you really want is `src.replace(r'\', r'\\')` as you don't seem to want `.` be replaced. – metatoaster Oct 09 '19 at 03:51
  • @metatoaster Do you mean `dst`? If this avoids all possible interpretations, then yes. – ideasman42 Oct 09 '19 at 03:55
  • @ideasman42 yes. If you only want just this character this would be a way. If you want multiple modifications from this subset, using [`str.translate`](https://docs.python.org/3/library/stdtypes.html#str.translate) may be more desirable. Best approach is to create a number of test cases (add them to your unit test module) to formalise the problem you are trying to solve. – metatoaster Oct 09 '19 at 04:04
  • @ideasman42 Did you get a solution to this without replacing the dst variable? In my case the capture groups are being treated as literals without the re.escape(). – Sourav Kanta May 04 '20 at 13:57
  • @metatoaster Your code does not work. Raw strings in Python cannot contain single backslash as the last character. The change of the line in the original function would be: `return re.sub(re.escape(src), dst.replace('\\', r'\\'), string)` – pabouk - Ukraine stay strong May 20 '22 at 08:01
  • @pabouk-Ukrainestaystrong fair, though the demonstration of using `r'\'` was more an illustrative purpose. – metatoaster May 21 '22 at 00:24

3 Answers

5

In the replacement argument of re.sub, only the backslash is interpreted as a special character, so instead of re.escape you can use a simple replacement on the destination argument.

def my_replace(string, src, dst):
    import re
    return re.sub(re.escape(src), dst.replace("\\", "\\\\"), string)
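
As a quick check of the approach above, the sketch below doubles the backslashes in dst and also passes re.IGNORECASE, the kind of re.sub feature the question mentions. The name replace_literal and the flags parameter are illustrative and not part of the original answer. The second function shows an alternative that is not taken from this answer: passing a callable as the replacement, whose return value re.sub inserts without any template processing.

```python
import re


def replace_literal(string, src, dst, flags=0):
    # Double each backslash so re.sub's replacement template reads it
    # as a literal backslash, not as the start of an escape or group.
    return re.sub(re.escape(src), dst.replace("\\", "\\\\"), string, flags=flags)


def replace_literal_callable(string, src, dst, flags=0):
    # Alternative (assumption, not from the answer): the return value
    # of a callable replacement is inserted as-is.
    return re.sub(re.escape(src), lambda match: dst, string, flags=flags)


# Destination strings that would break or change under a plain re.sub.
for dst in ("Bar", "Bar\\Baz.", "\\1 and \\g<0>", "a\\nb"):
    expected = "My " + dst
    assert replace_literal("My FOO", "foo", dst, flags=re.IGNORECASE) == expected
    assert replace_literal_callable("My FOO", "foo", dst, flags=re.IGNORECASE) == expected
```

Both variants leave characters such as . untouched, because neither calls re.escape on the destination.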
ideasman42
  • Raw strings in Python cannot contain single backslash as the last character. The modified argument would be: `dst.replace("\\", r"\\")` or maybe less confusingly without combining normal and raw strings: `dst.replace("\\", "\\\\")` – pabouk - Ukraine stay strong May 20 '22 at 08:07
  • `r"\\" == "\\\\"` is true here for Python 3.10. – ideasman42 May 20 '22 at 10:17
  • That just supports the second variant in my comment and it should be true in all supported versions. --- I was notifying you about something completely different: *You cannot have a **single (precisely: unpaired)** backslash as the last character of a raw string.* (Paired are fine.) This fails also in Python 3.10 (which started to take advantage of the new PEG parser) --- `>>> sys.version` `'3.10.4 (main, Apr 2 2022, 09:04:19) [GCC 11.2.0]'` `>>> r"\"` ... `SyntaxError: unterminated string literal (detected at line 1)` – pabouk - Ukraine stay strong May 20 '22 at 10:49
  • Good explanation: [Why can't Python's raw string literals end with a single backslash?](https://stackoverflow.com/a/19654184/320437) – pabouk - Ukraine stay strong May 20 '22 at 11:03
  • Ah `r"\"` does indeed fail, thanks - updated answer. – ideasman42 May 20 '22 at 11:31
0

Your code works fine if you just remove the re.escape (I'm not sure why it is there):

Test 1

import re 

def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = 'abbbbbb'
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))

Output 1

abz

Test 2

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))

Output 2

abzBar\\Baz

Test 3

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape('\\z')

print(my_replace(src, dst, string))

Output 3

ab\zBar\\Baz

Test 4

To construct dst, we first have to know whether the replacement uses any capturing-group references, such as \1 in this case. We cannot apply re.escape to \1, because it would become \\1 and be inserted as the literal text \1. Instead, build the replacement in parts: leave the group references unescaped and append them to the other parts, which are passed through re.escape.

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape('\\9z')

print(my_replace(src, dst, string))

Output 4

ab\9zBar\\Baz
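
The construction described for Test 4 can also be sketched as below, assuming the literal part only needs its backslashes doubled (as in the first answer) instead of a full re.escape, so that characters such as . stay unescaped. The helper name literal_part is hypothetical.

```python
import re


def literal_part(text):
    # Hypothetical helper: make `text` safe inside a re.sub replacement
    # template by doubling backslashes; nothing else is changed.
    return text.replace("\\", "\\\\")


src = r"(ab)b+"
# Keep the group reference \1 unescaped; protect only the literal tail.
dst = r"\1" + literal_part("\\9z.")

print(re.sub(src, dst, "abbbbbbBar"))  # ab\9z.Bar
```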
Emma
0

You could resort to split:

import re

haystack = r"some text with stu\ff to replace"
needle = r"stu\ff"
replacement = r"foo.bar"

result = replacement.join(re.split(re.escape(needle), haystack))
print(result)

This should also work with needle at the beginning or end of haystack.
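
A minimal sketch of the same idea wrapped in a function, with an optional flags argument so that features such as case-insensitive matching remain available. The name split_replace is hypothetical. It relies on re.escape(needle) containing no capturing groups, since re.split would otherwise include the captured text in its result.

```python
import re


def split_replace(haystack, needle, replacement, flags=0):
    # Split on the literal needle, then join the pieces with the
    # replacement; the replacement is never parsed as a template.
    return replacement.join(re.split(re.escape(needle), haystack, flags=flags))


print(split_replace(r"STU\FF at the start", r"stu\ff", r"foo\1.bar", flags=re.IGNORECASE))
# foo\1.bar at the start
```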

blubberdiblub