
I need to replace \\ with \ in Python 3 in a complex string. I know this question has been asked several times, but mostly for simple strings, and none of the (accepted) answers really works for complex strings.

This is also different from this one, where the problem could be solved with .decode('unicode_escape'); that does not work for this problem (see below).

Assuming the string is:

my_str = '\\xa5\\xc0\\xe6aK\\xf9\\x80\\xb1\\xc8*\x01\x12$\\xfbp\x1e(4\\xd6{;Z\\x'

The straightforward approach would be:

my_str.replace('\\','\')

which leads to:

SyntaxError: EOL while scanning string literal


This answer suggests using:

my_str.replace('\\\\','\\')

Which results in:

'\\xa5\\xc0\\xe6aK\\xf9\\x80\\xb1\\xc8*\x01\x12$\\xfbp\x1e(4\\xd6{;Z\\x'

So, there is no change.


This answer suggests:

b = bytes(my_str, encoding='utf-8')
b.decode('unicode-escape')

But this doesn't work for such a complex string:

UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 49-50: truncated \xXX escape


Using decode (as suggested here) results in:

my_str.decode('unicode_escape')

AttributeError: 'str' object has no attribute 'decode'


A combination of encoding and then decoding with unicode_escape returns a totally different string (probably because of utf-16; utf-8 results in an error, see above, and latin1, for example, doesn't work either):

my_str.encode('utf-16').decode('unicode_escape')
'ÿþ\\\x00x\x00a\x005\x00\\\x00x\x00c\x000\x00\\\x00x\x00e\x006\x00a\x00K\x00\\\x00x\x00f\x009\x00\\\x00x\x008\x000\x00\\\x00x\x00b\x001\x00\\\x00x\x00c\x008\x00*\x00\x01\x00\x12\x00$\x00\\\x00x\x00f\x00b\x00p\x00\x1e\x00(\x004\x00\\\x00x\x00d\x006\x00{\x00;\x00Z\x00\\\x00x\x00'
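
The stray characters in that result appear to come from UTF-16 itself rather than from the escapes. A minimal sketch, using a shortened sample string and an explicit little-endian codec (both are assumptions made for illustration):

```python
# Each ASCII character becomes two bytes in UTF-16, so a NUL byte lands
# between the backslash, the "x", and the hex digits.
sample = '\\xa5'                      # 4 characters: \ x a 5
utf16 = sample.encode('utf-16-le')    # explicit byte order, no BOM
latin1 = sample.encode('latin-1')     # one byte per character

print(utf16)    # b'\\\x00x\x00a\x005\x00'
print(latin1)   # b'\\xa5'

# With one byte per character the escape stays intact and decodes.
decoded = latin1.decode('unicode_escape')
print(len(decoded), hex(ord(decoded)))   # 1 0xa5
```

Plain 'utf-16' additionally prepends a byte order mark, which would explain the leading 'ÿþ'.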

black
  • Does `your_text.replace('\\', '')` work? You don't actually have any double literal backslashes there... – Jon Clements May 06 '18 at 12:49
  • Right, this works. As soon as I put '\' as second argument, it doesn't work anymore. – black May 06 '18 at 12:51
  • I'm going to guess that you don't really need to do this. Often, people print out values and see the double backslash, but that's just Python's way of unambiguously showing you that there is a single backslash in the string. Can you say more about where this string came from, and why you want to change it? – Ned Batchelder May 06 '18 at 12:55
  • @black Give us more details, we can help fix things properly. Or join us in the #python IRC channel on Freenode, where we can have an actual discussion and get to the bottom of it. – Ned Batchelder May 06 '18 at 13:02
  • "But this doesn't work for such a complex string:" The problem is that the string ends with a single backslash followed by a lowercase x. The described problem is to replace sequences of (backslash, lowercase x, two hex digits) with the corresponding escaped Unicode code point; but *what do you want to do* when the hex digits are missing? – Karl Knechtel Aug 06 '22 at 02:16

2 Answers


Take a closer look at the string: every backslash in it is a single backslash.

In [26]: my_str[0]
Out[26]: '\\'

In [27]: my_str[1]
Out[27]: 'x'

In [28]: len(my_str[0])
Out[28]: 1

And my_str.replace('\\','\') won't work because the parser reads \' as an escaped quote, so the string literal is never closed and Python keeps waiting for another closing '.
Use my_str.replace('\\', '') instead.
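
As a small illustration of how Python reads these literals (the variable names and the shortened sample string are made up for this sketch):

```python
# '\\' in source code is ONE backslash character.
single = '\\'
print(len(single))          # 1

# A raw string keeps backslashes as written, so r'\\' is two characters.
double = r'\\'
print(len(double))          # 2

# Replacing two-backslash runs finds nothing in a string that only
# contains single backslashes.
text = '\\xa5\\xc0'
print(text.replace(double, single) == text)   # True

# Removing every backslash does change the string.
print(text.replace(single, ''))               # xa5xc0
```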


Update: after a few more days, I realized the following may also be helpful. If a string with escaped sequences ('\\x' or '\\u') is ultimately meant to represent hex/Unicode literals, it can be decoded with codecs.escape_decode.

import codecs
print(len(b'\x32'), b'\x32')                # 1 hex literal, '\x32' == '2'
print(len(b'\\x32'), b'\\x32')              # 4 chars including escapes
print(codecs.escape_decode('\\x32', 'hex')) # chars->literal, 4->1

# 1 b'2'
# 4 b'\\x32'
# (b'2', 4)

s = '\\xa5\\xc0\\xe6aK\\xf9\\x80\\xb1\\xc8*\x01\x12$\\xfbp\x1e(4\\xd6{;Z'
ed, _ = codecs.escape_decode(s, 'hex')
print(len(s), s)
print(len(ed), ed)

# 49 \xa5\xc0\xe6aK\xf9\x80\xb1\xc8*$\xfbp(4\xd6{;Z
# 22 b'\xa5\xc0\xe6aK\xf9\x80\xb1\xc8*\x01\x12$\xfbp\x1e(4\xd6{;Z'
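
Note that the string above drops the trailing \x from the question. One possible way to keep it, sketched here with a hypothetical helper (decode_hex_escapes is not a library function), is to convert only complete \xNN escapes and leave everything else untouched. This assumes every character in the input has a code point below 256:

```python
import re

# Hypothetical helper, not part of the original answer: decode only
# complete \xNN escapes and leave anything else (such as a trailing
# "\x") untouched.
HEX_ESCAPE = re.compile(rb'\\x([0-9a-fA-F]{2})')

def decode_hex_escapes(text):
    raw = text.encode('latin-1')   # assumes all code points are < 256
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

my_str = '\\xa5\\xc0\\xe6aK\\xf9\\x80\\xb1\\xc8*\x01\x12$\\xfbp\x1e(4\\xd6{;Z\\x'
result = decode_hex_escapes(my_str)
print(len(result))   # 24
print(result[-2:])   # b'\\x'
```

Whether an incomplete escape should be kept, dropped, or treated as an error depends on where the data came from.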
Blownhither Ma

If you do

s  = '\\xa5\\xc0\\xe6aK\\xf9\\x80\\xb1\\xc8*\x01\x12$\\xfbp\x1e(4\\xd6{;Z\\x'

s = s.replace('\\','\')

print(s)

you get

 File "main.py", line 3
    s = s.replace('\\','\')
                         ^
SyntaxError: EOL while scanning string literal

because in '\' the \ escapes the ', so your string literal is left open.

You do not have any double \ in s. It's just displayed that way when you inspect it, to distinguish a literal backslash from a \ that is used to escape something.

If you print(s) you get \xa5\xc0\xe6aK\xf9\x80\xb1\xc8*$\xfbp(4\xd6{;Z\x
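
A quick way to see the difference between the inspected form and the actual content, using a shortened sample string for illustration:

```python
s = '\\xa5'            # backslash, x, a, 5

print(len(s))          # 4
print(s)               # \xa5
print(repr(s))         # '\\xa5'
print(s.count('\\'))   # 1
```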

Patrick Artner