How to replace a double backslash with a single backslash in Python?

Question

I have a string. In that string are double backslashes. I want to replace the double backslashes with single backslashes, so that unicode char codes can be parsed correctly.

(Pdb) p fetched_page
'<p style="text-align:center;" align="center"><strong><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0<\\/span><\\/strong><\\/p>\n<p><span style="font-family:\'Times New Roman\', serif;font-size:115%;">Chapter 0 in \\u201cDreaming in Code\\u201d give a brief description of programming in its early years and how and why programmers are still struggling today...'

Inside of this string, you can see escaped unicode character codes, such as:

\\u201c

I want to turn this into:

\u201c

Attempt 1:

fetched_page.replace('\\\\', '\\')

but this doesn't work -- it searches for quadruple backslashes.

Attempt 2:

fetched_page.replace('\\', '\')

But this results in an end of line error.

Attempt 3:

fetched_page.decode('string_escape')

But this had no effect on the text. All the double backslashes remained as double backslashes.

Those aren't double backslashes in the string, which is why you can't get rid of them - it's just an artifact of displaying the string. Your problem is bigger than you think. — Mark Ransom, Jul 19 '11 at 18:56
You're right -- the problem was bigger than I had originally thought. My solution involved reworking the way I was extracting data so as to prevent the extra backslashes from getting into my strings in the first place (instead of trying to strip them out after the fact). — zzz, Jul 21 '11 at 12:04
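
The sample contains both <\/ and \u201c, which are valid JSON escapes, so one possibility is that the text was cut out of a JSON response by hand. A minimal sketch of decoding at extraction time under that assumption; the payload and the "content" key are hypothetical and not taken from the question:

```python
import json

# Hypothetical raw response body; the "content" key is an illustrative assumption.
raw_response = '{"content": "<p>Chapter 0<\\/p> \\u201cDreaming in Code\\u201d"}'

# json.loads interprets \/ and \uXXXX while parsing, so no manual replace is needed.
page = json.loads(raw_response)["content"]
print(page)
```

If the data is not JSON, this does not apply, and the decoding approaches in the answers below are the ones to look at.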

max5555 · Answer 1 · 2019-04-28T09:57:51.687

28

Python3:

>>> b'\\u201c'.decode('unicode_escape')
'“'

or

>>> '\\u201c'.encode().decode('unicode_escape')
'“'
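
One caveat worth sketching: the unicode_escape codec decodes bytes as Latin-1, so the plain .encode() variant can garble text that already contains real non-ASCII characters. The sample string below is made up for illustration, and the Latin-1 plus backslashreplace round trip is a common workaround rather than something this answer proposes:

```python
text = 'caf\u00e9 \\u201cquoted\\u201d'  # a real é plus two literal \uXXXX escapes

# Plain .encode() uses UTF-8, and unicode_escape then reads those bytes as
# Latin-1, so the é comes back as two wrong characters.
naive = text.encode().decode('unicode_escape')

# Encoding to Latin-1 with backslashreplace keeps the bytes consistent with
# what unicode_escape expects; characters above U+00FF become escapes first.
safer = text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(naive)
print(safer)
```

Either way, every backslash sequence in the text is interpreted, so this only suits strings whose backslashes are all meant to be escapes.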

edited Apr 28 '19 at 09:57

answered Apr 28 '19 at 09:49

max5555

Thanks for this one, it's the only solution I found to replace every \\ with \ – HenriChab Nov 30 '22 at 15:55

score 27 · Answer 2 · answered Jul 19 '11 at 19:06

You can try codecs.escape_decode; this should decode the escape sequences.

schlamar

An alternative could be `'mystring'.decode('unicode_escape')` – schlamar Jul 19 '11 at 19:09

@ms_py: Doesn't work in Python 3, though; it'd need to be `b'mybytes'.decode('unicode_escape')`. `codecs.unicode_escape_decode` works just fine, though. – JAB Jul 19 '11 at 19:21

How do you even use this? – rom May 04 '21 at 03:49
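
As a rough illustration of how the functions named in this answer and its comments can be called in Python 3: both return a (result, consumed_length) tuple, so the first element is the one of interest. Note that codecs.escape_decode is not listed in the official documentation, so treat its use here as an assumption about CPython behaviour:

```python
import codecs

# unicode_escape_decode returns a (decoded_text, consumed_length) tuple.
decoded, _ = codecs.unicode_escape_decode(b'\\u201c')
print(decoded)

# escape_decode is undocumented; it handles byte-level escapes such as \n
# and also returns a tuple, this time with a bytes object first.
raw, _ = codecs.escape_decode(b'line1\\nline2')
print(raw)
```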

score 16 · Answer 3 · answered Jul 19 '11 at 18:53

I'm not getting the behaviour you describe:

>>> x = "\\\\\\\\"
>>> print(x)
\\\\
>>> y = x.replace('\\\\', '\\')
>>> print(y)
\\

When you see '\\\\' in your output, you're seeing twice as many backslashes as there are in the string because each one is escaped. The code you wrote should work fine. Try printing out the actual values, instead of only looking at how the REPL displays them.
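
A small sketch of the difference between what a string contains and how its repr is displayed; the variable names are only for illustration:

```python
s = 'a\\\\b'          # the literal has four backslashes, the string holds two

print(len(s))         # counts the actual characters
print(repr(s))        # the escaped form that the REPL or pdb's p command shows
print(s)              # the real contents

t = s.replace('\\\\', '\\')   # two real backslashes -> one real backslash
print(t, len(t))
```

Checking len() before and after the replace is a quick way to confirm whether anything was actually doubled in the first place.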

Jeremy

but the question was "How to replace a double backslash with a single backslash" and not how to replace a quadruple with a double backslash! – Apr 09 '19 at 17:08

score 6 · Answer 4 · answered Jul 19 '11 at 19:00

To extend on Jeremy's answer, your problem is that '\' is an illegal string because \' escapes the quote mark, so your string never terminates.
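
For illustration, a few ways to write a single backslash that do parse; the path value is a made-up example:

```python
single = '\\'             # one backslash character, written with an escape
print(len(single))

also_single = chr(92)     # the same character, built from its code point
print(single == also_single)

# A raw string cannot end in an odd number of backslashes either,
# so r'\' is a syntax error just like '\'.
path = r'C:\temp' + '\\'  # append the trailing backslash separately
print(path)
```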

murgatroid99

score 2 · Answer 5 · answered Dec 03 '20 at 11:30

Interesting question, but in reality you have only one backslash character. The doubling is just how Python represents it. If you make a list of the characters that the string contains, like:

[s for s in string_object]

it shows every character and represents "\" as "\\", but don't be confused by that. It is actually a single character. So, in my example, there is no double backslash.

real example:

>>> [s for s in 'usnDu\\NgAnA{I']
['u', 's', 'n', 'D', 'u', '\\', 'N', 'g', 'A', 'n', 'A', '{', 'I']

JAB · Answer 6 · 2011-07-19T19:24:24.817

2

It may be slightly overkill, but...

>>> import re
>>> a = '\\u201c\\u3012'
>>> re.sub(r'\\u[0-9a-fA-F]{4}', lambda x:eval('"' + x.group() + '"'), a)
'“〒'

So yeah, the simplest solution would be ms4py's answer, calling codecs.escape_decode on the string and taking the result (or the first element of the result if escape_decode returns a tuple, as it seems to in Python 3). In Python 3 you'd want to use codecs.unicode_escape_decode when working with strings (as opposed to bytes objects), though.
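
If the regex route is preferred, the eval call can be avoided by converting the hex digits directly. This is a sketch for Python 3, not the code from this answer; the helper name replace_escape is made up, and it does not combine surrogate pairs:

```python
import re

a = '\\u201c\\u3012'

def replace_escape(match):
    # Convert the four hex digits to the character with that code point.
    return chr(int(match.group(1), 16))

result = re.sub(r'\\u([0-9a-fA-F]{4})', replace_escape, a)
print(result)
```

Because nothing is evaluated as code, this variant stays safe even when the input string comes from an untrusted source.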

edited Jul 19 '11 at 19:24

answered Jul 19 '11 at 19:17

JAB

Hmm, this seems like a good solution. However, when I run it verbatim on the python 2.7.1 interactive terminal, I get '\\u201c\\u3012' as the result instead of '“〒' – zzz Jul 19 '11 at 20:52
