Decoding specific escaped characters in a Python string

Question

I have a Python variable (named var) containing a string with the following literal data:

day\r\n\\night

in hex, it is:

64  61  79  5C  72  5C  6E  5C  5C  6E  69  67  68  74  07
d   a   y   \   r   \   n   \   \   n   i   g   h   t   BEL

I need to decode \\, \r and \n only.

The desired output (in hex):

64  61  79  0D  0A  5C  6E  69  67  68  74  07
d   a   y   CR  LF  \   n   i   g   h   t   BEL

Using decode doesn't work:

>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Using regex to find and replace \\, \r and \n with the characters they stand for is unsuccessful, because the \n at the start of \night (the second backslash of \\ followed by n) is also treated as an escape and turned into 0x0A.
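A minimal sketch of how this goes wrong, assuming the three replacements are applied one after another with str.replace (the question does not show the exact regex that was used):

```python
# The question's data as a normal Python string: the backslashes are real
# characters and the final BEL (0x07) is a raw control byte.
var = 'day\\r\\n\\\\night\x07'

# Naive approach (assumed for illustration): one replacement after another.
naive = var.replace('\\\\', '\\')   # two backslashes -> one; this leaves '\night' behind
naive = naive.replace('\\r', '\r')  # backslash-r -> CR
naive = naive.replace('\\n', '\n')  # backslash-n -> LF, which now also hits the start of '\night'

print(naive.encode('ascii').hex(' '))
# 64 61 79 0d 0a 0a 69 67 68 74 07   (desired: 64 61 79 0d 0a 5c 6e 69 67 68 74 07)
```

Swapping the order does not help either: replacing backslash-n first also matches the second backslash of \\ together with the n of night.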

Is it possible to specify which characters I want to decode, or is there a more appropriate module? I'm using Python 3.10.2.

@ArthurKing It's in the string, but it does not display properly on the website. The hex dump is to help show the contents of the string on any platform. — leetbacoon, Mar 28 '22 at 06:13
@ArthurKing I did, yes, thank you for posting, however my goal is to have the output not print the ASCII control code names but the bytes themselves, while also leaving all other ASCII control codes alone. I posted an answer which demonstrates this. — leetbacoon, Mar 28 '22 at 07:13

Матвей Рушенцев · Answer 1 · 2022-03-28T06:18:19.610


A similar question was asked here. Based on it, you can do the following:

var = r"day\r\n\\night"

# This is what you got previously
var.encode('ascii').hex(' ')
# '64 61 79 5c 72 5c 6e 5c 5c 6e 69 67 68 74'

# To get the required output, decode the escapes with the 'unicode-escape' codec
bytes(var, encoding='ascii').decode('unicode-escape').encode('ascii').hex(' ')
# '64 61 79 0d 0a 5c 6e 69 67 68 74'
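A sketch of the same approach applied to the question's full data (including the trailing BEL byte), plus one caveat worth checking: the 'unicode-escape' codec decodes every escape sequence it recognises, not only \\, \r and \n.

```python
# The question's data, including the trailing raw BEL byte.
var = 'day\\r\\n\\\\night\x07'

# Answer 1's approach: the built-in 'unicode-escape' codec decodes the escapes.
decoded = var.encode('ascii').decode('unicode-escape')
print(decoded.encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07

# Caveat: other escapes are decoded too, e.g. a literal backslash-t.
print(repr('a\\tb'.encode('ascii').decode('unicode-escape')))
# 'a\tb'  (the literal \t has become a real TAB)
```

The raw BEL byte survives because it is not an escape sequence. Note also that the encode('ascii') step raises UnicodeEncodeError if the string contains any non-ASCII characters.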

edited Mar 28 '22 at 06:18

answered Mar 27 '22 at 11:40

Desktop Firework · Answer 2 · 2022-03-27T12:05:27.450

Assuming var holds the hex codes as a string, like this:

6461795C725C6E5C5C6E6967687407 (without spaces)

you should try:

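escaped = {'72': '0D', '6E': '0A', '5C': '5C'}  # hex code after a '\' -> hex code of the decoded character
out = []                                        # strings are immutable, so build the result in a list
i = 0
while i < len(var):
    if var[i:i+2] == '5C' and var[i+2:i+4] in escaped:  # a '\' followed by 'r', 'n' or '\'
        out.append(escaped[var[i+2:i+4]])                # emit the decoded character's hex code
        i += 4                                           # skip both hex pairs
    else:
        out.append(var[i:i+2])                           # copy any other character unchanged
        i += 2
var = ''.join(out)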

This replaces each \r, \n and \\ with the corresponding character (CR, LF, \), working directly on the hex codes. It assumes uppercase hex digits, as in the string above.

I'll later add converters between day\r\n\\night and 6461795C725C6E5C5C6E69676874.

EDIT: Converters are here! In each one, the converted string ends up in r.
They handle the results of input(), but for hard-coded strings you'll have to enter:
var = 'day\\r\\n\\\\night'
so that Python stores 'day', then '\', then 'r', then '\', then 'n', then '\', then '\', then 'night', and not 'day', then CR, then LF, then '\', then 'night'. That way
print(var)
prints
day\r\n\\night
and not

day
\night
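Answer 1 uses a raw string instead of doubled backslashes. A small sketch showing that both spellings produce exactly the same data (the BEL byte is left out here for brevity):

```python
# Two ways to write the question's text in source code.
escaped = 'day\\r\\n\\\\night'   # normal string: every backslash doubled
raw = r'day\r\n\\night'          # raw string: backslashes kept as written

print(escaped == raw)   # True
print(len(raw))         # 14 characters, including 4 backslashes
print(raw)              # day\r\n\\night
```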

# convert string to hex (two uppercase digits per character, matching the decoder above)
r = ''
for c in var:
    r += format(ord(c), '02X')   # e.g. '\' -> '5C', BEL -> '07'

# convert hex to string
r = ''
c = 0
while c < len(var):
    # parse each two-digit hex code and append the corresponding character to r
    r += chr(int(var[c:c+2], 16))   # e.g. '5C' -> '\'
    c += 2
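For comparison, a sketch of the whole round trip (string to hex, decode, hex back to string) using the built-in bytes.hex() and bytes.fromhex() in place of the hand-written converters. The helper names are hypothetical, and Latin-1 is assumed as the encoding so that every character maps to exactly one byte (this only works for code points up to 0xFF).

```python
# Hex code of the character after a '\' -> hex code of the decoded character.
ESCAPED = {'72': '0D', '6E': '0A', '5C': '5C'}

def to_hex(s: str) -> str:
    # Hypothetical helper: one byte per character (Latin-1), uppercase hex digits.
    return s.encode('latin-1').hex().upper()

def from_hex(h: str) -> str:
    # Hypothetical helper: inverse of to_hex.
    return bytes.fromhex(h).decode('latin-1')

def decode_hex_escapes(h: str) -> str:
    # Same scan as the answer's loop: walk the hex string two digits at a time.
    out = []
    i = 0
    while i < len(h):
        if h[i:i+2] == '5C' and h[i+2:i+4] in ESCAPED:
            out.append(ESCAPED[h[i+2:i+4]])   # '\' + r/n/'\' -> CR/LF/'\'
            i += 4
        else:
            out.append(h[i:i+2])              # any other byte is copied unchanged
            i += 2
    return ''.join(out)

var = 'day\\r\\n\\\\night\x07'
h = to_hex(var)
print(h)
# 6461795C725C6E5C5C6E6967687407
result = from_hex(decode_hex_escapes(h))
print(result.encode('latin-1').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07
```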

leetbacoon · Accepted Answer · 2022-03-28T07:15:18.913

Many thanks to everyone who contributed answers, but none of them solved my issue completely. After a long time of research I found this solution from sahil Kothiya (mirror), which I modified to resolve my specific issue:

import re, codecs

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\[\\nr]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        # decode only the matched two-character escape (\\, \n or \r)
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)
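A variant worth considering, not the code from the linked answer: the same one-pass regex, but with a small lookup table (the hypothetical ESCAPES dict) in place of the per-match codec call. Because re.sub scans left to right, \\ is consumed as a unit before its second backslash can pair with the n of night, and anything outside the character class, including \t and the raw BEL byte, passes through untouched.

```python
import re

# Hypothetical lookup table: two-character escape as written -> character it stands for.
ESCAPES = {'\\\\': '\\', '\\r': '\r', '\\n': '\n'}

# A backslash followed by a backslash, 'n' or 'r' (same class as the answer's pattern).
ESCAPE_RE = re.compile(r'\\[\\nr]')

def decode_selected(s: str) -> str:
    # Replace each matched escape via the table; all other text is copied unchanged.
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], s)

var = 'day\\r\\n\\\\night\x07'
print(decode_selected(var).encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07

print(decode_selected('tab\\there'))
# tab\there   (\t is not in the class, so it is left alone)
```

When the replacement is a function, re.sub inserts its return value literally, so returning a single backslash needs no extra escaping.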

Demonstration in IDLE: