How to convert a string containing a byte string to a byte string

Question

How do I convert a string which contains the literal representation of a byte string, to a byte string?

This might seem strange, but a library I'm using raises a certain type of exception, and I need one of that exception's attributes. The attribute gives me the value I need, but as a byte string inside a string.

It is "value=b'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'". I can get the value by splitting on the equals sign and then using eval, such as:

>>> eval("value=b'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'".split("=")[1])
b'\xbbOFa\x14\xdb{\xf5\x1b~H\xba\x96\xdaec'

This works, but as we all know eval can be very, very bad. So, is there an alternative to using eval?

Python 2.4 had a "string_escape" encoder https://docs.python.org/2.4/lib/standard-encodings.html — Mad Physicist, Jul 29 '20 at 14:51
@mkrieger. I've re-opened the question. I don't believe that this is really about eval. eval is just a crutch being used as an example. — Mad Physicist, Jul 29 '20 at 15:44

Mad Physicist · Accepted Answer · 2020-07-29T15:41:12.177

There is a unicode-escape codec that converts bytes containing literal escape sequences like \x.. or \u.... into the equivalent characters of a string. Bytes that are not part of an escape sequence are decoded as latin1, which maps each byte to the code point with the same value.

So you convert the string to raw bytes using latin1, then convert back to a string using unicode-escape, and finally back to bytes using latin1 again:

>>> s = '\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'
>>> s.encode('latin1').decode('unicode-escape').encode('latin1')
b'\xbbOFa\x14\xdb{\xf5\x1b~H\xba\x96\xdaec'
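The same round trip can be wrapped in a small helper. This is only a sketch of the chain above: the function name `unescape_to_bytes` is hypothetical, and it assumes the input contains only characters that latin1 can encode.

```python
def unescape_to_bytes(escaped: str) -> bytes:
    """Turn text such as '\\xbbOF' (literal backslashes) into real bytes.

    Hypothetical helper; assumes every character of `escaped` fits in latin1.
    """
    # Step 1: text -> raw bytes; latin1 maps code points 0-255 to the same byte.
    raw = escaped.encode('latin1')
    # Step 2: interpret escape sequences (\x.., \n, ...) as characters.
    text = raw.decode('unicode-escape')
    # Step 3: characters -> bytes again, one byte per code point.
    return text.encode('latin1')


s = '\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'
result = unescape_to_bytes(s)
```

If the text contains an escape that decodes to a code point above 255 (for example a \u.... sequence), the final latin1 encode raises UnicodeEncodeError, which is probably the desired behaviour for something that is supposed to be a byte string.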

Getting rid of the clutter around the string is pretty easy using regex or the more manual parsing you showed. For example:

>>> import re
>>> x = "value=b'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'"
>>> s = re.fullmatch('[^\'"]+b([\'"])(.*)\\1[^\'"]*', x).group(2)
>>> s
'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'

OR

>>> s = x.split('=', 1)[1].lstrip('b').strip("'")
>>> s
'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'
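Putting the two steps together, extraction and decoding, might look like the sketch below. The function name `bytes_from_repr` and the compiled pattern are hypothetical, and the code assumes the text contains exactly one bytes literal of the form b'...' or b"...".

```python
import re

# Hypothetical pattern: a 'b', an opening quote, the payload, the same quote.
_BYTES_LITERAL = re.compile(r"""b(['"])(.*)\1""", re.DOTALL)


def bytes_from_repr(text: str) -> bytes:
    """Extract the first b'...' literal found in `text` and return its bytes."""
    match = _BYTES_LITERAL.search(text)
    if match is None:
        raise ValueError('no bytes literal found')
    escaped = match.group(2)  # payload with literal backslash escapes
    # Same latin1 -> unicode-escape -> latin1 chain as in the answer above.
    return escaped.encode('latin1').decode('unicode-escape').encode('latin1')


x = "value=b'\\xbbOFa\\x14\\xdb{\\xf5\\x1b~H\\xba\\x96\\xdaec'"
value = bytes_from_repr(x)
```

Unlike eval, nothing in the input is executed here: the payload is only ever treated as data, and input without a bytes literal is rejected with ValueError.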
