
In Python 3, I have a string like the following:

mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

This string was read from a file and it is the bytes representation of some text. To be clear, this is a unicode string, not a bytes object.

I need to transform mystr into a bytes object like the following:

mybytes = b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

Notice that the translation should be literal. I don't want to encode the string.

Running .encode('utf-8') will escape the \.

If I manually copy and paste the content into a bytes literal, everything works. What I couldn't find anywhere is how to convert it without copy and paste.

Karl Knechtel
Leo Uieda
  • `bytes(bytearray(ord(i) for i in mystr))` seems to work ... though I feel like there should be a better way. Maybe the better way is to figure out how to not end up in this situation in the first place? :-) – mgilson Jul 13 '16 at 00:39
  • @mgilson thanks! I was thinking about that but this is what I have. Reading the file in `'rb'` gives me `"\\x00\\x00..."`, which is not what I want. Looking for something unrelated I found the solution I posted below. – Leo Uieda Jul 13 '16 at 00:42
  • I ended up deleting my answer because it didn't really work. There were some extra characters being printed in the middle that I hadn't noticed before. – Leo Uieda Jul 13 '16 at 00:53
  • "Running `.encode('utf-8')` will escape the `\`. " No, it won't. There isn't a `\` to escape in the string shown here. If the file actually contains backslashes, lowercase xs etc. then that is a separate problem; and you will see the backslashes be escaped if you view a `repr` of the string, even without changing anything. However, `.encode('utf-8')` **will** corrupt the data (assuming each Unicode code point is intended to represent one byte) by prepending a 0xc2 byte before the 0x80, and 0xc3 before the 0xc0. – Karl Knechtel Aug 05 '22 at 02:49
  • I'm not sure what this question was intended to be, but it's one of these duplicates for sure. – Karl Knechtel Aug 05 '22 at 04:36
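
The two situations described in the comments above can be told apart with a short sketch. It assumes that each code point in `mystr` is meant to stand for one byte; the name `escaped` is hypothetical and stands in for text that contains literal backslashes, as a file read might return.

```python
# Case 1: the string holds real code points (no backslashes at all).
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

# UTF-8 uses two bytes for every code point from U+0080 upward,
# so the result is longer than the input and no longer matches byte-for-byte.
utf8_bytes = mystr.encode("utf-8")
print(len(mystr), len(utf8_bytes))  # 9 11
print(utf8_bytes)  # b'\x00\x00\x01\x01\xc2\x80\x02\xc3\x80\x02\x00'

# Case 2 (hypothetical): the text contains literal backslashes, 'x', and hex digits.
escaped = r"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
print(len(escaped))  # 36

# One possible conversion: interpret the escape sequences first,
# then map each resulting code point to a single byte.
unescaped_bytes = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
print(unescaped_bytes)  # b'\x00\x00\x01\x01\x80\x02\xc0\x02\x00'
```

Checking `len()` or `repr()` of the string read from the file is a quick way to see which case applies.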

1 Answer


mystr.encode("latin-1") is what you want.
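
A minimal sketch of why this works, assuming every character in `mystr` is below U+0100: Latin-1 (ISO-8859-1) maps code points U+0000 through U+00FF one-to-one onto byte values 0x00 through 0xFF, so the encoding is a literal translation. A character outside that range would raise `UnicodeEncodeError`.

```python
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

# Each code point below U+0100 becomes the byte with the same numeric value.
mybytes = mystr.encode("latin-1")
print(mybytes)  # b'\x00\x00\x01\x01\x80\x02\xc0\x02\x00'

# Same result as converting each character with ord(), as suggested in the comments.
same_via_ord = bytes(ord(c) for c in mystr)

# The mapping is reversible: decoding with Latin-1 gives back the original string.
round_trip = mybytes.decode("latin-1")
```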

Paul Cornelius