Python convert strings of bytes to byte array

Question

For example, given an arbitrary string, which could contain ordinary characters or just random bytes:

string = '\xf0\x9f\xa4\xb1'

I want to output:

b'\xf0\x9f\xa4\xb1'

This seems so simple, but I could not find an answer anywhere. Of course, just typing b in front of the string literal works, but I want to do this at runtime, starting from a variable that contains the string of bytes.

If the given string were AAAA or some other known characters, I could simply do string.encode('utf-8'), but I expect the string of bytes to be random. Doing that to '\xf0\x9f\xa4\xb1' (random bytes) produces the unexpected result b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'.

Is there a simpler way to do this?

Edit:

I want to convert the string to bytes without using an encoding.

Do you want to convert the string to bytes? It is not clear what the desired solution is... if you know it is a byte string without the b, you can do some string formatting. If you need it in bytes, you can call `bytes(string)`. Does this help: https://stackoverflow.com/questions/606191/convert-bytes-to-a-string ? — Scott Skiles, Aug 08 '18 at 20:06
The `bytes` function takes in a `string` and an `encoding`. Since the bytes I'm expecting are random, I don't want to pick an encoding for it — AznBoyStride, Aug 08 '18 at 20:13

tripleee · Answer 1 · 2020-12-28T11:57:50.177

The Latin-1 character encoding trivially (and unlike the other common encodings supported by Python) encodes every code point in the range 0x00-0xff to a byte with the same value.

byteobj = '\xf0\x9f\xa4\xb1'.encode('latin-1')
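A quick way to check the mapping claim above is to encode all 256 code points and compare the resulting byte values. This sketch also shows that the round trip back to str is lossless, and what happens with a character outside the range (U+20AC, the euro sign, chosen here only as an example):

```python
# Build a string containing every code point from 0x00 to 0xff.
all_code_points = ''.join(chr(cp) for cp in range(256))
encoded = all_code_points.encode('latin-1')

# Each code point becomes exactly one byte with the same value.
assert list(encoded) == list(range(256))

# Decoding with Latin-1 restores the original string unchanged.
assert encoded.decode('latin-1') == all_code_points

# Code points above 0xff have no Latin-1 byte, so encoding them fails loudly.
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('not representable:', exc)
```

Because it raises instead of silently producing extra bytes, Latin-1 also tells you when the input was not really a "string of bytes" after all.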

You say you don't want to use an encoding, but the alternatives which avoid it seem far inferior.

The UTF-8 encoding is unsuitable because, as you already discovered, code points above 0x7f map to a sequence of multiple bytes (up to four bytes) none of which are exactly the input code point as a byte value.
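Putting the two encodings side by side on the string from the question reproduces both the unexpected UTF-8 result and the Latin-1 result:

```python
s = '\xf0\x9f\xa4\xb1'

# UTF-8 turns each code point in 0x80-0xff into a two-byte sequence.
utf8 = s.encode('utf-8')
print(utf8)       # b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'
print(len(utf8))  # 8

# Latin-1 keeps one byte per code point, with the same value.
latin1 = s.encode('latin-1')
print(latin1)       # b'\xf0\x9f\xa4\xb1'
print(len(latin1))  # 4
```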

Omitting the argument to .encode() (as in a now-deleted answer) does not help either: in Python 3, str.encode() defaults to UTF-8 on every platform, so it gives exactly the multi-byte result described above. Platform-dependent defaults do exist elsewhere (for example, open() without an encoding argument uses the locale's preferred encoding, typically a legacy code page such as cp1252 on Windows), which is one more reason to always name the encoding explicitly.

score 3 · Accepted Answer · answered Aug 08 '18 at 20:26


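I found a working solution:

import struct

def convert_string_to_bytes(string):
    # Pack each character's code point into one unsigned byte ("B" format).
    # struct.error is raised if a code point is above 255.
    result = b''
    for char in string:
        result += struct.pack("B", ord(char))
    return result

string = '\xf0\x9f\xa4\xb1'

print(convert_string_to_bytes(string))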

output: b'\xf0\x9f\xa4\xb1'
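The same loop can be written without struct, because the built-in bytes constructor accepts an iterable of integers in range(256) directly. The sketch below reuses the convert_string_to_bytes name from the answer above. Note that for such strings it produces exactly what .encode('latin-1') produces, so the "no encoding" approach is effectively Latin-1 under another name:

```python
def convert_string_to_bytes(string):
    # bytes() accepts any iterable of integers in range(256);
    # it raises ValueError if a character's code point is 256 or higher.
    return bytes(ord(char) for char in string)

string = '\xf0\x9f\xa4\xb1'
result = convert_string_to_bytes(string)
print(result)  # b'\xf0\x9f\xa4\xb1'

# Identical to the Latin-1 encoding suggested in the other answer.
assert result == string.encode('latin-1')
```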


AznBoyStride


b'\'\\x1e\\x03\\xcd\\xb6\\x93:\\x87\\xfc\\xcfp\\xfc\\xb7\\xba\\x8a\\x0es\\x81P\\xe1\\x1b\\n4a\\xe4"\\xdfA\\x8e\\x8a\\x15\\x18\\xb8\\x12\\xfcB/\\xea\\x83\\xd4\\x1dd\\xb8\\x14\\xd3\\xb9\\xfa\\x97B\\xfe\\x89\\xe1\\xff\\xbe\\x02\\xedY\\xc9pk\\\'\\xf8\\x1d9\\x1a\'' output is like this – Sadique Khan Nov 11 '21 at 08:23
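The doubled backslashes in the output quoted in the comment above (\\x1e rather than \x1e) suggest a different situation: the input string held the escape sequences as literal text (backslash, x, two hex digits), not the characters they denote, so converting each character to a byte faithfully kept the backslashes. Assuming that is the case, a minimal sketch is to interpret the escapes first; escaped_text here is a hypothetical example input:

```python
# Hypothetical input: escape sequences stored as literal text,
# e.g. read from a file or typed by a user.
escaped_text = r'\xf0\x9f\xa4\xb1'
print(len(escaped_text))  # 16

# 1. Latin-1 turns the ASCII escape text into bytes unchanged.
# 2. unicode_escape interprets each \xNN as the character with that code point.
# 3. Latin-1 maps those characters back to the byte values.
result = escaped_text.encode('latin-1').decode('unicode_escape').encode('latin-1')
print(result)  # b'\xf0\x9f\xa4\xb1'
```

If the string is instead a complete bytes literal, including the b prefix and quotes (for example what str() returns for a bytes object), ast.literal_eval(text) returns the bytes object directly.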
