Convert binary string to bytearray in Python 3

Question

Despite the many related questions, I can't find any that match my problem. I'd like to change a binary string (for example, "0110100001101001") into a byte array (same example, b"hi").

I tried this:

bytes([int(i) for i in "0110100001101001"])

but I got:

b'\x00\x01\x01\x00\x01' #... and so on

What's the correct way to do this in Python 3?

related: [Convert binary to ASCII and vice versa](https://stackoverflow.com/q/7396849/4279) — jfs, Jan 30 '18 at 15:14

score 28 · Accepted Answer · edited Aug 04 '22 at 12:29

Here's an example of doing it the first way that Patrick mentioned: convert the bitstring to an int and take 8 bits at a time. The natural way to do that generates the bytes in reverse order. To get the bytes back into the proper order I use extended slice notation on the bytearray with a step of -1: b[::-1].

def bitstring_to_bytes(s):
    v = int(s, 2)
    b = bytearray()
    while v:
        b.append(v & 0xff)
        v >>= 8
    return bytes(b[::-1])

s = "0110100001101001"
print(bitstring_to_bytes(s))

Clearly, Patrick's second way is more compact. :)

However, there's a better way to do this in Python 3: use the int.to_bytes method:

def bitstring_to_bytes(s):
    return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')
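A quick comparison may help here. This is only a sketch: the loop-based function is copied from above under the made-up name loop_bitstring_to_bytes so both versions can coexist. Besides being shorter, the to_bytes version keeps leading zero bytes, whereas the loop stops as soon as v reaches zero and so drops them.

```python
def loop_bitstring_to_bytes(s):
    # Loop-based version from above, renamed (hypothetical name) for comparison.
    v = int(s, 2)
    b = bytearray()
    while v:
        b.append(v & 0xff)
        v >>= 8
    return bytes(b[::-1])


def bitstring_to_bytes(s):
    # One-liner from above: the byte count is derived from the string length.
    return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')


print(bitstring_to_bytes("0110100001101001"))       # b'hi'
print(bitstring_to_bytes("0000000001101001"))       # b'\x00i'
print(loop_bitstring_to_bytes("0000000001101001"))  # b'i'
```

If the leading zero bytes matter for your data, the to_bytes version is probably the safer choice.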

If len(s) is guaranteed to be a multiple of 8, then the first arg of .to_bytes can be simplified:

return int(s, 2).to_bytes(len(s) // 8, byteorder='big')

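This raises OverflowError when the value does not fit in len(s) // 8 bytes, e.g. for "1001", which may be desirable in some circumstances. Note that it does not catch every length that is not a multiple of 8: "000000001" still converts to b'\x01', because its value fits in one byte.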
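To see when the floor-division variant actually raises, here is a small sketch. The name strict_bitstring_to_bytes is made up for illustration. The error comes from to_bytes when the integer needs more bytes than requested, so a short string with set bits fails, while an over-long string padded with leading zeros can still pass.

```python
def strict_bitstring_to_bytes(s):
    # Hypothetical name; floor division assumes len(s) is a multiple of 8.
    return int(s, 2).to_bytes(len(s) // 8, byteorder='big')


for s in ("0110100001101001", "1001", "000000001"):
    try:
        print(repr(s), "->", strict_bitstring_to_bytes(s))
    except OverflowError as exc:
        # Raised when the value needs more than len(s) // 8 bytes.
        print(repr(s), "-> OverflowError:", exc)
```

If you want to reject every malformed length, an explicit check such as len(s) % 8 == 0 before converting is likely more reliable than relying on the exception.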

Another option is to use double negation to perform ceiling division. For integers a and b, floor division using //

n = a // b

gives the integer n such that

n <= a/b < n + 1

E.g., 47 // 10 gives 4, and -47 // 10 gives -5. So -(-47 // 10) gives 5, effectively performing ceiling division.

Thus in bitstring_to_bytes we could do:

return int(s, 2).to_bytes(-(-len(s) // 8), byteorder='big')

However, not many people are familiar with this efficient & compact idiom, so it's generally considered to be less readable than

return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')
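If you want to convince yourself that the two spellings agree, a brute-force check is easy to write. This is just a sketch; ceil_div is a made-up helper name, and math.ceil is used only as an independent reference for small values.

```python
import math


def ceil_div(a, b):
    # Double-negation idiom: floor division rounds toward negative infinity,
    # so negating before and after rounds up instead.
    return -(-a // b)


# Compare both integer formulas against math.ceil for small lengths.
for n in range(0, 65):
    assert ceil_div(n, 8) == (n + 7) // 8 == math.ceil(n / 8)

print(ceil_div(16, 8), ceil_div(17, 8))  # 2 3
```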

edited Aug 04 '22 at 12:29

Antoine Pinsard


answered Sep 20 '15 at 06:45

PM 2Ring


`len(s) // 8` may fail, [use `(len(s) + 7) // 8` instead](http://stackoverflow.com/a/32683047/4279). – jfs Sep 20 '15 at 18:57
int.to_bytes is essentially the first method -- just done more efficiently in C rather than python. – Patrick Maupin Sep 20 '15 at 21:09
@J.F.Sebastian: Good point; your code is more robust, mine assumes that the input bitstring has been constructed correctly. Another way to calculate the correct size for bitstrings with a length that's not a whole multiple of 8 is to use the "ceiling division" trick: `-(-len(s) // 8)`. – PM 2Ring Sep 21 '15 at 07:24
`'1001'` that corresponds to `b'\t'` is as correct as other bitstrings. The division trick works but it is less readable (some languages round towards zero, some towards negative infinity). – jfs Sep 21 '15 at 07:47
@J.F.Sebastian: 1) Sure, `'1001'` is a valid way to represent that bitstring, and it saves bytes when you're representing bits like that. OTOH, the format used in the OP _is_ zero-padded, which is why I didn't bother handling strings that aren't zero-padded. 2) I agree that the ceiling division trick is less readable, which is why I called it a trick. However, objecting to it on the grounds that different languages round negatives differently is irrelevant here, IMHO, since we're not using other languages, we're using Python. – PM 2Ring Sep 21 '15 at 08:00
Yes, I need the string to be zeropadded. :) – Numeri Sep 21 '15 at 13:10
Thank you for your answer! StackOverflow is an amazing resource. This would have taken me a much longer time to work out using docs (and I probably wouldn't have stumbled on the right function). :) – Numeri Sep 21 '15 at 13:18
Merci, @Antoine! – PM 2Ring Aug 04 '22 at 12:48

Patrick Maupin · Answer 2 · 2015-09-20T19:09:24.153

11

You have to either convert it to an int and take 8 bits at a time, or chop it into 8-character-long strings and then convert each of them into ints. In Python 3, as PM 2Ring's and J.F. Sebastian's answers show, the to_bytes() method of int allows you to do the first method very efficiently. This is not available in Python 2, so for people stuck with that, the second method may be more efficient. Here is an example:

>>> s = "0110100001101001"
>>> bytes(int(s[i : i + 8], 2) for i in range(0, len(s), 8))
b'hi'

To break this down, the range statement starts at index 0, and gives us indices into the source string, but advances 8 indices at a time. Since s is 16 characters long, it will give us two indices:

>>> list(range(0, 50, 8))
[0, 8, 16, 24, 32, 40, 48]
>>> list(range(0, len(s), 8))
[0, 8]

(We use list() here to show the values that the range object will produce in Python 3.)

We can then build on this to break the string apart by taking slices of it that are 8 characters long:

>>> [s[i : i + 8] for i in range(0, len(s), 8)]
['01101000', '01101001']

Then we can convert each of those into integers, base 2:

>>> list(int(s[i : i + 8], 2) for i in range(0, len(s), 8))
[104, 105]

And finally, we wrap the whole thing in bytes() to get the answer:

>>> bytes(int(s[i : i + 8], 2) for i in range(0, len(s), 8))
b'hi'
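Putting the steps together as a reusable function might look like the sketch below. The name bitstring_to_bytes_chunked is made up, and the zfill padding is an addition of mine, not part of the one-liner above: it assumes that a string whose length is not a multiple of 8 should be left-padded with zeros, so that the result matches the to_bytes approach. Without the padding, a short final chunk would be read as its own small number.

```python
def bitstring_to_bytes_chunked(s):
    # Hypothetical helper. Assumption: left-pad with zeros up to the next
    # multiple of 8 so every slice is a full 8-character chunk.
    padded = s.zfill((len(s) + 7) // 8 * 8)
    # Convert each 8-character slice to an int in base 2, then pack into bytes.
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


print(bitstring_to_bytes_chunked("0110100001101001"))  # b'hi'
print(bitstring_to_bytes_chunked("1001"))              # b'\t'
```

Because each chunk is converted on its own, this never builds one huge integer for the whole string.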

edited Sep 20 '15 at 19:09

answered Sep 20 '15 at 04:22

Patrick Maupin


@KevinGuan Explanation added. If it meets your needs, please accept the answer. – Patrick Maupin Sep 20 '15 at 05:04
@KevinGuan Sorry, wasn't paying attention! :-) – Patrick Maupin Sep 20 '15 at 05:07
It is unnecessarily complicated and inefficient; here's a [simpler solution](http://stackoverflow.com/a/32683047/4279) – jfs Sep 20 '15 at 18:56
@J.F.Sebastian -- excellent point. I'm usually stuck on Python 2 and sometimes forget about Python 3 enhancements. – Patrick Maupin Sep 20 '15 at 19:03
Thanks for this great answer--if anyone wants to solve this with python 2, this is the answer they need. – Numeri Sep 21 '15 at 13:15
Wow! This proposal is working for me since the string of bits (bit array) is too long to process with the standard int() function in Python. Better slicing the array. Thanks!! – mindOf_L Nov 17 '19 at 16:10

score 10 · Answer 3 · answered Sep 20 '15 at 18:54

>>> zero_one_string = "0110100001101001"
>>> int(zero_one_string, 2).to_bytes((len(zero_one_string) + 7) // 8, 'big')
b'hi'

This returns a bytes object, which is an immutable sequence of bytes. If you want a bytearray (a mutable sequence of bytes), just call bytearray(b'hi').
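As a short illustration of the difference, the sketch below wraps the result in bytearray and then modifies it in place. The variable name data is arbitrary.

```python
zero_one_string = "0110100001101001"

# bytes is immutable; wrapping it in bytearray gives a mutable copy.
data = bytearray(
    int(zero_one_string, 2).to_bytes((len(zero_one_string) + 7) // 8, 'big')
)

data[0] = ord('H')  # item assignment works on bytearray, not on bytes
print(data)         # bytearray(b'Hi')
```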

answered Sep 20 '15 at 18:54

jfs


Thank you! This is (probably) the safest of all three answers, and most clearly addressed to python3. – Numeri Sep 21 '15 at 13:19
