I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring
module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b
:
>>> c.bin[2:]
'11111111'
The bitstring
module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.
As pointed out by a comment, if you need to get rid of the 0b
prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removepreffix('0b')
'11111111'
Note: using lstrip("0b")
here will lead to 0
integer being converted to an empty string. This is almost always not what you want to do.
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bit 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3
is the number with the first two bits set...)
These can easily be translated into simple CPU operations that are super fast.
using python format string syntax
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex()
function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int()
function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b}
is where the real magic happens. It is using the Format Specification Mini-Language format_spec
. Specifically it's using the width
and the type
parts of the format_spec syntax. The 8
sets width
to 8, which is how we get the nice 0000 padding, and the b
sets the type to binary.
I prefer this method over the bin()
method because using a format string gives a lot more flexibility.
I think simplest would be use numpy
here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here how to do it using format()
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.
Use ord
when reading reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format()
:
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01'
becomes '00000001'
)
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc - here's a snippet for that:
def bits_little_endian_from_bytes(s):
return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.
def byte2bin(b):
return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16)
.
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin
turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
"""
Given an integer value, return the positions of the on bits.
"""
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
bit_arr = np.unpackbits(arr, bitorder='big')
bit_positions = np.where(bit_arr[::-1])[0].tolist()
return bit_positions
Then using string logic:
def bit_positions_str(val):
is_negative = val < 0
if is_negative:
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
neg_position = (length * 8) - 1
# special logic for negatives to get twos compliment repr
max_val = 1 << neg_position
val_ = max_val + val
else:
val_ = val
binary_string = '{:b}'.format(val_)[::-1]
bit_positions = [pos for pos, char in enumerate(binary_string)
if char == '1']
if is_negative:
bit_positions.append(neg_position)
return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
positions = [pos for pos, mask in pos_masks if (mask & i)]
BYTE_TO_POSITIONS.append(positions)
def bit_positions_lut(val):
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
bit_positions = []
for offset, b in enumerate(bytestr[::-1]):
pos = BYTE_TO_POSITIONS[b]
if offset == 0:
bit_positions.extend(pos)
else:
pos_offset = (8 * offset)
bit_positions.extend([p + pos_offset for p in pos])
return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
# for val in [-0, -1, -3, -4, -9999]:
test_values = [
# -1, -2, -3, -4, -8, -32, -290, -9999,
# 0, 1, 2, 3, 4, 8, 32, 290, 9999,
4324, 1028, 1024, 3000, -100000,
999999999999,
-999999999999,
2 ** 32,
2 ** 64,
2 ** 128,
2 ** 128,
]
for val in test_values:
r1 = bit_positions_str(val)
r2 = bit_positions_numpy(val)
r3 = bit_positions_lut(val)
print(f'val={val}')
print(f'r1={r1}')
print(f'r2={r2}')
print(f'r3={r3}')
print('---')
assert r1 == r2
import xdev
xdev.profile_now(bit_positions_numpy)(val)
xdev.profile_now(bit_positions_str)(val)
xdev.profile_now(bit_positions_lut)(val)
import timerit
ti = timerit.Timerit(10000, bestof=10, verbose=2)
for timer in ti.reset('str'):
for val in test_values:
bit_positions_str(val)
for timer in ti.reset('numpy'):
for val in test_values:
bit_positions_numpy(val)
for timer in ti.reset('lut'):
for val in test_values:
bit_positions_lut(val)
for timer in ti.reset('raw_bin'):
for val in test_values:
bin(val)
for timer in ti.reset('raw_bytes'):
for val in test_values:
val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
b = ''.join(f'{z:08b}' for z in x)
Replace ''.join(.)
with [.]
for the bit representations.
This answers preserves the size so each byte takes 8 bits and the output is 8 * nbytes
long.
print(''.join(f'{z:08b}' for z in b'DECADE'))
# output: 010001000100010101000011010000010100010001000101
# len(output) is 48 == len('DECADE') * 8
For Python 3.6+ or newer, you can first convert the hex string to integer using int(input_str, 16)
. Then use f-strings format to convert the integer to bit string.
>>> input_str = b'1a'
>>> f'{int(input_str, 16):b}'
'11010'
The width specifier can be used to set the length of the output bit string if the length of the output is less than the specified width:
>>> f'{int(input_str, 16):08b}'
'00011010'
or
>>> len_in_bits = 8
>>> f'{int(input_str, 16):0{len_in_bits}b}'
'00011010'