48

I am working with Python 3.2. I need to take a hex stream as input and parse it at the bit level. So I used

bytes.fromhex(input_str)

to convert the string to actual bytes. Now how do I convert these bytes to bits?

user904832
  • 2
    Bytes are bits, just 8 at a time ;) - The answer depends on what you want to do, please be more specific. Also, bit manipulation is mostly done at the byte level... – Martin Thurau Jan 11 '12 at 07:27
  • 1
    I want to represent the bytes in the form of a bit string so that I can do something like: field1 = bit_string[0:1] field2 = bit_string[1:16] and so on – user904832 Jan 11 '12 at 07:31
  • 1
    Confusing title. Hexadecimals are nothing to do with bytes. Title should be: "Convert hexadecimals to bits in python" – illuminato Feb 13 '22 at 16:30

14 Answers

49

Another way to do this is by using the bitstring module:

>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'

And if you need to strip the leading 0b:

>>> c.bin[2:]
'11111111'

The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:

>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'

etc.

Epoc
Alex Reynolds
  • 5
    +1. And for the latest version of bitstring (3.0) you don't need to strip the leading `0b`. – Major Major Jan 11 '12 at 09:00
  • I have found that, for most cases, bitstring has far worse performance than python builtins like `struct.pack/unpack` – JeremyKun Dec 11 '22 at 19:58
  • It will likely help others who come to this question for you to show examples of how those builtins work. If you have time to post an answer, please do. – Alex Reynolds Dec 12 '22 at 17:56
36

What about something like this?

>>> bin(int('ff', base=16))
'0b11111111'

This converts your hexadecimal string to an integer, and that integer to a string in which each character is 0 or 1 according to the corresponding bit of the integer.

As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:

>>> bin(int('ff', base=16))[2:]
'11111111'

... or, if you are using Python 3.9 or newer:

>>> bin(int('ff', base=16)).removeprefix('0b')
'11111111'

Note: using lstrip("0b") here converts the integer 0 to an empty string, because lstrip removes every leading 0 and b character rather than a fixed prefix. This is almost never what you want.
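A quick way to see the difference, as a minimal sketch (the helper name strip_variants is made up for this illustration): slicing and format() keep the single digit of zero, while lstrip does not.

```python
def strip_variants(n):
    """Return the binary digits of n obtained in three different ways."""
    sliced = bin(n)[2:]             # drops exactly the first two characters ("0b")
    stripped = bin(n).lstrip("0b")  # strips ANY leading '0' or 'b' characters
    formatted = format(n, "b")      # never emits a prefix in the first place
    return sliced, stripped, formatted

for n in (0, 5, 255):
    print(n, strip_variants(n))
```

For non-negative integers, slicing and format(n, "b") always agree; lstrip only differs for 0, which is exactly the case that tends to slip through testing.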

Błażej Michalik
jcollado
  • lstrip('-0b') # remove leading zeros and minus sign – ahoffer Jan 11 '12 at 07:35
  • @ahoffer Thanks for your comment. I've updated my answer to let the OP know how to remove the `0b` prefix. – jcollado Jan 11 '12 at 07:39
  • 11
    Note that `lstrip('0b')` will also remove, say, `00bb` since the argument to `lstrip` is a *set* of characters to remove. It'll work fine in this case, but I prefer the `[2:]` solution since it's more explicit. – Martin Geisler Jan 11 '12 at 07:45
  • @MartinGeisler Yes, `bin` leading zeros are already removed when converting to an integer, but it's worth to note that `lstrip` removes a set of characters, not a prefix. – jcollado Jan 11 '12 at 07:50
  • **Do not use `lstrip` for this! It will remove ALL leading zeroes, so this will convert 0 integer to an empty string!** The only valid way of doing this without indexing would be using `str.removeprefix()`. – Błażej Michalik Dec 22 '22 at 02:53
33

Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.

If you want only bits 7 and 8 (the two most significant bits of a byte), use e.g.

val = (byte >> 6) & 3

(That is: shift the byte 6 bits to the right, dropping the low 6 bits, then keep only the last two bits. 3 is the number with the lowest two bits set.)

These can easily be translated into simple CPU operations that are super fast.
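The same idea generalizes to fields of any width. The following is a minimal sketch, not a definitive implementation; the helper name extract_bits and the sample values are assumptions made for illustration.

```python
def extract_bits(value, shift, width):
    """Return `width` bits of `value`, starting `shift` bits above the least significant bit."""
    mask = (1 << width) - 1           # e.g. width=2 gives 0b11 == 3
    return (value >> shift) & mask

byte = 0b11010110
top_two = extract_bits(byte, 6, 2)    # same as (byte >> 6) & 3
low_nibble = extract_bits(byte, 0, 4)
print(top_two, low_nibble)            # 3 6

# A whole hex stream can be treated as one integer and sliced the same way.
stream = int.from_bytes(bytes.fromhex("8001"), "big")   # 16-bit value 0x8001
flag = extract_bits(stream, 15, 1)    # the most significant bit
rest = extract_bits(stream, 0, 15)    # the remaining 15 bits
print(flag, rest)                     # 1 1
```

Here flag and rest correspond to the 1-bit and 15-bit fields mentioned in the question's comments, without building an intermediate string.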

Has QUIT--Anony-Mousse
16

Using Python format string syntax:

>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111

The second line does the work. All bytes objects have a .hex() method (added in Python 3.5), which returns a hex string. We convert this hex string to an integer, telling int() that it is a base-16 string. Then we format that integer as a binary string with {:08b}, which uses the width and type parts of the Format Specification Mini-Language (format_spec): the 0 pads with zeros, the 8 sets the minimum width to 8, which is how we get the 0000 padding, and the b sets the type to binary.

I prefer this method over the bin() method because using a format string gives a lot more flexibility.
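For inputs longer than one byte, the width does not have to be hard-coded; it can be derived from the length of the data. A minimal sketch, assuming the bytes should be shown most significant bit first (the helper name bytes_to_bitstring is made up for this example, and it uses int.from_bytes instead of going through .hex()):

```python
def bytes_to_bitstring(data):
    """Return the bits of `data` as a string of '0'/'1', most significant bit first."""
    if not data:
        return ""                              # width 0 would otherwise still print "0"
    width = 8 * len(data)                      # total number of bits to show
    value = int.from_bytes(data, "big")        # interpret the bytes as one big integer
    return "{:0{width}b}".format(value, width=width)

print(bytes_to_bitstring(bytes.fromhex("0F")))      # 00001111
print(bytes_to_bitstring(bytes.fromhex("00ff01")))  # 24 characters, leading zero byte kept
```

Nesting {width} inside the format spec is what keeps the leading zero bytes, which bin() would silently drop.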

ZenCodr
  • but this method doesn't let you take a variable number of bytes as input, right? you need to hard-code how long the final binary string needs to be. – Nathan Wailes Aug 26 '20 at 11:49
12

I think the simplest approach is to use numpy here. For example, you can read a file as bytes and then expand it to bits like this:

import numpy
# filename is the path of the file to read
Bytes = numpy.fromfile(filename, dtype="uint8")
Bits = numpy.unpackbits(Bytes)
Mikhail V
6
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]

Will give:

['0b1000001', '0b1000010', '0b1000011']
AJP
5

Here is how to do it using format():

print("bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector))

The 08b is important. It zero-pads each byte to a width of 8 bits. If you don't specify it, each converted byte is rendered with a variable number of bits.
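A self-contained run of this approach, as a sketch: the sample hex value is chosen only for illustration, and the field boundaries are the ones mentioned in the question's comments.

```python
bytevector = bytes.fromhex("8001ff")   # sample input, chosen for illustration

# Each byte is rendered as exactly 8 characters, so the result is 8 * len(bytevector) long.
bit_string = "".join(format(x, "08b") for x in bytevector)
print(bit_string, len(bit_string))

# Without the width, short bytes lose their leading zeros and fields no longer line up.
unpadded = "".join(format(x, "b") for x in bytevector)
print(unpadded, len(unpadded))

# With fixed-width bytes, fields can be sliced by bit position.
field1 = bit_string[0:1]     # first bit
field2 = bit_string[1:16]    # next 15 bits
print(field1, int(field2, 2))
```

Iterating over a bytes object yields integers in Python 3, which is why format(x, "08b") can be applied to each element directly.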

Joniale
4

Use ord when reading bytes:

byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix

Or

Using str.format():

'{:08b}'.format(ord(f.read(1)))
Jacob Valenta
4

To binary:

bin(byte)[2:].zfill(8)
Ferguzz
1

The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')

In case you're interested in little-endian bit order, which is useful in many cases such as common representations of bignums, here's a snippet for that:

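def bits_little_endian_from_bytes(s):
    # In Python 3, iterating over bytes yields ints, so ord() is not needed
    return ''.join(bin(x)[2:].rjust(8, '0')[::-1] for x in s)

And for the other direction:

def bytes_from_bits_little_endian(s):
    # Reverse each 8-bit group back to MSB-first order before parsing it
    return bytes(int(s[i:i+8][::-1], 2) for i in range(0, len(s), 8))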
yairchu
0

One-line function to convert bytes (not a string) to a list of bits. Endianness is not an issue when the data goes from one byte reader/writer to another; it only matters when the source and target are bit readers and bit writers.

def byte2bin(b):
    return [int(bit) for byte in b for bit in "{:08b}".format(byte)]
user6830669
0

I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).

Now, given an integer (a representation already well encoded in the hardware), I was very surprised to find that the string variants of the above solutions, using things like bin, turn out to be faster than numpy-based solutions for a single number, so I thought I'd quickly write up the results.

I wrote three variants of the function. First using numpy:

import math
import numpy as np
def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions

Then using string logic:

def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get two's complement repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions

And finally, I added a third method where I precomputed a lookup table of the positions for a single byte and extended it to larger item sizes.

BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)


def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions

The benchmark code is as follows:

def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:

    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]

    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2 == r3

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)

    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)

    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)

    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)

    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)

And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.

Timed str for: 10000 loops, best of 10
    time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
    time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
    time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
Erotemic
0

Q: How to convert bytes to bits / a string of bits?

A:

b = ''.join(f'{z:08b}' for z in x)

Replace ''.join(...) with [...] to get a list with one bit string per byte. This answer preserves the size, so each byte takes 8 bits and the output is 8 * nbytes long.

Example:

print(''.join(f'{z:08b}' for z in b'DECADE'))
# output: 010001000100010101000011010000010100010001000101
# len(output) is 48 == len('DECADE') * 8
Hunaphu
0

For Python 3.6 or newer, you can first convert the hex string to an integer using int(input_str, 16). Then use an f-string to format the integer as a bit string.

>>> input_str = b'1a'
>>> f'{int(input_str, 16):b}'
'11010'

The width specifier sets a minimum length for the output bit string; the output is zero-padded when it would otherwise be shorter than the specified width:

>>> f'{int(input_str, 16):08b}'
'00011010'

or

>>> len_in_bits = 8
>>> f'{int(input_str, 16):0{len_in_bits}b}'
'00011010'
Kent