I'm writing a Python program that compresses a file using Huffman coding. Working at such a low level in Python has been difficult for me. The part I can't get my head around is decoding (I write the encoded output to a .bin file): everything I have found so far reads the binary file byte by byte or as hex, so I haven't been able to implement decompression.

So my question is: can I read a binary file in Python bit by bit, so that I end up with a variable like binary_text = '0b1000101000...'?

  • Does this answer your question? [How to read bits from a file?](https://stackoverflow.com/questions/10689748/how-to-read-bits-from-a-file) – Tomerikoo Jan 01 '21 at 21:36

3 Answers


You could also use the bitstream library. It lets you represent bytes as a binary stream:

>>> from bitstream import BitStream
>>> BitStream(b"Hello World!")
010010000110010101101100011011000110111100100000010101110110111101110010011011000110010000100001

You can read and write bits with the read and write functions, like so:


>>> stream = BitStream()        # <empty>
>>> stream.write(True, bool)    # 1
>>> stream.write(False, bool)   # 10
>>> stream.read(bool, 2)        # <empty>
[True, False]

You can find the documentation and more examples here.

Update:

Another good alternative is the bitarray library, which is implemented in C and provides rich functionality for manipulating bit streams.

alex_noname

You should be able to open the file in rb (read binary) mode and then call .read on the file handle. Here are the relevant docs for Python's open: https://docs.python.org/3/library/functions.html#open

with open('my_file.txt', 'rb') as f:
    eight_bytes = f.read(8)

>>> print(eight_bytes)
b'hello wo'
>>> eight_bytes[0]
104
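
Since indexing a bytes object gives an integer, each byte can be split into its bits with shifts or string formatting. The following is only a minimal sketch using the standard library: the helper names iter_bits and bit_string are made up for illustration, and an in-memory io.BytesIO stands in for the opened file.

```python
import io
from itertools import islice


def iter_bits(stream, chunk_size=4096):
    """Yield the bits of a binary stream one at a time, most significant bit first."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        for byte in chunk:  # iterating over bytes yields ints in 0..255
            for shift in range(7, -1, -1):
                yield (byte >> shift) & 1


def bit_string(data):
    """Return the bits of a bytes object as a string of '0' and '1' characters."""
    return ''.join(f'{byte:08b}' for byte in data)


# Hypothetical stand-in for: open('my_file.txt', 'rb')
demo = io.BytesIO(b'hello wo')
print(list(islice(iter_bits(demo), 8)))  # bits of 'h' (104): [0, 1, 1, 0, 1, 0, 0, 0]
print(bit_string(b'he'))                 # 0110100001100101
```

The generator form is probably the better fit for a Huffman decoder, which walks the tree one bit at a time and does not need the whole file as a string. Note that a file always holds a whole number of bytes, so if the encoder padded the last byte, the decoder needs some way of knowing how many trailing bits to ignore.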

Don't know why the "0b" would be useful, but anyway:

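import numpy as np

# Read the whole file as unsigned 8-bit integers (one array element per byte).
data = np.fromfile("file.bin", dtype="u1")
binary_text = "0b"
for byte in data:
    # Test each bit from most significant to least significant.
    for mask in [128, 64, 32, 16, 8, 4, 2, 1]:
        binary_text += '1' if byte & mask else '0'
print(binary_text)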
Mark Adler