As part of a bigger project, I want to save a sequence of bits in a file so that the file is as small as possible. I'm not talking about compression; I want to save the sequence as it is, using the fewest characters possible. The initial idea was to turn groups of 8 bits into chars using ASCII encoding and save those chars, but due to some unknown problem with strange characters, the characters retrieved when reading the file are not the same ones that were originally written. I've tried opening the file with utf-8 and latin-1 encodings, but neither seems to work. I'm wondering if there's another way, maybe by turning the sequence into a hexadecimal number?
-
Mind that ASCII only has 7 bits. The highest bit (bit 8 so to speak) is always set to zero. – Willem Van Onsem Jan 15 '17 at 22:19
-
Why are you storing them as text at all? Open the file in binary mode. If the bits aren't a multiple of eight, you need an extra byte to describe the number of valid bits in the final byte, but otherwise, you'd just save the raw bytes, not in any specific encoding. – ShadowRanger Jan 15 '17 at 22:21
-
Yes, you have to use binary mode. Here's a page that talks about bit arrays in Python: https://wiki.python.org/moin/BitArrays. – Miloslav Číž Jan 15 '17 at 22:22
-
You can use the Python types `bytearray` or `bytes`, as described here: http://stackoverflow.com/questions/18367007/python-how-to-write-to-a-binary-file – SergeyLebedev Jan 15 '17 at 22:24
-
[Here's](http://svn.python.org/projects/python/tags/r12beta3/Demo/classes/bitvec.py) a "demo" mutable bit-vector class, which appears to have been written by Guido. There's also the question [_Write boolean string to binary file?_](http://stackoverflow.com/questions/12672165/write-boolean-string-to-binary-file) – martineau Jan 15 '17 at 22:30
1 Answer
Technically you cannot write less than a byte, because the OS organizes memory in bytes (see "write individual bits to a file in python"), so this is binary file I/O; see https://docs.python.org/2/library/io.html. There are also modules like struct that help with this.
Open the file with the 'b' flag, which indicates a binary read/write operation, then use e.g. the to_bytes() function (see "Writing bits to a binary file") or struct.pack() (see "How to write individual bits to a text file in python?"):
>>> import struct
>>> struct.pack("h", 824)
b'8\x03'
>>> bits = "10111111111111111011110"
>>> data = int(bits[::-1], 2).to_bytes(4, 'little')
>>> data
b'\xfd\xff=\x00'
>>> with open('somefile.bin', 'wb') as f:
...     f.write(data)
...
4
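A minimal sketch of the full round trip, following ShadowRanger's suggestion from the comments of storing the raw bytes plus one extra byte that says how many bits of the last byte are padding. The function names `write_bits` and `read_bits` and the one-byte header layout are assumptions made for illustration, not an established file format:

```python
def write_bits(path, bits):
    """Write a string of '0'/'1' characters to `path` as raw bytes."""
    pad = (8 - len(bits) % 8) % 8            # zero bits needed to fill the last byte
    padded = bits + "0" * pad
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    with open(path, "wb") as f:              # 'b' = binary mode, no text encoding
        f.write(bytes([pad]) + body)         # assumed layout: first byte = padding length


def read_bits(path):
    """Read back the exact bit string written by write_bits."""
    with open(path, "rb") as f:
        data = f.read()
    pad, body = data[0], data[1:]
    bits = "".join(format(byte, "08b") for byte in body)
    return bits[:len(bits) - pad]            # drop the padding bits again
```

With this layout a sequence of n bits takes ceil(n / 8) + 1 bytes on disk, and because the file is opened in binary mode no text encoding (ASCII, utf-8, latin-1) is involved, so no byte value can be altered on the way in or out.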
If you want to get around the 8-bit (byte) structure of memory, you can use bit manipulation and techniques like bitmasks and bit arrays; see https://wiki.python.org/moin/BitManipulation and https://wiki.python.org/moin/BitArrays.
However, the problem is, as you said, reading the data back if you use bit arrays of differing lengths. For example, to store a decimal 7 you need 3 bits (0b111), and to store a decimal 2 you need 2 bits (0b10). Now the problem is to read this back.
How can your program know whether it has to read a value back as a 3-bit value or as a 2-bit value? In unorganized memory the sequence 7, 2 looks like 11110, which translates to 111|10, so how can your program know where the | is?
In normal byte-ordered memory the sequence 7, 2 is 0000011100000010 -> 00000111|00000010. This has the advantage that it is clear where the | is.
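The boundary problem can be sketched with plain bit strings. The helper names `encode_fixed` and `decode_fixed` are hypothetical and only serve to illustrate the point:

```python
def encode_fixed(values, width):
    """Concatenate each value as exactly `width` bits (zero-padded on the left)."""
    return "".join(format(v, "0{}b".format(width)) for v in values)


def decode_fixed(bits, width):
    """Split the bit string every `width` bits and convert each chunk back to an int."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


# Variable width: 7 and 2 written with only the bits they need.
variable = format(7, "b") + format(2, "b")    # "111" + "10"

# The same string also results from 3 and 6, so it cannot be decoded uniquely.
ambiguous = format(3, "b") + format(6, "b")   # "11" + "110"

# Fixed width: every value takes 8 bits, so the boundaries are known in advance.
fixed = encode_fixed([7, 2], 8)
```

Here `variable` and `ambiguous` are the same string, while `decode_fixed(fixed, 8)` recovers `[7, 2]` because the reader knows to cut every 8 bits.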
This is why memory at its lowest level is organized in fixed clusters of 8 bits = 1 byte. If you want to access single bits inside a byte, you can use bitmasks in combination with logical operators (http://www.learncpp.com/cpp-tutorial/3-8a-bit-flags-and-bit-masks/). In Python, the built-in bitwise operators on ints (&, |, ^, ~, <<, >>) handle single-bit manipulation directly; the ctypes module is another option.
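As a small sketch of the bitmask technique using only Python's built-in integer operators (the helper names are made up for illustration; bit 0 is taken to be the least significant bit):

```python
def set_bit(value, n):
    """Return `value` with bit n forced to 1."""
    return value | (1 << n)


def clear_bit(value, n):
    """Return `value` with bit n forced to 0."""
    return value & ~(1 << n)


def toggle_bit(value, n):
    """Return `value` with bit n flipped."""
    return value ^ (1 << n)


def bit_is_set(value, n):
    """Return True if bit n of `value` is 1."""
    return (value >> n) & 1 == 1


flags = 0b00000101
print(format(set_bit(flags, 1), "08b"))     # bit 1 switched on
print(format(clear_bit(flags, 0), "08b"))   # bit 0 switched off
print(bit_is_set(flags, 2))
```

The mask `1 << n` has a single 1 at position n, so OR sets that bit, AND with the inverted mask clears it, and XOR flips it.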
If you know that your values are all 6 bits wide, maybe it is worth the effort; however, this is also tough...
(See "How do you set, clear, and toggle a single bit?" and "Why can't you do bitwise operations on pointer in C, and is there a way around this?")
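For the case where every value is known to fit in 6 bits, a dense packing could look roughly like the following. This is a sketch under assumptions: the names `pack_values` and `unpack_values` are hypothetical, and the 4-byte big-endian count at the start of the data is an arbitrary choice so the reader knows how many values to extract:

```python
def pack_values(values, width=6):
    """Pack non-negative ints, each smaller than 2**width, into bytes."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError("value %r does not fit in %d bits" % (v, width))
        acc = (acc << width) | v              # append `width` bits to the right
    total_bits = len(values) * width
    pad = (8 - total_bits % 8) % 8            # zero bits to reach a byte boundary
    acc <<= pad
    nbytes = (total_bits + pad) // 8
    # Assumed header: number of values as a 4-byte big-endian integer.
    return len(values).to_bytes(4, "big") + acc.to_bytes(nbytes, "big")


def unpack_values(data, width=6):
    """Inverse of pack_values."""
    count = int.from_bytes(data[:4], "big")
    body = data[4:]
    acc = int.from_bytes(body, "big")
    acc >>= len(body) * 8 - count * width     # drop the padding bits
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]
```

Five 6-bit values take 30 bits, so they fit in 4 bytes of payload instead of 5, at the cost of the fixed header. The resulting bytes object can be written to a file opened with 'wb' and read back with 'rb'.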
-
Thank you for your help, but we tried those methods and what we ended up writing in our file was every bit as a char. Maybe the question was not well asked: we want to find the optimum container for these bits so that they can be written into a file with the smallest size possible. Is this possible? We tried taking every 7 bits and transforming them into a char, but when we tried to recover the original sequence this led to errors... – Julen Cestero Jan 16 '17 at 17:30