I have a string of booleans and I want to create a binary file using these booleans as bits. This is what I am doing:

# first pad the string with 0s to make its length a multiple of 8
while len(boolString) % 8 != 0:
    boolString += '0'

# write the string to the file byte by byte
i = 0
while i < len(boolString) / 8:
    byte = int(boolString[i*8 : (i+1)*8], 2)
    outputFile.write('%c' % byte)

    i += 1

But this generates the output 1 byte at a time and is slow. What would be a more efficient way to do it?

martineau
Jayanth Koushik

6 Answers

It should be quicker to calculate all the bytes first and then write them out in a single call. For example:

b = bytearray([int(boolString[x:x+8], 2) for x in range(0, len(boolString), 8)])
outputFile.write(b)

I'm also using a bytearray, which is a natural container for this and can be written directly to your file.


You can of course use a library such as bitarray or bitstring if that's appropriate. Using the latter you could just write:

bitstring.Bits(bin=boolString).tofile(outputFile)
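
As a minimal self-contained sketch of the bytearray approach, padding and conversion can be combined in one helper so the caller needs only a single write. The name `bits_to_bytes` is made up for this example and is not from the question or any library; it assumes the input contains only '0' and '1' characters.

```python
def bits_to_bytes(bit_string):
    """Convert a string of '0'/'1' characters to a bytearray, zero-padded on the right."""
    # -len(...) % 8 is the number of bits missing from the last byte (0 if none).
    padded = bit_string + '0' * (-len(bit_string) % 8)
    # Convert each 8-character chunk to one integer in the range 0..255.
    return bytearray(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

With a file opened in binary mode, this would be used as outputFile.write(bits_to_bytes(boolString)).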
Scott Griffiths

Here's another answer, this time using an industrial-strength utility function from PyCrypto - The Python Cryptography Toolkit. In version 2.6 (the latest stable release at the time of writing), it's defined in pycrypto-2.6/lib/Crypto/Util/number.py.

The comments preceding it say:
    Improved conversion functions contributed by Barry Warsaw, after careful benchmarking

import struct
from Crypto.Util.py3compat import b  # PyCrypto compatibility helper used below

def long_to_bytes(n, blocksize=0):
    """long_to_bytes(n:long, blocksize:int) : string
    Convert a long integer to a byte string.

    If optional blocksize is given and greater than zero, pad the front of the
    byte string with binary zeros so that the length is a multiple of
    blocksize.
    """
    # after much testing, this algorithm was deemed to be the fastest
    s = b('')
    n = long(n)
    pack = struct.pack
    while n > 0:
        s = pack('>I', n & 0xffffffffL) + s
        n = n >> 32
    # strip off leading zeros
    for i in range(len(s)):
        if s[i] != b('\000')[0]:
            break
    else:
        # only happens when n == 0
        s = b('\000')
        i = 0
    s = s[i:]
    # add back some pad bytes.  this could be done more efficiently w.r.t. the
    # de-padding being done above, but sigh...
    if blocksize > 0 and len(s) % blocksize:
        s = (blocksize - len(s) % blocksize) * b('\000') + s
    return s
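
The function above is Python 2 code (it relies on `long` and the `L` literal suffix). On Python 3.2 or later, the built-in `int.to_bytes` covers the same ground. The following is only a rough equivalent, assuming big-endian output and the same front-padding rule; the name `int_to_bytes` is invented for this sketch and is not part of PyCrypto.

```python
def int_to_bytes(n, blocksize=0):
    """Big-endian bytes of a non-negative int, front-padded to a multiple of blocksize."""
    # Use at least one byte so that 0 becomes a single zero byte,
    # matching the behaviour of the function quoted above.
    length = max(1, (n.bit_length() + 7) // 8)
    if blocksize > 0 and length % blocksize:
        length += blocksize - length % blocksize
    return n.to_bytes(length, 'big')
```

For the question's use case this would be called as int_to_bytes(int(boolString, 2)), with the caveat that leading zero bits of the string are not preserved unless a blocksize is supplied.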
martineau

You can try this code using the array class:

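import array

buffer = array.array('B')

# convert each 8-character chunk to one byte
# (assumes boolString has already been padded to a multiple of 8)
for i in range(0, len(boolString), 8):
    buffer.append(int(boolString[i:i+8], 2))

# open() works on both Python 2 and 3; file() exists only on Python 2
with open(filename, 'wb') as f:
    buffer.tofile(f)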
Samy Arous

You can convert a boolean string to a long using data = long(boolString,2). Then to write this long to disk you can use:

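while data > 0:
    # split off the lowest 8 bits; the divisor must be 256 (0x100), not 0xff
    data, byte = divmod(data, 0x100)
    # note: bytes are written least significant first
    file.write('%c' % byte)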

However, there is no need to build a boolean string at all. It is much easier to use a long, which can hold an arbitrary number of bits (limited only by available memory). Using bit manipulation you can set or clear bits as needed, and then write the long to disk as a whole in a single write operation.
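
A minimal sketch of that idea, assuming Python 3.2 or later (where `int` is unbounded and has `to_bytes`): bits are shifted into a single integer and converted to bytes once at the end. The name `pack_bits` is hypothetical, and the first bit is assumed to be the most significant one.

```python
def pack_bits(bits):
    """Pack an iterable of truthy/falsy values into bytes, first bit most significant."""
    acc = 0
    count = 0
    for bit in bits:
        acc = (acc << 1) | (1 if bit else 0)  # shift in one bit at a time
        count += 1
    pad = -count % 8  # zero bits needed to fill the last byte
    acc <<= pad
    # Passing the byte length explicitly keeps any leading zero bits.
    return acc.to_bytes((count + pad) // 8, 'big')
```

The result can then be written with one call, for example outputFile.write(pack_bits(bits)).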

Hans Then
  • The problem with using a `long` is that it ignores any leading `0` bits. – Scott Griffiths Oct 01 '12 at 11:35
  • It depends. If the length of the boolean string is known by the program that reads the file, this is not a problem. This appears to be the case, as otherwise, how would the reading program know how many bytes to read. – Hans Then Oct 01 '12 at 11:43
  • It would either take the length from the file and read all of it, or know in advance how many bytes to read. If it reads all of the file then my argument stands, otherwise the problem with a long is that the program would have to know how many bytes it was stored in as well as how many bytes it would be after the leading zeros had been reinserted. You essentially have to store the length along with the long, which is then a simple codec rather than just storing the data. – Scott Griffiths Oct 01 '12 at 11:50
  • If it reads all of the file then it also does not matter, since the data will be the same no matter how many bytes you read. – Hans Then Oct 01 '12 at 11:53
  • It doesn't read all from the file. But there is a header in the file indicating how many bits are to be read. – Jayanth Koushik Oct 01 '12 at 12:18
  • If it makes any difference, the context of the problem is Huffman compression. So the boolean string came 'naturally' from the tree. – Jayanth Koushik Oct 01 '12 at 12:20
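
The length-header approach discussed in these comments can be sketched as follows. The layout here (a 4-byte big-endian bit count followed by the zero-padded payload) and the names `write_bits` and `read_bits` are assumptions made for illustration; the asker's actual file format is not given. The sketch assumes Python 3.2 or later.

```python
import io
import struct


def write_bits(f, bit_string):
    """Write a 4-byte big-endian bit count, then the bits padded to whole bytes."""
    nbits = len(bit_string)
    nbytes = (nbits + 7) // 8
    # Left-align the bits so the padding ends up in the low bits of the last byte.
    value = (int(bit_string, 2) << (nbytes * 8 - nbits)) if nbits else 0
    f.write(struct.pack('>I', nbits) + value.to_bytes(nbytes, 'big'))


def read_bits(f):
    """Read back the bit string written by write_bits."""
    (nbits,) = struct.unpack('>I', f.read(4))
    if nbits == 0:
        return ''
    nbytes = (nbits + 7) // 8
    value = int.from_bytes(f.read(nbytes), 'big')
    # Formatting to the full width restores leading zeros; slicing drops the padding.
    return format(value, '0{}b'.format(nbytes * 8))[:nbits]


# Round trip through an in-memory file.
buf = io.BytesIO()
write_bits(buf, '00101')
buf.seek(0)
restored = read_bits(buf)
```

Because the bit count is stored explicitly, leading zero bits survive the round trip even though the payload passes through an integer.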

A helper class (shown below) makes it easy:

class BitWriter:
    def __init__(self, f):
        self.acc = 0
        self.bcount = 0
        self.out = f

    def __del__(self):
        self.flush()

    def writebit(self, bit):
        if self.bcount == 8 :
            self.flush()
        if bit > 0:
            self.acc |= (1 << (7-self.bcount))
        self.bcount += 1

    def writebits(self, bits, n):
        while n > 0:
            self.writebit( bits & (1 << (n-1)) )
            n -= 1

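    def flush(self):
        if self.bcount == 0:
            return  # nothing pending, so don't emit a spurious zero byte
        # bytearray works on both Python 2 and 3 (chr() returns text on Python 3)
        self.out.write(bytearray([self.acc]))
        self.acc = 0
        self.bcount = 0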

with open('outputFile', 'wb') as f:
    bw = BitWriter(f)
    bw.writebits(int(boolString,2), len(boolString))
    bw.flush()
martineau

Use the struct package.

This can be used in handling binary data stored in files or from network connections, among other sources.

Edit:

An example using ? as the format character for a bool.

import struct

p = struct.pack('????', True, False, True, False)
assert p == b'\x01\x00\x01\x00'
with open("out", "wb") as o:
    o.write(p)

Let's take a look at the file:

$ ls -l out
-rw-r--r-- 1 lutz lutz 4 Okt  1 13:26 out
$ od out
0000000 000001 000001
0000004

Read it in again:

with open("out", "rb") as i:
    q = struct.unpack('????', i.read())
assert q == (True, False, True, False)
  • I'm sorry, but I'm not sure I follow. I looked up struct and it is used to pack/unpack things according to some format. But I don't have a format. They are raw bits and I want to put them in the file as is. – Jayanth Koushik Oct 01 '12 at 11:23
  • Don't \x00 and \x01 represent a byte each? But I want each True to generate 1 bit, so my output file should have been 10100000 (the last four bits are padding). – Jayanth Koushik Oct 01 '12 at 12:22
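
As the last comment points out, struct's ? format spends a whole byte on each boolean. A minimal sketch of packing eight booleans per byte instead, so that the four values above produce the single byte 10100000, could look like this. The name `pack_bools` is hypothetical and not part of the struct module.

```python
def pack_bools(flags):
    """Pack a sequence of booleans eight to a byte, first flag in the most significant bit."""
    out = bytearray()
    for start in range(0, len(flags), 8):
        byte = 0
        for offset, flag in enumerate(flags[start:start + 8]):
            if flag:
                byte |= 0x80 >> offset  # set the bit for this position
        out.append(byte)  # unused low bits of the last byte stay 0 (padding)
    return bytes(out)


packed = pack_bools([True, False, True, False])
```

Here `packed` is one byte with the value 0xA0, which is 10100000 in binary.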