
I'm implementing the Huffman algorithm, but when I get the final compressed code, I end up with a string like the one below:

10001111010010101010101

This is the binary code built from the paths to my tree's leaves.

I have this sequence, but when I save it to a file, it is written as ASCII text, one byte per character. The file ends up the same size as the original or bigger, so nothing is compressed.

How do I save this binary directly?

PS: If I use a function to convert my string to binary, all I get is the ASCII characters converted to binary, which accomplishes nothing. I need a real solution.

James Mertz
  • And how many *bytes* is this supposed to be written as? Big endian, little endian? – Martijn Pieters Aug 08 '14 at 22:05
  • You could use the [Bitwise I/O](http://rosettacode.org/wiki/Bitwise_IO#Python) recipe mentioned in [this](http://stackoverflow.com/a/10691412/355230) answer of mine to another question about reading bits. – martineau Aug 09 '14 at 03:19

2 Answers


What you need to do is take each group of 8 bits and convert it into a byte to write out, looping until you have fewer than 8 bits remaining. Then save whatever's left over to prepend in front of the next value.

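def binarize(bitstring):
    # Number of complete 8-bit groups in the bit string
    wholebytes = len(bitstring) // 8
    # Convert each group of 8 bits to one byte (bytes() so it can be written to a file opened in 'wb' mode)
    data = bytes(int(bitstring[i*8:i*8+8], 2) for i in range(wholebytes))
    # Leftover bits (fewer than 8) to prepend to the next value
    remainder = bitstring[wholebytes*8:]
    return data, remainder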
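
At the end of the stream there is no "next value" to absorb the leftover bits, so the last byte has to be padded. Below is a minimal sketch of one way to handle that, assuming a made-up layout in which the first byte of the output records how many padding bits were added. The names pack_bits and unpack_bits and the one-byte header are hypothetical, not part of any standard format.

```python
def pack_bits(bitstring):
    """Pack a string of '0'/'1' characters into bytes.

    Assumed layout: the first byte holds the number of padding bits (0-7)
    appended to fill the final byte; the remaining bytes hold the bits.
    """
    padding = (8 - len(bitstring) % 8) % 8
    padded = bitstring + "0" * padding
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([padding]) + body


def unpack_bits(data):
    """Reverse pack_bits: rebuild the original bit string."""
    padding = data[0]
    bits = "".join(format(byte, "08b") for byte in data[1:])
    # Strip the padding bits that were appended to the last byte
    return bits[:len(bits) - padding] if padding else bits
```

The result of pack_bits can be written with open(path, 'wb'). For the 23-bit string in the question this gives 4 bytes (1 header byte plus 3 data bytes) instead of 23 ASCII characters.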
Mark Ransom

I think you just want int() with a base value of 2:

my_string = "10001111010010101010101"
code_num = int( my_string, 2 )

Then write to a binary file. struct.pack additionally allows you to specify whatever byte order you like.

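import struct

# 'i' packs a signed int in native byte order; prefix the format with '>' or '<' to choose the order explicitly
with open("filename.txt", 'wb') as myfile:
    mybytes = struct.pack( 'i', code_num )
    myfile.write(mybytes)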

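This method pads the value with zero bits up to the full width of the packed integer (typically 4 bytes for 'i'), which could cause trouble for your Huffman codes. Note also that int() discards any leading zeros in the bit string itself, and that 'i' only holds values that fit in a signed 32-bit integer, so longer bit strings will not fit.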
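
One possible way around the leading-zero and fixed-width issues is to skip struct and use Python 3's int.to_bytes, with a sentinel '1' bit prepended so that leading zeros survive the trip through int(). This is only a sketch under that assumption; the function names bits_to_bytes and bytes_to_bits are hypothetical.

```python
def bits_to_bytes(bitstring):
    # Prepend a sentinel '1' so leading zeros in bitstring are not lost by int()
    value = int("1" + bitstring, 2)
    # Use just enough bytes to hold the value, whatever its length
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, byteorder="big")


def bytes_to_bits(data):
    value = int.from_bytes(data, byteorder="big")
    # bin() returns '0b1...'; drop the '0b' prefix and the sentinel bit
    return bin(value)[3:]
```

The decoder has to know about the sentinel convention, since it is not part of the data itself.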

Matt Adams