
For a Python 3 programming assignment I have to work with Huffman coding. Generating the correct codes is simple enough, and the result is a long string of 0's and 1's.

Now my problem is actually writing this string out as binary and not as text. I attempted to do this:

result = "01010101 ... " #really long string of 0's and 1's
filewrt = open(output_file, "wb") #appending b to w should write as binary, should it not?
filewrt.write(result)
filewrt.close()

However, I'm still getting a large text file of 0 and 1 characters. How do I fix this?

EDIT: It seems as if I just simply don't understand how to represent an arbitrary bit in Python 3.

Based on this SO question I devised this ugly monstrosity:

for char in result: 
    filewrt.write( bytes(int(char, 2)) )

Instead of getting anywhere close to working, it output a zeroed file that was twice as large as my input file. Can someone please explain to me how to represent binary arbitrarily? And in the context of creating a Huffman tree, how do I go about concatenating or joining bits based on their leaf locations if I should not use a string to do so?

Niko
  • `result` is a Unicode string that happens to contain `0` and `1`. Writing it to a binary output stream is a type error. Are you sure you aren't running it under Python 2? – Mechanical snail Nov 17 '13 at 23:16
  • You need to convert the zeros and ones back to bytes first; Python doesn't do that for you. – Martijn Pieters Nov 17 '13 at 23:19
  • @Mechanicalsnail Pretty sure. I explicitly defined Python3 in aptana and I have been using it for this semester, so hopefully I am. – Niko Nov 17 '13 at 23:45
  • @MartijnPieters I see. Does it take a specific "byte" object or "byte array"? what type of instance do I need to manually convert this string into? – Niko Nov 17 '13 at 23:46
  • Why did you convert to `'1'` and `'0'` characters in the first place? Binary operations are usually easier on bytes instead and don't need manual conversion. – Martijn Pieters Nov 18 '13 at 00:29
  • Well the problem I seem to be having is how I exactly go about representing an arbitrary bit. I've updated my question. If you could help me understand how to manipulate and concatenate arbitrary bits that would be something I would appreciate. – Niko Nov 18 '13 at 01:16
  • The `bytes` type is used to represent an arbitrary byte array. That could either be a string, in which case `.decode()` will decode it as a unicode string, or just anything else. In any case, you can operate on it using binary operations and access individual bytes using indexing. If you want to store structures as binary objects (e.g. tree nodes), you can also use [`pickle`](http://docs.python.org/3/library/pickle.html) to do the conversion between `bytes` and Python objects. – poke Nov 18 '13 at 01:27
  • Okay, I understand now. That actually makes a ton of sense. The `bytes(n)` method is a constructor for a binary array of length n. That actually makes a lot more sense to me now than it did before. Thank you a – Niko Nov 18 '13 at 01:44
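
To make the point from the comments concrete, here is a small sketch of why the loop in the question wrote only zero bytes. It uses only the built-in `bytes` constructor; the sample string `"0110"` is an illustrative assumption, not data from the question.

```python
# bytes(n) with an integer builds n zero bytes;
# bytes([n]) builds a single byte whose value is n.
print(bytes(0))    # b''
print(bytes(1))    # b'\x00'
print(bytes([1]))  # b'\x01'

# The loop from the question therefore emits one zero byte per '1'
# character and nothing at all per '0' character.
written = b"".join(bytes(int(char, 2)) for char in "0110")
print(written)     # b'\x00\x00'
```

So the values of the bits never reach the file; only the count of `1` characters determines how many zero bytes are written.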

1 Answer

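def intToTextBytes(n, stLen=0):
    # Build the big-endian byte string of n, one byte (8 bits) at a time.
    bs = b''
    while n>0:
        bs = bytes([n & 0xff]) + bs  # prepend the lowest 8 bits
        n >>= 8
    # Left-pad with zero bytes up to stLen bytes.
    return bs.rjust(stLen, b'\x00')


# Leading zero bits are not preserved: an int has no notion of them.
num = 0b01010101111111111111110000000000000011111111111111
bs = intToTextBytes(num)
print(bs)
with open(output_file, "wb") as f:  # output_file: the path from the question
    f.write(bs)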
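
The question starts from a string of `'0'` and `'1'` characters rather than an integer, and a Huffman stream may begin with zero bits that an integer would drop. A minimal sketch of packing such a string directly, assuming the string is zero-padded on the right to a whole number of bytes (the helper name `bits_to_bytes` and this padding scheme are illustrative assumptions, not part of the answer above):

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    # Assumed padding scheme: zero-fill on the right up to a byte boundary.
    padding = (8 - len(bits) % 8) % 8
    padded = bits + "0" * padding
    # Convert each 8-character group into one byte value.
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


data, padding = bits_to_bytes("0101010111")
print(data, padding)
```

A decoder needs the padding count (or the original bit length) to know where the real data ends, so it would have to be stored alongside the packed bytes.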

EDIT: A more complicated, but faster (about 3 times) way:

# Byte count via bit_length(), which is exact for any n >= 0 (a floating-point log is not).
intToTextBytes = lambda n, stLen=0: bytes([
    (n >> (i<<3)) & 0xff for i in range((n.bit_length() + 7) // 8 - 1, -1, -1)
]).rjust(stLen, b'\x00')
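
Python 3.2 and later also provide the built-in `int.to_bytes`, which performs the same big-endian conversion without a manual loop. A sketch under that assumption (the wrapper name `int_to_bytes` and its `st_len` parameter are illustrative, chosen to mirror the function above):

```python
def int_to_bytes(n, st_len=0):
    # Smallest number of bytes that can hold n, but at least st_len.
    n_bytes = max(st_len, (n.bit_length() + 7) // 8)
    # "big" puts the most significant byte first.
    return n.to_bytes(n_bytes, "big")


print(int_to_bytes(0b0101010111))  # two bytes: 0x01, 0x57
print(int_to_bytes(0x55, 2))       # left-padded to two bytes
```

Computing the byte count from `bit_length()` keeps the result exact for very large integers.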
saeedgnu