0

Alright so I am trying to do a file compress using the Huffman tree.

We got the tree that is working just fine but we are unable to figure out how to write the binary string we get into the file.

So for example our tree returns: '110', it should mean this byte: '00000110' right?

And if the returns: '11111111 11111110' it should mean what? Should we just write it in in byte?

So the question is how do we convert the binary string we get into bytes so we can write it on the file?

Thanks alot, Ara

Ara Sivaneswaran
  • 365
  • 1
  • 10
  • 25
  • possible duplicate of [Binary to text in Java](http://stackoverflow.com/questions/4211705/binary-to-text-in-java) – Smutje Sep 21 '14 at 14:13
  • @Smutje Not really. The other question is how to pass from binary string to readeable text. I am asking how to write my binary string as bytes in a file text. – Ara Sivaneswaran Sep 21 '14 at 14:59
  • So you *know* that you need to write a String to a text file but your are not able to search for yourself? Then SO is clearly not the right place to ask. – Smutje Sep 21 '14 at 15:07
  • @Smutje I have a binary string: '110'. I need to transform it into bytes and then write the bytes into a file so the file will be compressed compared to the original file. It's the part of transforming the binary string into bytes that I don't understand. – Ara Sivaneswaran Sep 21 '14 at 15:09
  • Then guess what the SO search turns out for "java binary string byte": http://stackoverflow.com/questions/17727310/convert-binary-string-to-byte-array – Smutje Sep 21 '14 at 15:14
  • @Smutje Already tried the method but has he is trying to get binary of length 2 and I am not, it's not helping me.... – Ara Sivaneswaran Sep 21 '14 at 15:15
  • No, he uses *Base 2* which is parsing a number using a base of 2 which translates to "binary". – Smutje Sep 21 '14 at 15:17
  • @Smutje "How can I convert this to a byte[] of length 2" Not of base 2 but of length 2. – Ara Sivaneswaran Sep 21 '14 at 15:19
  • Yes, but it's your job to transfer this knowledge or you end up asking these questions forever on SO. – Smutje Sep 21 '14 at 17:17

1 Answers1

1

So for example our tree returns: '110', it should mean this byte: '00000110' right?

Wrong. You should have a byte buffer of bits into which you write your bits. Write the three bits 110 into the byte. (You will need to decide on a convention for bit ordering in the byte.) You still have five unused bits in the byte, so there it sits. Now you write 10 into the buffer. The byte buffer now has 11010, and three unused bits. So still it sits. Now you try to write 111011 into the byte buffer. The first three bits go into the byte buffer, giving you 11010111. You now have filled the buffer, so only now do you write out your byte to the file. You are left with 011. You clear your byte buffer of bits since you wrote it out, and put in the remaining 011 from your last code. Your byte buffer now has three bits in it, and five bits unused. Continue in this manner.

The buffer does not have to be one byte. 16-bit or 32-bit buffers are common and are more efficient. You write out bytes whenever the bits therein are eight or more, and shift the remaining 0-7 bits to the start of the buffer.

The only tricky part is what to do at the end, since you may have unused bits in your last byte. Your Huffman codes should have an end symbol to mark the end of the stream. Then you know when you should stop looking for more Huffman codes. If you do not have an end code, then you need to assure somehow that either the remaining bits in the byte cannot be a complete Huffman code, or you need to indicate in some other way where the stream of bits end.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • Thanks, I think I kinda understand what to do. So basically, I loop through all my binary string and create new byte every 8 'characters' of the binary string I encouter. Then I write all the bytes I made? – Ara Sivaneswaran Sep 21 '14 at 15:21
  • Correct. Note the problem with the last byte, and make sure you have a solution. – Mark Adler Sep 21 '14 at 15:38
  • Yeah, that's what I am trying to figure out. It's not easy as I can have any combination of 1 and 0 used in my Huffman Tree. – Ara Sivaneswaran Sep 21 '14 at 15:42
  • Simply add a symbol to the set of symbols that you are Huffman coding. It occurs once, and is your end-of-stream symbol. – Mark Adler Sep 21 '14 at 19:57