0

I'm implementing a huffman library, and for now I have the results of encoding as strings. I want to serialize the tree along with the message, and my result is kinda this:

Message: aabbccdd
Tree output: 001011000011011000100101100011101100100
Encoded message: 0000010110101111
Serialized message (tree + message): 0010110000110110001001011000111011001000000010110101111

As second step, I wanted to turn these binary strings to byte array. The problem is that the number of bits is not necessary suitable to do so: there may be some extra bits. The question is how can I do to convert the final output to a byte array and being able to deserialize it?

The tree is encoded as here described: Efficient way of storing Huffman tree

Community
  • 1
  • 1
Phate01
  • 2,499
  • 2
  • 30
  • 55

1 Answers1

0

Unless you can assure that all of your codes will be at least eight bits long, then you will need to either a) add an end symbol with its own code that terminates the message or b) add a few bits to the start of the message that says how many bits are valid in the last byte (assuming that there is some other source of information that says how many bytes the message occupies). Otherwise, the extra bits at the end could be misinterpreted as one or more additional symbols that are not intended.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • I thought to pad only the last bit if incomplete and encode, but the problem remains when decoding, with the '0' added to pad that invalidates the message. How to skip them? – Phate01 Oct 22 '15 at 13:31
  • What don't you understand about my answer? – Mark Adler Oct 22 '15 at 15:46
  • Your solution of 'a few bits at the beginning' is definetly not a good one because it limits the message length, so I understand everything, but it's not a solution of my problem, and I want some other people to brainstorm. – Phate01 Oct 23 '15 at 07:28
  • I gave two solutions. The other solution is suitable for streaming, and so would work with an arbitrary length stream. – Mark Adler Oct 23 '15 at 15:54