Compressing a file using Huffman coding

Question

While Compressing a file Using Huffmann coding, After assigning Huffmann codes to each character in a file, these characters should be replaced with equivalent Huffmann codes in the compressed file. Then how the equivalent characters gets extracted with those Huffman codes from the compressed files while decompressing the file. Do the compressed file contains some extra information to decode the Huffmann codes?

In the typical implementation of this, you store the tree of encoded symbols before the compressed data, this way the decompression first reads and reconstructs the tree of symbols, then uses that to decompress the data that follows. See if my [answer to a different Huffman question](https://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree/759766#759766) helps you out, if not, please feel free to expand on what exactly it is you're asking about. — Lasse V. Karlsen, Dec 01 '18 at 22:28
Thanks a lot. After having looked at the above mentioned answer, I am left with one question how I can generate the huffman codes using the tree(001A1C01E01B1D)? — Anish, Dec 03 '18 at 19:15
You read those bits in and regenerate the tree. To decode the compressed stream you start at the root, then read the next bit, if it's a 0 go left, if it's a 1 go right, and when you reach a leaf node, output the symbol in the leaf node then reset back to the root and keep reading bits. Can you clarify exactly what your question is? — Lasse V. Karlsen, Dec 03 '18 at 19:30
My question was how I can regenerate the huffman codes assigned for characters(using the tree for example 001A1C01E01B1D). However this part of the above mentioned answer explained it all. To read, do this: Read bit. If 1, then read N-bit character/byte, return new node around it with no children If bit was 0, decode left and right child-nodes the same way, and return new node around them with those children, but no value . Thanks a lot again. — Anish, Dec 04 '18 at 20:28
The above stated method works for .txt files but I am not able to compress the PDF files. I need help. — Anish, Dec 18 '18 at 21:55
PDF files are usually already compressed, if they are then huffman will not be able to improve on that. — Lasse V. Karlsen, Dec 19 '18 at 08:24
@LasseVågsætherKarlsen After generating the Huffman codes and writing them along with the tree in compressed file, the size of compressed file exceeds the size of original file. And I think it is right because the computer doesn't recognize the Huffman codes as bits instead it interprets those 0s and 1s as ascii characters. Am I right? If yes then how Huffman coding is used for compression? — Anish, Dec 21 '18 at 19:29
You have to write the bits out as bits, not as characters. Since files only deals with bytes, you have to buffer the bits until you get 8 of them and form a byte that you then write to the file. — Lasse V. Karlsen, Dec 22 '18 at 12:48
If you want I can hack together a short C# program that does compression and decompression, so that you can see how it all fits together. I will not be using the canonical huffman code that Mark Adler is linking to in his answer, which is likely better but I'm afraid I have no experience with it and huffman coding isn't something I use these days so it wouldn't be something I want to spend time learning. — Lasse V. Karlsen, Dec 22 '18 at 14:17
Here's a very naive implementation - https://github.com/lassevk/SO53575559-HuffmanCoding - it focuses on showing one way of implementing it, rather than an *efficient* way of implementing it. Error checking is also missing. — Lasse V. Karlsen, Dec 22 '18 at 17:07
@LasseVågsætherKarlsen thanks I got the idea but i got some doubts too. 1. While writing the Huffman codes in the compressed file in bundles of 8(bytes) What if I am not left with exactly 8 bits at the end of the file like for example 00000111111101100. For first 8 bits 00000111, I will write the 7th ASCII character, for the next 8 bits 11110110, I will write the 246th ASCII character. And now I am left with the only one bit 0. How to deal with it?What should I write next in the file ?Also thanks for this much guidence I am a student and also not familiar with C#. — Anish, Dec 23 '18 at 21:40
You need to keep track of how many bytes you are going to encode, so that when decoding you stop when you have decoded that many bytes. The leftover bits can be anything you want, just keep them at 0. If you stop at the right place, you won't be using them. — Lasse V. Karlsen, Dec 24 '18 at 13:20
My naive implementation above has moved to https://github.com/lvk-stackoverflow/SO53575559-HuffmanCoding — Lasse V. Karlsen, Jun 19 '20 at 13:24

score 0 · Accepted Answer · answered Dec 02 '18 at 03:30

Yes. You need to send a description of the Huffman code in order to decode them.

The usual implementation is to encode using a canonical Huffman code, and then sending just the lengths for each symbol. The description of the code can itself be compressed.

Compressing a file using Huffman coding

1 Answers1