I've been searching the Internet but couldn't find what I need.
I have to compress big files using Huffman coding. My idea was to read only the first 1-2 MB of the file
(to avoid first reading the whole file to build the tree and then reading it once more to encode it, i.e. two full passes)
and build the Huffman tree from that sample. If any of the 256 possible byte values was missing from the sample, I'd add it myself, in case it appears later in the file (and not in the first 1-2 MB). But trying to test the result using this:
int * totalFr = new int[256];
unsigned char * symArr = new unsigned char[256];
for (int i = 0; i < 256; i++)
{
    totalFr[i] = i;                               // dummy test frequencies 0..255
    symArr[i] = static_cast<unsigned char>(i);
}
// sizeof(symArr) would give the size of the pointer, not of the array,
// so the element count has to be stated explicitly
int size = 256;
buildHuffmanTree(totalFr, symArr, size);
delete[] totalFr;
delete[] symArr;
where buildHuffmanTree
is a function that builds the Huffman tree, made me realize that the shortest character code I could get was 7 bits, for example 0000001. In hindsight that makes sense: with the test frequencies 0..255, even the most frequent symbol only accounts for 255/32640 = 1/128 of the total, so its Huffman code can't be much shorter than 7 bits.
And this is where my question comes from: is it worth building a Huffman tree for the full 256-symbol alphabet? Or is it better to use adaptive Huffman coding on chunks of 1-2 MB?
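
For reference, this is roughly how I was planning to build the frequency table from the sample. It is only a sketch: it assumes buildHuffmanTree keeps the same signature as in the test snippet above, and the helper name buildTreeFromSample and the 2 MB constant are just placeholders.

#include <cstddef>
#include <fstream>
#include <vector>

// Assumed declaration, matching the call in the test snippet above.
void buildHuffmanTree(int * totalFr, unsigned char * symArr, int size);

void buildTreeFromSample(const char * path)
{
    const std::size_t sampleSize = 2 * 1024 * 1024;   // first ~2 MB of the file
    std::vector<char> sample(sampleSize);

    std::ifstream in(path, std::ios::binary);
    in.read(sample.data(), static_cast<std::streamsize>(sample.size()));
    std::size_t bytesRead = static_cast<std::size_t>(in.gcount());

    int totalFr[256] = {0};
    unsigned char symArr[256];

    // Count how often each byte value occurs in the sample.
    for (std::size_t i = 0; i < bytesRead; i++)
        totalFr[static_cast<unsigned char>(sample[i])]++;

    // Give every missing byte value a frequency of 1, so it still gets
    // a code in case it shows up later in the file.
    for (int i = 0; i < 256; i++)
    {
        symArr[i] = static_cast<unsigned char>(i);
        if (totalFr[i] == 0)
            totalFr[i] = 1;
    }

    buildHuffmanTree(totalFr, symArr, 256);
}

Padding the missing byte values with frequency 1 guarantees every byte has a code, at the cost of slightly longer codes for the symbols that actually occur in the sample.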