-1

I'm trying to build a Huffman encoder that can read the bytes out of a text file and convert them to Huffman codes. And I need to be able to decode.

So far I've been able to read the bytes from a file and put them in an tree that is based on a double-linked list representing the Huffman encoding. I have a text file with this text: "abcabcabd". This is the tree I generate: enter image description here

My goal is to read out the tree and put the whole output in an array. My expected array would look like this: "1, 01, 000, 1, 01, 000, 1, 01, 001"

The only problem is I have no idea how to determine where what letter is positioned. I want to do it using the 1/0 method.

So my question here is: how do I determine where the 'byte' codes are located. Would I do it in a while loop? I want it to be variable so creating an library isn't possible. I have tried this:

First I create my tree:

public static void mBitTree()
    {
        W = H;
        if (W.N != null)
        {
            int Fn = W.A;
            int Sn = W.N.A;
            int Cn = Fn + Sn;

            cZip n = new cZip(0, Cn);

            //First Set new node to link to subnodes.
            n.L = W;
            n.R = W.N;

            if (W.N.N != null)
            {
                n.N = H.N.N;
                H.N.N.P = n;
                H.N.N = n;
                //Safe and set new head and fix links.
                W = H.N.N;
                H.N.N = null;
                H.N.P = null;
                H.N = null;
                H = W;
            }
            //this means there were 2 nodes left. so the newly created one will become Head ant Tail and the tree is complete.
            else
            {
                H.N.P = null;
                H.N = null;
                H = n;
                T = n;
            }
        } 
        else
        {
            return;
        }
    }

The top node from where I start is H and T. Also it has an L and R. The first thing needed to be checked is H.L is not equal to current. Then go to H.R.L, then H.R.R.L and so on. If I have a bigger text file this needs to work as well. So I created something like this:

public static string[] mCodedchain(uint[] count, byte[] bytesInFile)
    {
        int counter = 0;                                                                
        for (int i = 0; i < count.Length; i++) { if (count[i] != 0) counter = counter + (int)count[i]; }
        string[] codedArray = new string[counter];


        for (int i = 0; i < bytesInFile.Length; i++)
        {
            int number = 0;
            while ((int)bytesInFile[i] != number )
            {
                number = H.L.V //and if not H.L Add R in between to go to H.R.L.V
            }               
        }            

        return codedArray;
    }

But have no clue as to how I would build up the while loop. I've tried a lot of things but this one seems the most valid to use, but I can't get it to work.

I'm not quite sure if this question is clear. But I hope it is.

Thanks in advance, and happy coding.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
B. Dionys
  • 916
  • 7
  • 34
  • 1
    See my answer here: https://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree/759766#759766 – Lasse V. Karlsen Jan 14 '18 at 19:26
  • @LasseVågsætherKarlsen I already found that but it isnt something I can use is it? It does explain how the encodement works etc if i instantly write them to bits. and read them out as those. but im trying to put the paths into an string array. So correct me if im wrong but then the code examples in your answer there arent ussefull in this situation right? – B. Dionys Jan 14 '18 at 19:44
  • 1
    OK, then I was confused by the title of your question. – Lasse V. Karlsen Jan 14 '18 at 21:10
  • 1
    When you start reading bits from the stream, start at the root node and for each 0/1 you go right or left respectively. When you hit a node that contains a letter, output the letter and go back to the root, it is that simple. – Lasse V. Karlsen Jan 14 '18 at 21:12
  • @LasseVågsætherKarlsen Ah I understand now thank you. really I hadnt even thought about that but now u say that again. u are completely right. – B. Dionys Jan 14 '18 at 21:37
  • @LasseVågsætherKarlsen if u add this to an answer I can close this question becouse this actually solved my problem i tried it just now and this works. – B. Dionys Jan 14 '18 at 21:41

1 Answers1

1

Decoding static Huffman data is actually quite simple.

What you need:

  • The node tree with symbols (letters/bytes in your case) attached in the same configuration as were used during encoding
  • A way to read bits from the stream, one bit at a time

With that in place, here's pseudocode for doing the decoding:

<node> ← <root>
WHILE MORE
    <bit> ← <ReadBit()>
    IF <bit> IS 1 THEN
        <node> ← <node.LEFT>
    ELSE
        <node> ← <node.RIGHT>
    END IF
    IF <node.SYMBOL> THEN
        OUTPUT <node.SYMBOL> AS DECODED SYMBOL
        <node> ← <root>
    END IF
END WHILE

Or in plain English:

Start at the root node.
Read 1 bit from the stream. If the bit is 1, go to the left child of the current node, otherwise go to the right child.

Whenever you hit a node with a symbol in it, output the symbol and go back to the root node Go back to "read 1 bit" and keep going until you've decoded the entire stream

NOTE! You need to know how many symbols to output separately. The reason for this is that the very last byte in the encoded stream might have extra bits, and if you decode these as well you might end up with extra symbols, compared to the original data before encoding.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825