Decoding a Huffman Tree from a String

Question

I was researching effective ways to store huffman trees in a file to allow a decoder to reconstruct the huffman tree and decode the file properly. Currently I've made a function to create my huffman encoding given frequencies.

I saw Efficient way of storing Huffman tree, and so I wrote a recursive function to mimic the recursive function in the answer of the post:

>>> huffman = [(b'A', '00'), (b'C', '01'), (b'E', '10'), (b'B', '110'), (b'D', '111')]
>>> write(hf)
b'001A1C01E01B1D'

In the post, the answer uses objects to contain node-leafs. However, I wanted to decode the string straight into a list of tuples like in the huffman variable above. I tried my hand at this, and came up with the following function, which kinda works:

def read(p):
    if p[0] == "1":
        x = p[1]
        p = p[2:]
        if len(p) == 0:
            c = [(x, "")]
        else:
            b = read(p)
            for a in range(len(b)):
                b[a] = (b[a][0], "1" + b[a][1])
            c = [(x, "0")] + b
        return c
    if p[0] == "0":
        c = read(p[1:])
        return c

The function returns this when called:

>>> read("001C1A01E01D1B")
[('C', '0'), ('A', '10'), ('E', '110'), ('D', '1110'), ('B', '1111')]

This is clearly not the same as the orginial dictionary huffman above.

How can you recursively decode a huffman tree dictionary, without using a node class, and get the original dictionary?

Your `write()` function does **not** work exactly as expected. i.e. `write([("A", 6), ("B", 1), ("C", 6), ("D", 2), ("E", 5)])` results in `if a[1][0] == "0":` -> `TypeError: 'int' object is not subscriptable` in Python 3, and `TypeError: 'int' object has no attribute '__getitem__'` in Python 2 (which are both essentially complaining about the same issue). — martineau, Mar 25 '17 at 19:04
Please [edit] your question and provide a MCVE (see [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) so folks don't have to guess. — martineau, Mar 25 '17 at 19:16
@martineau also, it was a little rude to provide a link to edit the question in your comment. I am familiar with SO enough to know where I go to edit the question. — Mick Ashton, Mar 25 '17 at 19:24
Don't be so sensitive. With a rep of 43, why should I assume you have much SO experience. Also an MCVE _is_ appropriate here because you're basically asking "why doesn't this work". Pardon me if you know this—but one of the reasons for closing questions like this is "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the **shortest code necessary to reproduce it in the question itself.***" (emphasis mine) — martineau, Mar 25 '17 at 21:42
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139054/discussion-between-mick-ashton-and-martineau). — Mick Ashton, Mar 25 '17 at 21:59

cdlane · Answer 1 · 2019-06-21T01:42:42.303

First, the string you pass to your read() function, '001C1A01E01D1B', is not the same as you show in your explanation above and is not the one used in the cited page, '001A1C01E01B1D'. Fixing this, your code produces better, though still incorrect, output:

[('A', '0'), ('C', '10'), ('E', '110'), ('B', '1110'), ('D', '1111')]

As far as the decoding goes, let recursion do the work for us:

class HuffmanSyntaxError(Exception):
    ''' Raised due to incorrect or extra encoded string elements '''

def read_recursive(string, code=''):
    if string[0] == "0":
        first, string = read_recursive(string[1:], code + "0")
        second, string = read_recursive(string, code + "1")

        return [*first, *second], string

    if string[0] == "1":
        return [(string[1], code)], string[2:]

    raise HuffmanSyntaxError(string)  # unknown option

def read(string):
    array, string = read_recursive(string)

    if string:  # left over input
        raise HuffmanSyntaxError(string)

    return array

print(read("001A1C01E01B1D"))

The recursive version of the routine returns the unparsed remainder of the string back as a second result. The front end function removes this, returning the desired result:

% python3 test.py
[('A', '00'), ('C', '01'), ('E', '10'), ('B', '110'), ('D', '111')]
%

Decoding a Huffman Tree from a String

1 Answers1