I think it will be sufficient to use the python dictionary structure to represent trees and nodes. You don't really need a separate class for it.
You want to initialize all your nodes:
def huffman_tree(freq_dict):
vals = freq_dict.copy()
nodes = {}
for n in vals.keys():
nodes[n] = []
Here we have initialized a dictionary nodes
to represent nodes and leafs. Let's populate it with data; within the same function:
while len(vals) > 1:
s_vals = sorted(vals.items(), key=lambda x:x[1])
a1 = s_vals[0][0]
a2 = s_vals[1][0]
vals[a1+a2] = vals.pop(a1) + vals.pop(a2)
nodes[a1+a2] = [a1, a2]
You see that I first now sort the data from our frequency-dictionary, ascending. You did it earlier, so you shouldn't have to do that now (although you did it descending). Though, doing it later rather than sooner, and within such a while-loop, lets you be more freely with which frequency-dictionary you pass to your program. What we're doing here furthermore, is that we're taking two and two items from the freq_dict when sorted, adding them together and storing them in the freq_dict.
Now we need to go through our freq_dict, and construct some sort of symbols-dictionary, representing a rules-set for exchanging text with symbols. Still within the same function:
symbols = {} # this will keep our encoding-rules
root = a1+a2 # a1 and a2 is our last visited data,
# therefore the two largest values
tree = label_nodes(nodes, root, symbols)
return symbols, tree
The line with tree = ...
might seem like a bit of magic here, but that's because we haven't created the function yet. But imagine there's a function that recursively go through each node from the root to the leafs, adding a string-prefix of either '0' or '1' representing the encoded symbol (and that's why we sorted ascending, so that we got the most frequent word at the top, receiving the smallest encoded symbol):
def label_nodes(nodes, label, symbols, prefix = ''):
child = nodes[label]
tree = {}
if len(child) == 2:
tree['0'] = label_nodes(nodes, child[0], symbols, prefix+'0')
tree['1'] = label_nodes(nodes, child[1], symbols, prefix+'1')
return tree
else: # leaf
symbols[label] = prefix
return label
This function does just that. And now we're ready to use it:
def huffman_encode(string, symbols):
return ''.join([symbols[str(e)] for e in string])
text = '''This is a simple text, made to illustrate how
a huff-man encoder works. A huff-man encoder works
best when the text is of reasonable length and has
repeating patterns in its language.'''
fd = freq_dict(text)
symbols, tree = huffman_tree(fd)
huffe = huffman_encode(text, symbols)
print(huffe)
Out: 001001011001110100011011101000110110100100111101011000110011010010000011011000110000110110010011011100011010111010000011011110110111010100101001011101100111011111001010101100001110011101111110010011101010101010101011010011100111101100101001011111100110001101010000100010001111101110111110100001110001111100110111110011111100011111111101110000001110011110110010100101111110011000110101000010001000111110111011111010000111000111110011011111001111110001110011101010101010101010010000000011101101111100110010001000011011110010000110110001100001101101110100011011101100101011110000010100011110111000101000100010010000011001000010001111011011110010110101000111010011100110100011100111010101010101010111100000100110000101010111101010001111010110011010101011101100011100100000110111010100001110101011001101100101010100011110111101110101111010001111111
Decoding is a simple matter of traversing the tree:
def huffman_decode(encoded, tree, string=False):
decoded = []
i = 0
while i < len(encoded):
sym = encoded[i]
label = tree[sym]
# Continue untill leaf is reached
while not isinstance(label, str):
i += 1
sym = encoded[i]
label = label[sym]
decoded.append(label)
i += 1
if string == True:
return ''.join([e for e in decoded])
return decoded
print(huffman_decode(huffe, tree, string=True))
Out: This is a simple text, made to illustrate how
a huff-man encoder works. A huff-man encoder works
best when the text is of reasonable length and has
repeating patterns in its language.
This answer is in large parts stolen from my own GitHub: https://github.com/RoyMN/Python/tree/master/tools/text_handler