How to generate a binary file

Question

I am working on a school project, and this is the assignment:

Generate a binary file that contains the table of encoding and the data of the file using the Huffman encoding.

First I need to read the data from a file and build a Huffman tree. That part works, but I cannot generate the binary file: the tree is made of node instances rather than bytes, so I cannot write it to the file directly, and I get this error:

TypeError: a bytes-like object is required, not 'node'

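q = {}
# Each line of george.txt holds a symbol and its frequency, separated by whitespace
with open("george.txt", 'r') as a_file:
    for line in a_file:
        key, value = line.split()
        q[key] = int(value)  # store frequencies as integers so they sort and add numerically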


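class node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq      # frequency of the symbol (or sum of the children's frequencies)
        self.symbol = symbol  # symbol, or concatenated symbols for an internal node
        self.left = left      # left child (edge labelled 0)
        self.right = right    # right child (edge labelled 1)
        self.huff = ''        # edge label leading to this node: 0 or 1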


def printNodes(node, val=''):
    newVal = val + str(node.huff)
    if(node.left):
        printNodes(node.left, newVal)
    if(node.right):
        printNodes(node.right, newVal)

    if(not node.left and not node.right):
        print(f"{node.symbol} -> {newVal}")


chars = ['a', 'b', 'c', 'd', 'e', 'f']

# frequency of characters
freq = [q['a'], q['b'], q['c'], q['d'], q['e'], q['f']]

nodes = []

for x in range(len(chars)):
    nodes.append(node(freq[x], chars[x]))

while len(nodes) > 1:
    nodes = sorted(nodes, key=lambda x: x.freq)

    left = nodes[0]
    right = nodes[1]
    left.huff = 0
    right.huff = 1
    newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
    nodes.remove(left)
    nodes.remove(right)
    nodes.append(newNode)

printNodes(nodes[0])
with open('binary.bin', 'wb') as f:
    f.write(nodes[0])

Huffman encoding is a compression algorithm. The output file should be smaller than the input file. I doubt that's the case with the given answer. — Thomas Weller, Jan 01 '22 at 22:23

David Parks · Accepted Answer · 2022-01-01T21:52:37.580

The process of converting structured objects to a binary form is called "serialization", so a search for "python serialization" is a good place to start. Serialization is a standard feature of most programming languages and comes in many forms. The de facto serialization method in Python is pickle, provided by the standard-library pickle module.

Pickle lets you convert objects to a binary representation and vice versa, handling lots of little protocol details for you.

In your example you have:

with open('binary.bin', 'wb') as f:
    f.write(nodes[0])

You can serialize that to binary form like this:

import pickle

with open('binary.bin', 'wb') as f:
    b = pickle.dumps(nodes[0])  # bytes representation of your object
    f.write(b)                  # you can now write the bytes

You can also use pickle.dump to write an object, here the whole nodes list, directly to an open file:

with open('binary.bin', 'wb') as f:
    pickle.dump(nodes, f)  # the object comes first, then the open file

Deserialization looks similar:

with open('binary.bin', 'rb') as f:
    b = f.read()
    node0 = pickle.loads(b)

or, reading directly from the open file:

with open('binary.bin', 'rb') as f:
    nodes = pickle.load(f)
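As a quick check of the argument order, here is a minimal round-trip sketch. It uses a hypothetical code table instead of the node tree and an in-memory buffer instead of binary.bin, both purely for illustration; any picklable object and any file opened in binary mode should behave the same way.

```python
import io
import pickle

# Hypothetical code table; stands in for any picklable object
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}

buffer = io.BytesIO()        # stands in for open('binary.bin', 'wb')
pickle.dump(codes, buffer)   # object first, then the binary file object

buffer.seek(0)               # rewind, as if reopening the file with 'rb'
restored = pickle.load(buffer)
```

Note that pickle.dump and pickle.load take an open file object, while pickle.dumps and pickle.loads work on bytes in memory.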


I don't think this is the answer to what OP needs. Yes, it gives a binary file, but no, it will not be a Huffman compression. Since pickle stores the Python type information in the file, the file will probably be larger than the original file and thus not be compressed. — Thomas Weller, Jan 01 '22 at 22:20
Point taken, I was not addressing the issue of Huffman coding efficiency. If the OP confirms that understanding with a comment here I'll remove this answer to open it up for an answer that addresses that core issue. The OP can also uncheck the accepted answer to elicit other answers. — David Parks, Jan 01 '22 at 22:27
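To make the point raised in these comments concrete, here is a rough sketch of writing the encoding table followed by the Huffman-coded data packed into real bits. The file layout (a 4-byte table length, a 1-byte padding count, the table as JSON, then the payload) and the function names pack_bits, write_huffman_file and read_huffman_file are assumptions made up for this illustration, not a standard format or anything from the original post. It also assumes the codes are already available as a dict mapping each symbol to a string of '0' and '1' characters, such as the values printed by printNodes.

```python
import json
import struct


def pack_bits(bitstring):
    """Pack a string of '0'/'1' characters into bytes, zero-padding the last byte."""
    padding = (8 - len(bitstring) % 8) % 8
    padded = bitstring + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def write_huffman_file(path, codes, text):
    """Write the code table and the encoded text (hypothetical layout)."""
    bitstring = "".join(codes[ch] for ch in text)
    payload, padding = pack_bits(bitstring)
    table = json.dumps(codes).encode("utf-8")
    with open(path, "wb") as f:
        # Header: table length (4 bytes, big-endian) and padding bit count (1 byte)
        f.write(struct.pack(">IB", len(table), padding))
        f.write(table)
        f.write(payload)


def read_huffman_file(path):
    """Read a file written by write_huffman_file and return the decoded text."""
    with open(path, "rb") as f:
        table_len, padding = struct.unpack(">IB", f.read(5))
        codes = json.loads(f.read(table_len).decode("utf-8"))
        payload = f.read()
    bits = "".join(f"{byte:08b}" for byte in payload)
    if padding:
        bits = bits[:-padding]  # drop the zero bits added by pack_bits
    decode = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:  # unambiguous because Huffman codes are prefix-free
            out.append(decode[current])
            current = ""
    return "".join(out)
```

Storing the table as JSON keeps the sketch short but is not compact; for small inputs the header can outweigh the savings, so a real submission would probably want a tighter table encoding.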
