I'm working on a project where I identify all the unique occurrences of fixed-size blocks within a binary file and then save the result to another binary file (the output needs to be readable from multiple languages).
My approach is the following: I read each block of the file, hash it, and store the unique hashes and the corresponding block bytes in a dictionary. Each time the program sees a repeated hash, it appends the block's position for later reconstruction. An example of the resulting dictionary is shown below:
dict = {'d59fce39b5d8d4b278acbf2f5be0353c': [b'\xc5\xd7\x14\x84', 0, 1, 4],
'bf937a85a0f950f431a4c9c1aeca8686': [b'\x08\xe7\x07\x8f', 2, 3, 5]}
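For reference, here is a minimal sketch of how such a dictionary can be built. The 4-byte block size and the use of MD5 are assumptions inferred from the example above (4-byte values, 32-character hex keys), and `index_blocks` is a hypothetical name used only for illustration:

```python
import hashlib

BLOCK_SIZE = 4  # assumption: matches the 4-byte blocks in the example above


def index_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map each block's hex digest to [block_bytes, position, position, ...]."""
    blocks = {}
    for position, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        # assumption: MD5, because the example keys are 32 hex characters long
        digest = hashlib.md5(block).hexdigest()
        if digest not in blocks:
            # first time this block is seen: store its raw bytes
            blocks[digest] = [block]
        # record where the block occurs, for later reconstruction
        blocks[digest].append(position)
    return blocks


# Two distinct 4-byte blocks, laid out as A A B B A B
A = b'\xc5\xd7\x14\x84'
B = b'\x08\xe7\x07\x8f'
result = index_blocks(A + A + B + B + A + B)
```

With this input, the first block ends up with positions 0, 1, 4 and the second with positions 2, 3, 5, which is the same shape as the dictionary shown above.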
Then, I'm using with open('out.data', 'wb') as f:
to save the dictionary to disk (f.write(dict)), but I get the following error:
TypeError: a bytes-like object is required, not 'dict'
Other solutions I found here didn't help me. I tried serializing the dictionary to JSON, as suggested here, but got:
new_dict = json.dumps(dict)
TypeError: Object of type 'bytes' is not JSON serializable
Since I'm working with arbitrary bytes, encoding does not seem like a solution to this issue.
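A minimal reproduction of both errors, as a sketch: `blocks` is a shortened copy of the dictionary above (renamed to avoid shadowing the built-in `dict`), and an in-memory `io.BytesIO` stands in for the file opened in 'wb' mode, which is an assumption made to keep the example from touching the disk. The exact wording of the messages may differ slightly between Python versions.

```python
import io
import json

# shortened copy of the dictionary from the question
blocks = {'d59fce39b5d8d4b278acbf2f5be0353c': [b'\xc5\xd7\x14\x84', 0, 1, 4]}

errors = []

# 1) write() on a binary stream only accepts bytes-like objects, not a dict
try:
    io.BytesIO().write(blocks)
except TypeError as exc:
    errors.append(str(exc))

# 2) json.dumps() has no default representation for bytes values
try:
    json.dumps(blocks)
except TypeError as exc:
    errors.append(str(exc))

for message in errors:
    print(message)
```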