I'm using the hashlib module to test a hypothesis about hash algorithms, and I'm getting strange results. I check my results with the Windows fciv program. The workflow I'm using is this:
- Gather the file and algorithm from the user.
- Print out the original filename and the file's hash using that algorithm.
- Test the results with fciv in Windows.
- Add a few bytes or a space character to the file.
- Print out the new hash of the file using the chosen algorithm.
- Test the results with the updated file in fciv.
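The steps above can be sketched roughly as follows. This is a minimal illustration rather than the program itself: the helper names hash_file, append_bytes, and run_workflow are hypothetical, and the sketch assumes the file is opened in binary mode ("rb" / "ab") so that every byte on disk is hashed.

```python
import hashlib


def hash_file(path, algorithm):
    """Return the hex digest of the file's raw bytes (hypothetical helper)."""
    h = hashlib.new(algorithm)
    # Assumption: binary mode, so no newline or end-of-file translation applies.
    with open(path, "rb") as f:
        h.update(f.read())
    return h.hexdigest()


def append_bytes(path, extra):
    """Append raw bytes to the end of the file (hypothetical helper)."""
    with open(path, "ab") as f:
        f.write(extra)


def run_workflow(path, algorithm, extra=b"\x07\x08\x07"):
    """Hash the file, modify it, and hash it again.

    Returns the pair (original digest, updated digest).
    """
    original = hash_file(path, algorithm)
    append_bytes(path, extra)
    updated = hash_file(path, algorithm)
    return original, updated
```

Calling run_workflow(path, "md5") on a scratch copy of a file should give two digests that can each be compared against fciv before and after the change.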
The problem is this:
When I use a .txt file, both my program and fciv report a different hash after the change, as I expected. This works perfectly.
Here is the output:
Original Filename: example_docs\testDocument.txt
Original md5 Hash: 62bef8046d4bcbdc46ac81f5e4202fe7
Updated md5 Hash: 78a96b792cf2ea160db5e4823f4bf0c5
However, when I use an .mp4 video file, fciv shows a different hash, but my program does not.
Here is the output:
Original Filename: example_docs\testVideo.mp4
Original md5 Hash: 9a7dcb986e2e756dda60e851a0b03916
Updated md5 Hash: 9a7dcb986e2e756dda60e851a0b03916
No matter how many times I run my program, the hash in its output stays the same, but fciv displays different results.
Here is my code snippet:
def getHash(filename, algorithm):
    h = hashlib.new(algorithm)
    h.update(filename)
    return h.hexdigest()

print "Original Filename: {file}".format(file=args.file)
with open(args.file, "a+") as inFile:
    h = getHash(inFile.read(), args.algorithm)
print "Original {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)
with open(args.file, "a+") as inFile:
    inFile.write(b'\x07\x08\x07') # Also worked with inFile.write(" ")
with open(args.file, "a+") as inFile:
    h = getHash(inFile.read(), args.algorithm)
print "Updated {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)
where args.algorithm is md5 and args.file is the user-provided filename.
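One check that might help narrow this down is to compare how much data read() actually returns with the file's size on disk. The sketch below is only an assumption about how such a check could look; bytes_read_vs_size is a hypothetical name and is not part of my program. It defaults to binary mode, and the mode argument can be swapped for the one used in the snippet to see whether the two numbers diverge for the .mp4 file.

```python
import os


def bytes_read_vs_size(path, mode="rb"):
    """Return (length of the data read, size on disk in bytes) for path.

    Hypothetical diagnostic helper. In binary mode the two numbers should
    match; in text modes the first number reflects whatever read() returned,
    which may differ by platform and Python version.
    """
    with open(path, mode) as f:
        data = f.read()
    return len(data), os.path.getsize(path)
```

If the first number is smaller than the second for the video file, the hash is being computed over only part of the file.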