I'm using the hashlib module to test a hypothesis about hash algorithms and I'm getting strange results. I check my results with the Windows fciv program. The workflow I'm using is this:

  1. Gather the file and algorithm from the user.
  2. Print out the original filename and hashed file using that algorithm.
  3. Test the results with fciv in Windows.
  4. Add a few bytes or a space character to the file.
  5. Print out the new hashed file using the chosen algorithm.
  6. Test the results with the updated file in fciv.

The problem is this:

When I use a .txt file, both my program and fciv report a different hash after the change, as I expected. This works perfectly.

Here is the output:

Original Filename: example_docs\testDocument.txt
Original md5 Hash: 62bef8046d4bcbdc46ac81f5e4202fe7
Updated md5 Hash: 78a96b792cf2ea160db5e4823f4bf0c5

However, when I use an .mp4 video file, fciv shows a different hash, but my program does not.

Here is the output:

Original Filename: example_docs\testVideo.mp4
Original md5 Hash: 9a7dcb986e2e756dda60e851a0b03916
Updated md5 Hash: 9a7dcb986e2e756dda60e851a0b03916

No matter how many times I run my program, the hash in its output stays the same, while fciv displays different results.

Here is my code snippet:

import hashlib

def getHash(data, algorithm):
    # Hash the given file contents (not the filename) with the named algorithm.
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()

print "Original Filename: {file}".format(file=args.file)
with open(args.file, "a+") as inFile:
    h = getHash(inFile.read(), args.algorithm)
    print "Original {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)

# Append a few bytes to the end of the file.
with open(args.file, "a+") as inFile:
    inFile.write(b'\x07\x08\x07') # Also worked with inFile.write(" ")

# Re-read the file and hash it again.
with open(args.file, "a+") as inFile:
    h = getHash(inFile.read(), args.algorithm)
    print "Updated {hashname} Hash: {hashed_file}".format(hashname=args.algorithm, hashed_file=h)

where args.algorithm is md5 and args.file is the user-provided filename.

Blairg23

1 Answer

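Always open your files in binary mode, for example with ab+. Without the b, Python on Windows opens the file in text mode, where line endings are translated and a read can end early at a Ctrl-Z (0x1A) byte, so the data you hash may not be the file's actual contents.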

But I do wonder why you would use ab+ rather than rb+ if you intend to read the entire file: with ab+ the file pointer starts out at the end, whereas with rb+ it starts out at the beginning of the file.
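To make the file-pointer point concrete, here is a small sketch that opens a throwaway temporary file in both modes. The file and the names `demo_path`, `rb_data` and `ab_data` are made up for illustration. Where an append-mode handle starts reading can vary by platform and Python version, so the sketch seeks to offset 0 explicitly before reading instead of relying on the initial position.

```python
import os
import tempfile

# Hypothetical scratch file standing in for the user's file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"0123456789")

# "rb+" opens for reading and writing with the pointer at the start.
with open(demo_path, "rb+") as f:
    rb_start = f.tell()
    rb_data = f.read()

# "ab+" is meant for appending; record where the pointer starts,
# then rewind explicitly so the read covers the whole file.
with open(demo_path, "ab+") as f:
    ab_start = f.tell()
    f.seek(0)
    ab_data = f.read()

os.remove(demo_path)
```

With the explicit `seek(0)`, both modes return the same bytes. Without it, a read on an `ab+` handle may return nothing if the pointer starts at the end.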

See https://stackoverflow.com/a/23566951 for a nice list of the file modes.
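As a rough sketch of the binary-mode approach, the snippet below hashes a file with `rb`, appends a few bytes with `ab`, and hashes it again. The helpers `hash_file` and `append_bytes`, the chunk size, and the temporary demo file are assumptions made for illustration, not part of the original program. The demo data deliberately contains 0x1A bytes, the kind of byte that can cut a text-mode read short on Windows.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`, read as raw bytes."""
    h = hashlib.new(algorithm)
    # "rb" reads the bytes exactly as stored, starting at offset 0.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def append_bytes(path, payload):
    """Append `payload` to the end of the file in binary mode."""
    with open(path, "ab") as f:
        f.write(payload)


# Hypothetical stand-in for a binary file such as the .mp4.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"\x00\x01\x1a\x02" * 1000)

original_hash = hash_file(demo_path)
append_bytes(demo_path, b"\x07\x08\x07")
updated_hash = hash_file(demo_path)
os.remove(demo_path)
```

Reading in chunks keeps memory use flat for large video files. Using separate `rb` and `ab` handles avoids any question of where the pointer sits in a combined read/append mode.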

Dan D.
  • I used `ab+` and `rb+` with the same results. I wanted to be able to write a few bytes to the end of a file, hence the "append" mode usage. It still doesn't explain why one program says the hash has changed, but the other says it hasn't. Can you explain it? How would one program see a different hash every time, but the one I wrote only sees one hash EVERY time? – Blairg23 Dec 11 '14 at 18:36
  • I retract my previous statement. I thought I had used `ab+` and `rb+` with the same results. However, I think I must've used `r+` and `a+` only. Using `a+`, I was still able to append my binary to the file, but when I read it into my hashing algorithm, it didn't read a difference. When I used `ab+` to read in the file, I see a completely different hash, as I should. Thanks for the sanity check Dan-D. ! – Blairg23 Dec 12 '14 at 03:13