I have been doing a bit of programming in Python (still a n00b at it) and came across something odd. I wrote a small program that computes the MD5 hash of a file whose name is passed on the command line, using a function I found here on SO. When I ran it against a file, I got the hash "58a...113". But when I ran Microsoft's FCIV or the md5sum.py in \Python26\Tools\Scripts\ against the same file, I got a different hash, "591...ae6". The actual hashing part of the md5sum.py in Scripts is:

m = md5.new()
while 1:
    data = fp.read(bufsize)
    if not data:
        break
    m.update(data)
out.write('%s %s\n' % (m.hexdigest(), filename))

This looks functionally identical to the code in the function given in the other answer... What am I missing? (This is my first time posting to Stack Overflow, so please let me know if I am doing it wrong.)

Sam
  • Where is `fp` created? Are you opening it in ASCII mode instead of Binary? – FogleBird May 25 '10 at 17:28
  • Ah ha! That was it. I had not specified a mode parameter in the open() function in my program, so it was defaulting to text mode. I set the mode to 'rb', and now it's returning the correct hash. Thanks! – Sam May 25 '10 at 17:33

1 Answer

Already resolved in comments, but in case anyone wants to give me points... ;)

Open your file in binary mode!

f = open(path, 'rb')
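
For anyone landing here later, a minimal sketch of the whole thing might look like the following. It is not the code from the question or from md5sum.py; the function name `md5_of_file` and the default `bufsize` are made up for illustration, and it uses `hashlib` (available since Python 2.5) in place of the older `md5` module. The only part that matters is the 'rb': in text mode on Windows, "\r\n" is translated to "\n" as the file is read, so the bytes fed to the hash are not the bytes on disk.

```python
import hashlib


def md5_of_file(path, bufsize=8192):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration; name and bufsize are assumptions.
    """
    m = hashlib.md5()
    # 'rb' is the important part: text mode may translate line endings
    # (for example "\r\n" -> "\n" on Windows), which changes the hash.
    with open(path, 'rb') as fp:
        while True:
            data = fp.read(bufsize)
            if not data:
                break
            m.update(data)
    return m.hexdigest()
```

Reading in chunks keeps memory use flat for large files; the result should match what FCIV or md5sum.py reports for the same file.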
FogleBird