
I am using Python to make a download manager that verifies MD5 checksums automatically. The problem is that Python gives a wrong MD5; I cross-checked it with third-party MD5 verifier software.

I am using hashlib to compute the MD5. For some files it gets the MD5 right, but for others it's completely wrong. Here is my code for the checksum:

```python
import hashlib
import sys

# file_name is set elsewhere in the download manager
x = sys.path[0]
x = x + '\\' + file_name
print 'file successfully saved to path', x
file_ref = open(x, 'rb').read()
hashlib.md5(file_ref).hexdigest()
print 'MD5 of file is:', hashlib.md5(file_ref).hexdigest()
```

MD5 of the original file, as listed on the website: e557fa76ed485fd10e8476377ad5be95

MD5 given by Python: cb3b2227733d3344dba15e5e39d04f43

MD5 given by the MD5 verifier: e557fa76ed485fd10e8476377ad5be95

please help :/

scandalous
  • How big is the file? Can it be read in one `read()` call? – Rohan Feb 22 '13 at 08:52
  • `When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory.` – dmg Feb 22 '13 at 08:57
  • The file is an 8.9 MB file: http://www.cccp-project.net/download.php?type=cccp – scandalous Feb 22 '13 at 08:59
  • Is it consistent in the sense that it always gets it right or wrong for a given file, or do results vary for the same file? – martineau Feb 22 '13 at 09:04
  • I can't reproduce the problem -- `hexdigest()` always returns the correct value. Why do you call it twice in your sample code? – martineau Feb 22 '13 at 09:13
  • hashlib works fine. You've got some other problem. – gps Feb 22 '13 at 18:32
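The comments ask whether reading the whole file in one `read()` call matters, and suggest hashlib itself is not at fault. One way to check this locally is to hash the same bytes both ways and compare. This is a minimal sketch: `md5_whole` and `md5_chunked` are illustrative names, and the 9 MB random test file is an assumption standing in for the roughly 8.9 MB download.

```python
import hashlib
import os
import tempfile

def md5_whole(path):
    # Read the entire file in one read() call, as the question's code does
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def md5_chunked(path, chunk_size=8192):
    # Feed the file to the hash object piece by piece
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Write ~9 MB of random bytes to a temporary file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(os.urandom(9 * 1024 * 1024))

whole = md5_whole(path)
chunked = md5_chunked(path)
print(whole == chunked)  # True: both approaches hash the same bytes
os.remove(path)
```

If both agree on your machine but still differ from the website's checksum, the bytes on disk (or the path being read) are the likelier culprit, not the hashing.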

1 Answer


Since it's right for some files but wrong for others, check your path. This is what I use to compute MD5:

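```python
import hashlib

def hashsum(path, hex=True, hash_type=hashlib.md5):
    # Hash the file in chunks so large files never have to fit in memory
    hashinst = hash_type()
    with open(path, 'rb') as f:
        # Read block_size * 128 bytes at a time (8192 bytes for MD5)
        for chunk in iter(lambda: f.read(hashinst.block_size * 128), b''):
            hashinst.update(chunk)
    return hashinst.hexdigest() if hex else hashinst.digest()
```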
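For the download manager's automatic check, the result of `hashsum` could be compared with the checksum published on the site. This is a sketch only: `verify_download` is a hypothetical helper, and normalizing the expected value with `strip()`/`lower()` assumes the published checksum may carry stray whitespace or upper-case hex.

```python
import hashlib
import os
import tempfile

def hashsum(path, hex=True, hash_type=hashlib.md5):
    # Same chunked approach as hashsum above
    hashinst = hash_type()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(hashinst.block_size * 128), b''):
            hashinst.update(chunk)
    return hashinst.hexdigest() if hex else hashinst.digest()

def verify_download(path, expected_md5):
    # Hypothetical helper: compare against the checksum listed on the website.
    # hexdigest() is lower-case, so normalize the expected value the same way.
    return hashsum(path) == expected_md5.strip().lower()

# Small demo file whose MD5 is known: md5(b'abc')
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'abc')

ok = verify_download(path, '900150983CD24FB0D6963F7D28E17F72\n')
bad = verify_download(path, 'e557fa76ed485fd10e8476377ad5be95')
print(ok)   # True
print(bad)  # False
os.remove(path)
```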

You can use `hashsum` to compare against the system's own md5 tool:

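```python
import subprocess

# cfile is the path to the file being checked
myhash = hashsum(cfile)
# 'md5 -q' (macOS/BSD) prints only the digest; on Linux use ['md5sum', cfile]
sproc = subprocess.Popen(['md5', '-q', cfile], stdout=subprocess.PIPE)
syshash = sproc.communicate()[0].split()[0]
print myhash
print syshash
print 'Hash identical' if myhash == syshash else 'Hash check fail'
```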

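Here cfile is the path to the file. My guess is that your path is wrong. I'm guessing you're on Windows, and sys.path[0] is not the proper way to get the current directory: it is the directory containing the script that was run, not the current working directory (os.getcwd()), so you may be hashing a different file from the one you downloaded.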
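To illustrate the path point, here is a minimal sketch that builds the download path with `os.path.join` from the working directory instead of concatenating `'\\'` onto `sys.path[0]`. `download_path` and the file name `example.bin` are hypothetical, and whether your downloader actually saves into the working directory is an assumption you would need to confirm.

```python
import os
import sys

def download_path(file_name, base_dir=None):
    # Hypothetical helper: absolute path for a downloaded file.
    # Defaults to the current working directory rather than sys.path[0].
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.abspath(os.path.join(base_dir, file_name))

# sys.path[0] is the script's directory ('' in an interactive session),
# which need not match the working directory.
print('sys.path[0]:   %r' % sys.path[0])
print('os.getcwd():   %s' % os.getcwd())
print('download path: %s' % download_path('example.bin'))
```

Printing both values before hashing makes it easy to see whether the file being checked is the one that was just saved.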

dmg
  • Thanks for the reply, but it's still giving me an incorrect MD5: 22fb04afad00ccaeda1f5e5892493d77 – scandalous Feb 22 '13 at 08:53
  • To calculate the md5 sum, you can use this answer: http://stackoverflow.com/questions/1131220/get-md5-hash-of-big-files-in-python/40961519#40961519 – Laurent LAPORTE Dec 04 '16 at 17:47