My current approach is this:
import hashlib

def get_hash(path=PATH, hash_type='md5'):
    # Look up the requested digest constructor by name and instantiate it.
    func = getattr(hashlib, hash_type)()
    with open(path, 'rb') as f:
        # Read in 64 KiB chunks until read() returns b'' at EOF.
        for block in iter(lambda: f.read(1024 * func.block_size), b''):
            func.update(block)
    return func.hexdigest()
It takes about 3.5 seconds to calculate the md5sum of a 842MB iso file on an i5 @ 1.7 GHz. I have tried different methods of reading the file, but all of them yield slower results. Is there, perhaps, a faster solution?
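For anyone who wants to reproduce the timing, a minimal harness along these lines is what I have in mind (the path below is a placeholder, not my actual file):

import time

ISO_PATH = 'C:/images/example.iso'  # placeholder; any large local file will do

start = time.perf_counter()
digest = get_hash(ISO_PATH, 'md5')
print(digest, 'in', round(time.perf_counter() - start, 2), 's')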
EDIT: I replaced 2**16 (inside the f.read()) with 1024*func.block_size, since the default block_size for most hashing functions supported by hashlib is 64 (except for 'sha384' and 'sha512', whose default block_size is 128). Therefore, the read size is still the same (65536 bytes, i.e. 64 KiB).
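Those defaults are easy to verify directly:

import hashlib

# Print the internal block size (in bytes) of a few common algorithms.
for name in ('md5', 'sha1', 'sha256', 'sha384', 'sha512'):
    print(name, getattr(hashlib, name)().block_size)
# md5, sha1, sha256 -> 64; sha384, sha512 -> 128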
EDIT(2): I did something wrong. It takes 8.4 seconds instead of 3.5. :(
EDIT(3): Apparently Windows was keeping the disk over 80% busy when I ran the function again. It really does take 3.5 seconds. Phew.
Another solution, slightly faster (by about 0.5 s), is to use os.open():
import os

def get_hash(path=PATH, hash_type='md5'):
    func = getattr(hashlib, hash_type)()
    # O_RDONLY is sufficient for hashing; O_BINARY is Windows-only and
    # prevents newline translation.
    f = os.open(path, os.O_RDONLY | os.O_BINARY)
    # Read 128 KiB at a time until os.read() returns b'' at EOF.
    for block in iter(lambda: os.read(f, 2048 * func.block_size), b''):
        func.update(block)
    os.close(f)
    return func.hexdigest()
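It is called the same way as the first version; as a quick sanity check (placeholder path again), both variants should print the same digest:

print(get_hash('C:/images/example.iso', 'md5'))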
Note that these results are not final.