This question has been asked and answered a number of times on this site, but none of the answers proposes what seems to me a simpler, more concise, and arguably more elegant solution. Perhaps that's because the solution is actually bad; that's what I'm trying to figure out. If it's bad, I'd like to know how and why. One of the most popular answers was this:
import hashlib

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()
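For context, the trick here is the two-argument form of iter(callable, sentinel), which keeps calling the callable and stops as soon as it returns the sentinel. A minimal standalone sketch of that behavior, using io.BytesIO in place of a real file:

import io

# iter(callable, sentinel) calls `callable` repeatedly and stops
# the moment it returns `sentinel` -- here, the empty bytes object
# that read() produces at end of stream.
buf = io.BytesIO(b"abcdefghij")
for chunk in iter(lambda: buf.read(4), b""):
    print(chunk)
# prints: b'abcd', b'efgh', b'ij'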
It's understandable: we don't want to load the whole file into memory, so we read it in fixed-size chunks with the help of an iterator and a lambda function. Nice and simple. But presumably we could do this in a simpler way by defining the md5sum function as follows:
def md5sum(fname):
    md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        for chunk in f:
            md5.update(chunk)
    return md5.hexdigest()
Conveniently, iterating over an open file handle gives us a sequence of its lines, and because we open the file with the 'b' mode flag in open(fname, 'rb'), each line comes back as a bytes object, which is exactly what md5.update() expects. What's wrong with doing that?
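For what it's worth, a quick check (using io.BytesIO as a stand-in for a file opened in binary mode, with made-up contents) confirms that iteration in binary mode yields newline-terminated bytes objects:

import io

# Iterating a binary stream splits on b'\n', just like a text file
# splits on '\n'; each item is one line as a bytes object.
data = io.BytesIO(b"first line\nsecond line\nno trailing newline")
for line in data:
    print(line)
# prints: b'first line\n', b'second line\n', b'no trailing newline'

So each chunk is whatever falls between two newlines, rather than a fixed 4096 bytes.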