
I have looked through several topics about calculating checksums of files in Python, but none of them answered the question of how to get one sum from multiple files. I have several files in subdirectories and would like to determine whether one or more of them have changed. Is there a way to generate one sum from multiple files?

EDIT: This is the way I do it to get a list of sums:

checksums = [(fname, hashlib.md5(open(fname, 'rb').read()).digest()) for fname in flist]
Artur
    Sure! Using [hashlib](https://docs.python.org/3/library/hashlib.html), simply call the hash object's `.update` method with the bytes of each file. But why bother? Simply hash each file separately, and see if any of the hashes have changed. That way, you also get the identity of which file(s) changed. But if you really want a multi-file hashing program, try writing it and if you get stuck **post your code** and I'll be happy to help. – PM 2Ring Jan 15 '16 at 09:39
  • FWIW, [here](http://unix.stackexchange.com/a/163769/88378)'s some Python 2 code I wrote for U&L that does simultaneous MD5 and SHA-256 digests of a file. It processes the file in blocks so it can handle files that are too big to fit in memory. – PM 2Ring Jan 15 '16 at 09:39
  • Thanks for your input! I've added my code for multiple files above. I assume I can use `.update()` instead of `.digest()` but I'm not sure how. Do you mean calculate the hash for the first file like this: `hash_obj = hashlib.md5(open(fname, 'rb').read())` and after that do `hash_obj.update(fname)`? Will it calculate the hash from the file contents or just the filename string? – Artur Jan 19 '16 at 14:52
  • Yes, you need to use the `.update` method to supply extra data to the hashlib object. The `.digest` and `.hexdigest` methods are simply output methods that give the digest of the data that's been fed so far to the hashlib object. I don't have time right now to go into further details or write any code. But I recommend that you _don't_ try to do this all in a one-line list comprehension: it might save a tiny bit of time but it makes the code hard to work with and hard to read. – PM 2Ring Jan 19 '16 at 15:01
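
To illustrate the `.update()`/`.digest()` behaviour described in these comments, here is a minimal sketch (the file names are just placeholders):

import hashlib

h = hashlib.md5()
with open('a.txt', 'rb') as f:   # hypothetical first file
    h.update(f.read())           # feeds the file's bytes, not its name
with open('b.txt', 'rb') as f:   # hypothetical second file
    h.update(f.read())
print(h.hexdigest())             # digest of everything fed in so far

Passing a filename string to `.update()` would hash the name rather than the contents; in Python 3 it raises a `TypeError`, because only bytes-like objects are accepted.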

3 Answers


Slightly cleaner than Artur's answer. There's no need to treat the first element specially.

Edit (2022): I know Python a bit better now so I updated the code as follows:

  • Use pathlib - it's more ergonomic and doesn't leave files open.
  • Add type hints. If you don't use these you're doing it wrong.
  • Avoid a very mild TOCTOU issue.

import hashlib
from pathlib import Path

def calculate_checksum(filenames: list[str]) -> bytes:
    hash = hashlib.md5()
    for fn in filenames:
        try:
            # Feed every file's contents into the same hash object,
            # so a single digest covers the whole list.
            hash.update(Path(fn).read_bytes())
        except IsADirectoryError:
            # The path turned out to be a directory; skip it rather than
            # checking beforehand (that check would be the TOCTOU race).
            pass
    return hash.digest()

(You can handle IsADirectoryError differently if you like.)
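
Since the question mentions files spread across subdirectories, a usage sketch might look like this (the directory name `my_dir` is just a placeholder; adjust the glob to taste):

from pathlib import Path

# Collect every regular file under a hypothetical root directory.
files = sorted(str(p) for p in Path('my_dir').rglob('*') if p.is_file())

# One digest for the whole set; if it changes, at least one file changed.
print(calculate_checksum(files).hex())

Sorting the paths keeps the digest stable across runs, because the result depends on the order in which the files are fed to the hash object.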

Timmmm

So I made it work :) This way, one hash sum is generated for the whole file list.

import hashlib

# Seed the hash object with the contents of the first file...
hash_obj = hashlib.md5(open(flist[0], 'rb').read())
# ...then feed in the contents of the remaining files.
for fname in flist[1:]:
    hash_obj.update(open(fname, 'rb').read())
checksum = hash_obj.digest()

Thank you PM 2Ring for your input!

Note that MD5 has been broken, so use it only for non-security-critical purposes.
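
For anything security-sensitive, the same loop works unchanged with a stronger hash; here is a sketch using SHA-256 that also closes each file via a context manager:

import hashlib

hash_obj = hashlib.sha256()
for fname in flist:
    with open(fname, 'rb') as f:   # the context manager closes each file
        hash_obj.update(f.read())
checksum = hash_obj.digest()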

Artur
  • Don't you need to close the files you're opening? – Martin Dec 18 '20 at 21:37
  • Fair point, probably the best way to do this would be to use `open` as a context manager. This is an old answer I wrote back when I was using Python 2 (and sure, this should include closing a file manually). – Artur Dec 21 '20 at 12:21
import subprocess

# PowerShell command that hashes every file under a tree (one hash per file):
# Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath\*.*" -Recurse -force)
cmd = input("Enter the command : ")
subprocess.run(["powershell", "-Command", cmd])
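
Note that `Get-FileHash` prints one hash per file, not a single combined sum. One way to reduce that listing to one checksum (a sketch, assuming PowerShell is available on the PATH) is to capture the output and hash it:

import hashlib
import subprocess

cmd = r'Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath\*.*" -Recurse -force)'
result = subprocess.run(["powershell", "-Command", cmd],
                        capture_output=True, text=True, check=True)

# One combined checksum over the per-file hash listing.
print(hashlib.md5(result.stdout.encode()).hexdigest())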
Safeer M