
I have looked through several topics about calculating checksums of files in Python, but none of them answered the question of how to get one sum from multiple files. I have several files in subdirectories and would like to determine whether one or more of them have changed. Is there a way to generate one sum from multiple files?

EDIT: This is the way I do it to get a list of sums:

checksums = [(fname, hashlib.md5(open(fname, 'rb').read()).digest()) for fname in flist]
Artur
    Sure! Using [hashlib](https://docs.python.org/3/library/hashlib.html), simply call the hash object's `.update` method with the bytes of each file. But why bother? Simply hash each file separately, and see if any of the hashes have changed. That way, you also get the identity of which file(s) changed. But if you really want a multi-file hashing program, try writing it and if you get stuck **post your code** and I'll be happy to help. – PM 2Ring Jan 15 '16 at 09:39
  • FWIW, [here](http://unix.stackexchange.com/a/163769/88378)'s some Python 2 code I wrote for U&L that does simultaneous MD5 and SHA-256 digests of a file. It processes the file in blocks so it can handle files that are too big to fit in memory. – PM 2Ring Jan 15 '16 at 09:39
  • Thanks for your input! I've added my code for multiple files above. I assume I can use `.update()` instead of `.digest()` but I'm not sure how. Do you mean calculate the hash for the first file like this: `hash_obj = hashlib.md5(open(fname, 'rb').read())` and after that do `hash_obj.update(fname)`? Will it calculate the hash from the file contents or just the filename string? – Artur Jan 19 '16 at 14:52
  • Yes, you need to use the `.update` method to supply extra data to the hashlib object. The `.digest` and `.hexdigest` methods are simply output methods that give the digest of the data that's been fed so far to the hashlib object. I don't have time right now to go into further details or write any code. But I recommend that you _don't_ try to do this all in a one-line list comprehension: it might save a tiny bit of time but it makes the code hard to work with and hard to read. – PM 2Ring Jan 19 '16 at 15:01
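
To illustrate the `.update()`/`.digest()` behaviour described in these comments, here is a minimal sketch (the file names are just placeholders):

import hashlib

h = hashlib.md5()
with open('a.txt', 'rb') as f:   # hypothetical first file
    h.update(f.read())           # feeds the file's bytes, not its name
with open('b.txt', 'rb') as f:   # hypothetical second file
    h.update(f.read())
print(h.hexdigest())             # digest of everything fed in so far

Passing a filename string to `.update()` would hash the name rather than the contents; in Python 3 it raises a `TypeError`, because only bytes-like objects are accepted.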

3 Answers


Slightly cleaner than Artur's answer. There's no need to treat the first element specially.

Edit (2022): I know Python a bit better now so I updated the code as follows:

  • Use pathlib - it's more ergonomic and doesn't leave files open.
  • Add type hints. If you don't use these you're doing it wrong.
  • Avoid a very mild TOCTOU issue.

import hashlib
from pathlib import Path

def calculate_checksum(filenames: list[str]) -> bytes:
    hash = hashlib.md5()
    for fn in filenames:
        try:
            # Feed every file's contents into the same hash object,
            # so a single digest covers the whole list.
            hash.update(Path(fn).read_bytes())
        except IsADirectoryError:
            # The path turned out to be a directory; skip it rather than
            # checking beforehand (that check would be the TOCTOU race).
            pass
    return hash.digest()

(You can handle IsADirectoryError differently if you like.)
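
Since the question mentions files spread across subdirectories, a usage sketch might look like this (the directory name `my_dir` is just a placeholder; adjust the glob to taste):

from pathlib import Path

# Collect every regular file under a hypothetical root directory.
files = sorted(str(p) for p in Path('my_dir').rglob('*') if p.is_file())

# One digest for the whole set; if it changes, at least one file changed.
print(calculate_checksum(files).hex())

Sorting the paths keeps the digest stable across runs, because the result depends on the order in which the files are fed to the hash object.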

Timmmm

So I made it work :) This way, one hash sum is generated for the whole file list.

import hashlib

# Seed the hash object with the contents of the first file...
hash_obj = hashlib.md5(open(flist[0], 'rb').read())
# ...then feed in the contents of the remaining files.
for fname in flist[1:]:
    hash_obj.update(open(fname, 'rb').read())
checksum = hash_obj.digest()

Thank you PM 2Ring for your input!

Note that MD5 has been broken, so use it only for non-security-critical purposes.
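
For anything security-sensitive, the same loop works unchanged with a stronger hash; here is a sketch using SHA-256 that also closes each file via a context manager:

import hashlib

hash_obj = hashlib.sha256()
for fname in flist:
    with open(fname, 'rb') as f:   # the context manager closes each file
        hash_obj.update(f.read())
checksum = hash_obj.digest()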

Artur
  • Don't you need to close the files you're opening? – Martin Dec 18 '20 at 21:37
  • Fair point, probably the best way to do this would be to use `open` as a context manager. This is an old answer I wrote back when I was using Python 2 (and sure, this should include closing a file manually). – Artur Dec 21 '20 at 12:21
import subprocess

# PowerShell command that hashes every file under a tree (one hash per file):
# Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath\*.*" -Recurse -force)
cmd = input("Enter the command : ")
subprocess.run(["powershell", "-Command", cmd])
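
Note that `Get-FileHash` prints one hash per file, not a single combined sum. One way to reduce that listing to one checksum (a sketch, assuming PowerShell is available on the PATH) is to capture the output and hash it:

import hashlib
import subprocess

cmd = r'Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath\*.*" -Recurse -force)'
result = subprocess.run(["powershell", "-Command", cmd],
                        capture_output=True, text=True, check=True)

# One combined checksum over the per-file hash listing.
print(hashlib.md5(result.stdout.encode()).hexdigest())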
Safeer M