How do you create a tarball so that its md5 or sha512 hash will be deterministic?
I'm currently creating a tarball of a directory of source code files by running tar --exclude-vcs --create --verbose --dereference --gzip --file mycode.tgz *
, and I'd like to record its hash so I can use it as a fingerprint for detecting changes in the future.
However, I've noticed that if I create duplicate tarballs without changing any files, running the Python hashlib.sha512(open('mycode.tgz').read()).hexdigest()
on each archive returns a different hash.
Is this because tar's compression algorithm is not deterministic? If so, how can I efficiently archive a large collection of files in such a way that I can calculate a consistent hash to detect changes?