13

When I zip (Zip 2.31) the same file in Linux I get a different checksum every time. How can I keep the same md5sum as last time? I'm using the latest zip update from yum.

Marvado
  • The most likely reason is that the file you're compressing keeps changing. – NPE Oct 22 '13 at 16:15
  • the file is the same, same creation date, same size, same checksum – Marvado Oct 22 '13 at 16:19
  • My advice: (1) Ask on a site where the question is on-topic (e.g. http://superuser.com/). (2) Include a complete, reproducible shell session that demonstrates the behaviour. – NPE Oct 22 '13 at 16:27

4 Answers

28

The generated archive contains not only the compressed file data but also "extra file attributes" (as the zip documentation calls them), such as file timestamps and other file attributes.

If this metadata differs between runs, you will never get the same checksum, because the changed metadata is stored in the archive along with the compressed file.

You can use zip's -X option (or the long form, --no-extra) to keep the files' extra attributes out of the archive:

zip -X foo.zip foo-file

Successive runs of this command, with no changes to the file in between, should not change the hash of the archive.
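One way to check this is to build the archive twice and compare the checksums. The sketch below assumes a Linux system with `zip` and `md5sum` on the PATH; the function name `zip_is_stable` and the temporary file names are made up for illustration, and the result may differ on other platforms or zip builds.

```shell
# Hypothetical helper: zip the same file twice with -X and compare checksums.
# Prints "same" if both archives hash identically, "different" otherwise.
zip_is_stable() {
    src=$1
    tmp=$(mktemp -d) || return 1
    # -q: quiet, -X: leave out extra file attributes
    zip -q -X "$tmp/first.zip" "$src" || return 1
    sleep 1    # let the clock advance so time-dependent metadata would show up
    zip -q -X "$tmp/second.zip" "$src" || return 1
    a=$(md5sum < "$tmp/first.zip")
    b=$(md5sum < "$tmp/second.zip")
    rm -rf "$tmp"
    if [ "$a" = "$b" ]; then echo same; else echo different; fi
}

# Usage: zip_is_stable foo-file
```

Dropping the -X from both zip calls in the helper is a quick way to see whether the extra attributes are what changes the hash on a given system.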

MC ND
  • source file has the same checksum every time. It's weird – Marvado Oct 22 '13 at 16:20
  • Yes, but when you add the file into the zip, you add the metadata (file modification datetime) to the zip. So, the zip is different, and so are the checksums – MC ND Oct 22 '13 at 16:40
  • How can I prevent that from happening – Marvado Oct 22 '13 at 16:40
  • 4
    zip command has a `--no-extra` parameter to control file attributes. I don't have a copy at hand to try it. If this doesn't work, you can try using the `touch` command to set the file date/time before zipping. – MC ND Oct 22 '13 at 19:40
  • 2
    thanks MC, using the -X flag works: -X eXclude eXtra file attributes. Thanks for the tip that led me to this. – Marvado Oct 23 '13 at 15:54
  • 1
    the -X flag doesn't work for me on OSX. Probably because it doesn't save extended file attributes which I guess are distinct from modification time. – Lammey Dec 05 '19 at 15:07
  • For OSX at least adding -u might be sometimes (when the previous zip file is available) helpful: -u update: only changed or new files – patrungel Sep 10 '20 at 13:50
3

Adding the -X flag as suggested in @mc-nd's answer worked fine for me on a single-file zip.

But when I was compressing a directory (node_modules in my case) I was getting a different hash each time I reinstalled node_modules.

The fix was to also add the -D flag:

-D
   --no-dir-entries
          Do  not  create entries in the zip archive for directories.  
          Directory entries are created by default so that their attributes can
          be saved in the zip archive.
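For a directory, the two flags might be combined as in the sketch below. It assumes Info-ZIP `zip` and `unzip`; `zip_dir_plain` and `list_entries` are hypothetical helper names, and as the comments on this page show, behaviour can differ between platforms.

```shell
# Hypothetical helper: archive a directory recursively without extra
# attributes (-X) and without separate directory entries (-D).
zip_dir_plain() {
    out=$1   # archive to create, e.g. modules.zip
    dir=$2   # directory to archive, e.g. node_modules
    rm -f "$out"          # zip updates an existing archive, so start fresh
    zip -q -r -X -D "$out" "$dir"
}

# List the stored names, one per line (unzip's zipinfo mode).
# With -D in effect, no listed name should end in "/".
list_entries() {
    unzip -Z1 "$1"
}

# Usage: zip_dir_plain modules.zip node_modules && list_entries modules.zip
```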
RomanHotsiy
  • Which OS are you running on? On both macOS and Debian-flavoured Linux, if I use the long option `--no-extra` I get `zip error: Invalid command arguments (long option 'no-extra' not supported)` and the short option `-X` doesn't appear to do anything (if I extract the file again it has the timestamp of the original file)... – VirtualWolf Feb 16 '22 at 04:10
3

Neither -X nor -D worked for me. It looks like zip still sets timestamps within the archive, causing mismatched hashes on identical content.

I've fixed the issue by manually setting file timestamps using:

touch -t 202001010000 file
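When a whole tree is being zipped, the same idea could be applied to every entry first. This is a minimal sketch, assuming a `touch` and `find` that support `-t` and `-exec ... +`; `pin_mtimes` is a hypothetical helper name and the default timestamp is just the example value used above.

```shell
# Hypothetical helper: give every file and directory under a tree the same
# modification time, so the timestamps zip records no longer vary.
pin_mtimes() {
    dir=$1
    stamp=${2:-202001010000}   # touch -t format: [[CC]YY]MMDDhhmm[.SS]
    find "$dir" -exec touch -t "$stamp" {} +
}

# Usage: pin_mtimes node_modules && zip -q -r -X out.zip node_modules
```

Note that `touch -t` interprets the value in the current time zone, so the time zone needs to be the same on every machine that builds the archive.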
Valer
3

In order to make a deterministic archive, one that can be rebuilt and verified using a hash, several things are required:

Timestamps of all files must have predictable values

Set the timestamps of all files to a specific value, e.g.

find . -exec touch -d '1985-10-21 09:00:00' {} \;

As an aside, the earliest date supported by the zip format is 01/01/1980, so timestamping all files to the Unix epoch (01/01/1970) won't have the desired effect.

If making a zip from a Git checkout you could use the Git commit timestamp of the last change to each file (inspired by this stackoverflow answer).

git ls-files | xargs -I {} sh -c 'chmod 644 "{}"; touch -m -t "$(git log --pretty=format:%cd -n 1 --date=iso "{}" | sed "s/-//g;s/ //;s/://;s/:/\./;s/ .*//")" "{}"'

Permissions of all files must have predictable values

Explicitly set permissions, say to 644, like this:

find . -type f -exec chmod 644 {} \;

Don't rely on the permissions applied by git clone, because these depend on the environment's umask value and are therefore unpredictable.
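The effect of the process umask on newly created files, and the normalisation step, can be seen in a small experiment. This is a sketch only: it assumes GNU `stat` (the `-c %a` format, which BSD/macOS `stat` does not accept), and `mode_of`, `normalize_modes` and the file names are placeholders.

```shell
# Print the octal permission bits of a file (GNU stat syntax, an assumption).
mode_of() {
    stat -c %a "$1"
}

# Hypothetical helper: force every regular file under a tree to mode 644.
normalize_modes() {
    find "$1" -type f -exec chmod 644 {} +
}

demo=$(mktemp -d)
( umask 077; : > "$demo/strict" )    # new file gets mode 600
( umask 022; : > "$demo/relaxed" )   # new file gets mode 644
echo "before: $(mode_of "$demo/strict") $(mode_of "$demo/relaxed")"
normalize_modes "$demo"
echo "after:  $(mode_of "$demo/strict") $(mode_of "$demo/relaxed")"
```

The two files start out with different modes purely because of the umask in effect when they were created; after the `chmod` pass both report 644.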

Present files to zip in a specific order

The order in which files are added to a zip matters. Don't rely on recursion or globbing: both depend on the order in which files are stored in directories, which is filesystem dependent and unpredictable. Instead, use something like find and sort the list to provide a predictable order.
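A sketch of producing such a list follows. `sort` orders by the current locale, so this example also pins `LC_ALL=C`; that is an addition not mentioned above, and `list_sorted` is a hypothetical name. File names containing newlines would break a line-based list like this one.

```shell
# Hypothetical helper: print the regular files under a directory (default:
# the current one) in byte-wise order, independent of the order the
# filesystem returns them in and of the user's locale.
list_sorted() {
    find "${1:-.}" -type f | LC_ALL=C sort
}

# Usage: list_sorted | zip -q -X myfile.zip -@
```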

Disable the zip "extra attributes" feature

This ensures that non-deterministic data, such as archive modification timestamps, user names, etc., is not written to the archive. Use the -X option to do this.

Example:

find . -type f | sort | TZ=UTC zip -qX myfile.zip -@ 

The time zone is also forced to UTC here, so that the times zip records do not depend on the time zone of the machine building the archive.

Such a zip should be deterministic and verifiable using md5sum, sha256sum, etc.
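The steps above could be combined into one helper along these lines. This is a sketch under stated assumptions, not a definitive recipe: it assumes Info-ZIP `zip`, the name `make_deterministic_zip` is hypothetical, the timestamp and mode are the example values from above, and the `LC_ALL=C` setting for `sort` is an addition. It modifies timestamps and permissions in the source tree, so it is best run on a scratch copy.

```shell
# Hypothetical helper: build a reproducible zip of a directory tree.
make_deterministic_zip() {
    src=$1   # directory to archive
    out=$2   # absolute path of the archive to write, outside of $src
    (
        cd "$src" || exit 1
        export TZ=UTC LC_ALL=C
        find . -exec touch -t 198510210900 {} +       # fixed timestamps
        find . -type f -exec chmod 644 {} +           # fixed permissions
        rm -f "$out"                                  # never update an old archive
        find . -type f | sort | zip -q -X "$out" -@   # fixed order, no extras
    )
}

# Usage: make_deterministic_zip /path/to/checkout /tmp/myfile.zip
```

Building the same content from two separate directories and comparing the results with `cmp` or `sha256sum` is a reasonable way to confirm that the output is stable on a given system.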

starfry