15

I have created hundreds of folders and text files using PHP, and I then add them to a zip archive.

This all works fine, but if I create another zip archive from the same folders and files, the new archive has a different hash from the first one. The same happens if I use WinRAR instead of PHP to create the archive.

The hashes only seem to differ when I zip the files I created through PHP, yet the archives open fine.

Very strange. Can anyone shed any light on this?

Thanks

arbme
  • I'm guessing: maybe a different creation timestamp, which is part of the zip file? – Orn Kristjansson Jul 22 '12 at 20:13
  • @orn The files are untouched, I can create 2 zips one after the other and it would be the same. – arbme Jul 22 '12 at 20:15
  • @arbme, no he's saying maybe there is a timestamp *in* the **created** zipfile. Since you didn't create them at the same time, they would be different. – Jonathon Reinhart Jul 22 '12 at 20:24
  • I thought the file's timestamp wasn't taken into account, just the contents. It also seems that if you don't add the files in the same order you get a different hash, even if the contents are the same. – arbme Jul 22 '12 at 20:30

3 Answers

18

Zip is not deterministic. This is a real problem when, for example, you have CI that updates an AWS Lambda function and you want to update it only when something has actually changed, not on every build. To solve it I used this article: https://medium.com/@pat_wilson/building-deterministic-zip-files-with-built-in-commands-741275116a19
Like this:

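# Set every file's mtime to the most recent commit time among the tracked files.
find . -exec touch -t "$(git ls-files -z . | \
  xargs -0 -n1 -I{} -- git log -1 --date=format:"%Y%m%d%H%M" --format="%ad" '{}' | \
  sort -r | head -n 1)" '{}' +
# Zip without directory entries (-D) or extra file attributes (-X).
zip -rq -D -X -9 -A --compression-method deflate dest.zip sources...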
Dima Kurilo
8

There is certainly some difference in the files. If the lengths are not exactly the same, the hash will be different. You can use a hex editor with a compare feature, such as Hex Workshop, to see exactly what the differences are.
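If a hex editor is not at hand, the per-entry metadata can also be compared programmatically. Below is a minimal sketch using Python's standard `zipfile` module rather than PHP; the helper names `describe` and `build` and the sample data are hypothetical, and the two archives are built in memory only so that the example is self-contained.

```python
import io
import zipfile


def describe(zip_bytes):
    """Return (name, timestamp, CRC-32) for each entry, in archive order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(i.filename, i.date_time, i.CRC) for i in zf.infolist()]


def build(mtime):
    """Build a one-file archive in memory whose entry is stamped with `mtime`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("a.txt", date_time=mtime), b"same content")
    return buf.getvalue()


# Hypothetical sample archives: same content, stamped two minutes apart.
first = build((2012, 7, 22, 20, 13, 0))
second = build((2012, 7, 22, 20, 15, 0))

# Collect every position where the two archives' entries disagree.
differences = [(x, y) for x, y in zip(describe(first), describe(second)) if x != y]
for x, y in differences:
    print(x, "!=", y)
```

Here the CRC-32 of the entry matches in both archives, so the content is identical and only the stored timestamp differs.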

Possibilities that come to my mind:

  1. As @orn mentioned, there may be a timestamp in the zip format you are using (not sure).
  2. The order that the files are added to the archive may be different (depending on how you're selecting them / building the source array).
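
Both possibilities can be checked with a small experiment. This is only a sketch in Python (standard `zipfile` and `hashlib` modules), not the PHP code from the question; `zip_digest` and the sample file names are made up for illustration.

```python
import hashlib
import io
import zipfile


def zip_digest(entries, mtime):
    """SHA-256 of an in-memory archive built from (name, data) pairs stamped with `mtime`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            zf.writestr(zipfile.ZipInfo(name, date_time=mtime), data)
    return hashlib.sha256(buf.getvalue()).hexdigest()


files = [("a.txt", b"alpha"), ("b.txt", b"beta")]
t1 = (2012, 7, 22, 20, 13, 0)
t2 = (2012, 7, 22, 20, 15, 0)

same = zip_digest(files, t1) == zip_digest(files, t1)                 # identical inputs
time_changed = zip_digest(files, t1) == zip_digest(files, t2)         # only the timestamp differs
order_changed = zip_digest(files, t1) == zip_digest(files[::-1], t1)  # only the order differs
print(same, time_changed, order_changed)  # prints: True False False
```

The file contents are the same in all three comparisons; only the metadata or the entry order changes, and that is enough to change the hash.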
Jonathon Reinhart
  • That's wrong, zip will always be different unless forcing internal creation and modification time https://stackoverflow.com/questions/9714139/why-does-zipping-the-same-content-twice-gives-two-files-with-different-sha1 – Léo Germond Sep 26 '17 at 08:09
  • Tell me what specifically was wrong about my answer. – Jonathon Reinhart Sep 26 '17 at 22:11
0

You can consider using deterministic_zip, which solves this issue. From its documentation:

There are three tricks to building a deterministic zip:

  1. Files must be added to the zip in the same order. Directory iteration order may vary across machines, resulting in different zips. deterministic_zip sorts all files before adding them to the zip archive.
  2. Files in the zip must have consistent timestamps. If I share a directory to another machine, the timestamps of individual files may differ, despite having identical content. To achieve timestamp consistency, deterministic_zip sets the timestamp of all added files to 2019-01-01 00:00:00.
  3. Files in the zip must have consistent permissions. File permissions look like -rw-r--r-- for a file that is readable by all users, and only writable by the user who owns the file. Similarly executable files might have permissions that look like: -rwxr-xr-x or -rwx------. deterministic_zip sets the permission of all files added to the archive to either -r--r--r--, or -r-xr-xr-x. The latter is only used when the user running deterministic_zip has execute access on the file.

Note: deterministic_zip does not modify nor update timestamps of any files it adds to archives. The techniques used above apply only to the copies of files within archives deterministic_zip creates.
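
To illustrate the three tricks, here is a rough sketch built on Python's standard `zipfile` module. It is not deterministic_zip's actual code; `build_zip` and `FIXED_TIME` are names I made up, and pinning `create_system` is my own addition so that the header does not depend on the OS that wrote it.

```python
import hashlib
import os
import tempfile
import zipfile

FIXED_TIME = (2019, 1, 1, 0, 0, 0)  # the timestamp quoted from the documentation


def build_zip(src_dir, dest_path):
    """Write the files under src_dir to dest_path in sorted order with fixed metadata."""
    entries = []
    for root, _dirs, names in os.walk(src_dir):
        for name in names:
            path = os.path.join(root, name)
            arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
            entries.append((arcname, path))
    with zipfile.ZipFile(dest_path, "w") as zf:
        for arcname, path in sorted(entries):  # trick 1: stable order
            info = zipfile.ZipInfo(arcname, date_time=FIXED_TIME)  # trick 2: fixed timestamp
            mode = 0o100555 if os.access(path, os.X_OK) else 0o100444
            info.external_attr = mode << 16  # trick 3: fixed permissions
            info.create_system = 3  # assumption: always record "Unix" as the creating system
            info.compress_type = zipfile.ZIP_DEFLATED
            with open(path, "rb") as fh:
                zf.writestr(info, fh.read())


# Usage: build the same (hypothetical) tree twice and compare the hashes.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, "sub"))
    for rel, data in [("b.txt", b"beta"), (os.path.join("sub", "a.txt"), b"alpha")]:
        with open(os.path.join(src, rel), "wb") as fh:
            fh.write(data)
    digests = []
    for out in ("one.zip", "two.zip"):
        dest = os.path.join(tmp, out)
        build_zip(src, dest)
        with open(dest, "rb") as fh:
            digests.append(hashlib.sha256(fh.read()).hexdigest())

print(digests[0] == digests[1])  # prints: True
```

One caveat: the compressed bytes come from zlib, so archives built on machines with different zlib versions or settings may still differ even with all three tricks applied.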

Sheece Gardazi