
Is there a way to create a .tar file that omits the values of atime/ctime/mtime for its files/directories?

Why do we want to do this?

We have a step in our build process that generates a directory of artifacts that gets packaged into a tarfile. We expect that build step to be idempotent -- given the same inputs, it produces exactly the same files/output each time.

Ideally, we would also like the step to be bitwise idempotent across clean builds, so that we can use hashes of successive builds to check that nothing has changed. But because tar files include timestamps (atime/ctime/mtime) for each entry, the tar files created by that build step are never bitwise identical to those from the previous run, even though the contents of every file inside the archive are bitwise identical.

Is there a way to generate a tarfile that omits the timestamps of its entries, so that the step that generates the archive could be bitwise idempotent? (We want to leverage other file metadata that tar preserves, such as file mode bits and symlinks.)

Mickalot
  • did you ever find a complete answer to this? I also want to do the same thing, asked in a question here: https://stackoverflow.com/questions/45734702/tar-preserving-only-file-names-contents-and-executable-bit?noredirect=1#comment78427625_45734702 I also want to make sure the user, group and permissions are not stored. Is there anything else to be aware of? – Tom Ellis Aug 17 '17 at 12:56
  • @TomEllis, I would consider building something custom with the Python `tarfile` module if you want exact control of which permissions are and aren't stored. – Charles Duffy Nov 17 '17 at 15:49

2 Answers


To have a truly idempotent tar, mtime is a good step but not enough. You also need to set the sort order, the owner and group (together with their mapping) and a proper timezone for mtime (since otherwise you're gonna have issues as well between Mac and Linux).

I ended up with

tar --sort=name --owner=root:0 --group=root:0 --mtime='UTC 2019-01-01' ... | gzip -n
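One way to check that these flags actually give a stable result is to archive the same tree twice, changing only an on-disk mtime in between, and compare hashes. The following is a minimal sketch, not part of the original answer: `make_tarball` is a hypothetical helper name, the sample tree is made up, and it assumes GNU tar 1.28 or later plus `gzip` and `sha256sum` on the PATH. Setting `LC_ALL=C` is a precaution for the sort order, not something the command above requires.

```shell
#!/bin/sh
# Sketch: build the same tree into a .tar.gz twice and compare the hashes.
# Assumes GNU tar >= 1.28 (for --sort), gzip, and sha256sum.
set -eu

# make_tarball SRC_DIR OUT_FILE  (hypothetical helper for this example)
make_tarball() {
    # LC_ALL=C is a precaution so name ordering does not depend on the locale.
    LC_ALL=C tar --sort=name \
        --owner=root:0 --group=root:0 \
        --mtime='UTC 2019-01-01' \
        -C "$1" -cf - . | gzip -n > "$2"
}

# Made-up sample tree with an executable file and a symlink.
work=$(mktemp -d)
mkdir -p "$work/artifacts/bin"
printf 'hello\n' > "$work/artifacts/bin/tool"
chmod 755 "$work/artifacts/bin/tool"
ln -s bin/tool "$work/artifacts/tool-link"

make_tarball "$work/artifacts" "$work/a.tar.gz"
sleep 1
touch "$work/artifacts/bin/tool"   # changes only the on-disk mtime
make_tarball "$work/artifacts" "$work/b.tar.gz"

hash_a=$(sha256sum < "$work/a.tar.gz")
hash_b=$(sha256sum < "$work/b.tar.gz")

if [ "$hash_a" = "$hash_b" ]; then
    echo "identical"
else
    echo "different"
fi

rm -rf "$work"
```

File modes and the symlink target still end up in the archive, so the metadata the question wants to keep is preserved. Only ordering, ownership, and timestamps are pinned.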
Adracus
  • Note: this solution requires GNU tar 1.28 or higher. – Mickalot Feb 28 '19 at 20:10
  • Do you also need to enforce a well-defined sort order, perhaps by setting the env variable `LC_ALL=C`? – Mickalot Feb 28 '19 at 20:11
  • @Mickalot, you **explicitly** said in the question: *We want to leverage other file metadata that tar preserves*, asking answers to limit themselves to only discussing timestamps. Shifting the acceptance bit to an answer that isn't honoring that specification strikes me as moving the goalposts. – Charles Duffy Feb 28 '19 at 21:26
  • The question asked about bitwise idempotency so I think it is appropriate for answers to include anything contributing to that goal, including file ownership and sort order. – Jesse Glick Mar 02 '20 at 15:09

GNU tar has a --mtime argument, which can be used to store a fixed date in the archive rather than a file's actual mtime:

tar --mtime='1970-01-01' input ...

When compressing a tarball with gzip, it's also necessary to specify -n to prevent the name and timestamp of the tar archive from being stored in the gzip header:

tar --mtime='1970-01-01' input ... | gzip -n >input.tar.gz
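To see what these two options leave behind, you can list the archive and look at the gzip header directly. This is a rough sketch under a few assumptions: GNU tar, `gzip`, and `od` are available, the `input` file is a made-up sample, and the date is written as `'UTC 1970-01-01'` (a small variation on the command above) so the stored value does not depend on the local timezone.

```shell
#!/bin/sh
# Sketch: confirm what --mtime and gzip -n leave in the output.
# Assumes GNU tar, gzip, and od on PATH.
set -eu

work=$(mktemp -d)
printf 'data\n' > "$work/input"

# 'UTC' in the date string keeps the stored value independent of the local TZ.
tar --mtime='UTC 1970-01-01' -C "$work" -cf - input | gzip -n > "$work/input.tar.gz"

# List the archive with timestamps rendered in UTC.
listing=$(gzip -dc "$work/input.tar.gz" | tar --utc -tvf -)
echo "$listing"

# Bytes 4-7 of a gzip stream hold its MTIME field (RFC 1952).
gz_mtime=$(od -An -tx1 -j4 -N4 "$work/input.tar.gz" | tr -d ' \n')
echo "gzip MTIME bytes: $gz_mtime"

rm -rf "$work"
```

The listing should show the fixed date rather than the time the file was created, and with `-n` the gzip MTIME bytes should all be zero.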
Charles Duffy
  • How can this be done in macOS? I cannot find `--mtime` – mljrg Sep 20 '17 at 16:17
  • @mljrg: the standard tar on macOS is BSD-based, and BSD tar is different from GNU tar. To install GNU tar, if you use Homebrew, you can `brew install gnu-tar`, which makes GNU tar available as `gtar`. – Mickalot Dec 11 '17 at 21:45
  • @Mickalot Thanks! – mljrg Dec 12 '17 at 11:50
  • `--mtime='@0'` is shorter and appears to function the same (tarballs made using both options matched MD5sums). – leetbacoon Aug 31 '23 at 14:53
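
For build scripts that have to run on both Linux and macOS, one option is to detect which binary is GNU tar before using the GNU-specific flags. A minimal sketch follows. `find_gnu_tar` is a hypothetical helper name, and it assumes GNU tar identifies itself with the string "GNU tar" in its `--version` output and is installed as either `gtar` or `tar`.

```shell
#!/bin/sh
# Sketch: pick a GNU tar binary, preferring `gtar` (the name Homebrew installs).

# find_gnu_tar: prints the first candidate whose --version mentions GNU tar.
find_gnu_tar() {
    for candidate in gtar tar; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" --version 2>/dev/null | grep -q 'GNU tar'; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if TAR=$(find_gnu_tar); then
    echo "using GNU tar: $TAR"
else
    echo "no GNU tar found; GNU-specific flags such as --sort may be unavailable" >&2
fi
```

The rest of the script can then call `"$TAR"` instead of `tar`.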