
Git 2.2.0 and 2.2.1 seem to modify the timestamps of old .git/objects/pack/pack-*.pack files occasionally, for no good reason.

It just changes the timestamp; the contents are identical.

Debugging this is difficult as it seems to make changes only fairly rarely.

I have never seen anything like this in any Git version before 2.2.0. What is happening, and can I fix it somehow? Because of the useless timestamp updates, my incremental backups are suddenly picking up large amounts of changes.

Jukka Suomela
  • why does it bother you? – user3159253 Dec 13 '14 at 05:24
  • @user3159253: As I said, lots of "new" data for backups. Slow and wastes disk space. – Jukka Suomela Dec 13 '14 at 11:55
  • Looks like it might be a patch series that went in to make sure adding a reference to an unreferenced object resets its gc clock. Any chance you can just tell your backup program to ignore the timestamps on those? They're immutable, the content can't change without the name changing. – jthill Dec 28 '14 at 01:23
  • @jthill could that patch be https://github.com/git/git/commit/c90f9e13abae630551ada3e895633bdc2cf4e080? – VonC Dec 28 '14 at 09:03
  • @VonC I'm thinking https://github.com/git/git/commit/d3038d22f91aad9620bd8e6fc43fc67c16219738 but a whole batch of reworks came in with https://github.com/git/git/commit/d70e331c0e8eaeb0bd75ae3020c3be71de075ff7 – jthill Dec 28 '14 at 09:05
  • Just to add some more general value, for anyone thinking I did anything impressive to find that patch series, I didn't. I did `git log --grep mtime --oneline`. Finding it didn't take two minutes. – jthill Dec 28 '14 at 09:20
  • @jthill: I guess https://github.com/git/git/commit/33d4221c79c89844bed6b9558cc2bc497251ef70 is the commit that introduced this specific feature (semi-random timestamp updates)? It explicitly calls `utime` to set the timestamp in situations in which older Git versions didn't do it. – Jukka Suomela Dec 28 '14 at 09:58
  • That sure looks good for it. Either way, selectively ignoring timestamps or using @VonC's bundle plan look like your two best options. I'd go for the bundles myself. I don't think those timestamp updates are going to go away, and well-regarded backup systems that can deal with this situation properly aren't hard to come by. – jthill Dec 28 '14 at 11:46
  • Will `git gc` trigger this, or when pushing to remote? – xeor Jan 02 '15 at 11:29

2 Answers


Another approach would be to not back up the Git repo itself (with its packfiles), but to back up bundles:

  • first, you can create an incremental bundle or a full bundle of your repo
  • second, once created, a bundle is a single file that is very easy to back up or copy around (less error-prone than an rsync of multiple files, with their potential date issues)
  • the process is easily scriptable (my script does an incremental or a full backup)
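
For readers who want a starting point, here is a minimal sketch of such a script in Python. It is not the script mentioned above. The function names, the default branch `master`, and the idea of passing a tag or commit as `since` are assumptions made for illustration; only `git bundle create` and `git bundle verify` are real Git commands.

```python
import subprocess
from typing import List, Optional


def bundle_command(repo: str, bundle: str,
                   branch: str = "master",
                   since: Optional[str] = None) -> List[str]:
    """Build a `git bundle create` command line (hypothetical helper).

    since=None  -> full bundle containing every ref (--all)
    since="rev" -> incremental bundle holding only since..branch
    """
    cmd = ["git", "-C", repo, "bundle", "create", bundle]
    if since is None:
        cmd.append("--all")
    else:
        cmd.append(f"{since}..{branch}")
    return cmd


def create_bundle(repo: str, bundle: str,
                  branch: str = "master",
                  since: Optional[str] = None) -> None:
    """Create the bundle, then ask Git to verify it against the repo."""
    subprocess.run(bundle_command(repo, bundle, branch, since), check=True)
    subprocess.run(["git", "-C", repo, "bundle", "verify", bundle], check=True)
```

A full backup would be `create_bundle("/path/to/repo", "/backups/full.bundle")`. An incremental one would pass `since=` a tag or commit that marks the previous backup. The resulting file is what the backup tool copies, so pack file timestamps never enter the picture.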
VonC
  • Nice. I steal, yes? Thanks. – jthill Dec 28 '14 at 09:03
  • @jthill please do: have fun with it. – VonC Dec 28 '14 at 09:15
  • Yes, I am aware of the possibility of doing backups with bundles (see http://stackoverflow.com/q/12129148/383299). However, in this question I am not looking for alternative backup strategies, but some way to avoid the timestamp updates. – Jukka Suomela Dec 28 '14 at 10:03
  • @JukkaSuomela no way to avoid the timestamp updates, I am afraid. I will leave this answer for others to use. – VonC Dec 28 '14 at 10:05
  • Oh, too bad. I guess I'll have to uninstall Git 2.2.x and use the old Git 1.9.x that comes with OS X. – Jukka Suomela Dec 28 '14 at 10:09

Git keeps more data on disk than is strictly needed to record everything in the repository. The extra data is kept to speed up certain operations and to avoid rewriting files. When Git decides which of these unnecessary files to delete, the modification time of the pack files is part of the decision (see find_lru_pack), so mtime effectively drives a cache-like mechanism. Git therefore changes the mtime of a pack file without modifying its contents (see the freshen_file function), which keeps the caching correct and avoids evicting files that are likely to be used again.
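
To illustrate the effect (this is not Git's code, which is written in C), here is a small Python sketch that "freshens" a file with `utime`: the modification time moves forward while the contents, and so the hash, stay the same. The file name and the helper names are made up for the demonstration.

```python
import hashlib
import os
import tempfile
import time


def sha1_of(path):
    """Hash the file contents so we can show they do not change."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def freshen(path):
    """Set atime and mtime to the current time without writing any data."""
    os.utime(path, None)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a file under .git/objects/pack/
        path = os.path.join(tmp, "pack-demo.pack")
        with open(path, "wb") as f:
            f.write(b"immutable pack contents")

        # Backdate the file by one day so the change is easy to see.
        old = time.time() - 86400
        os.utime(path, (old, old))

        before = (sha1_of(path), os.stat(path).st_mtime)
        freshen(path)
        after = (sha1_of(path), os.stat(path).st_mtime)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("same contents:", before[0] == after[0])
    print("mtime moved forward:", after[1] > before[1])
```

This is the pattern a backup tool trips over if it decides purely by mtime that a file has changed.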

If you change freshen_file in sha1_file.c into a no-op, the mtimes should never be modified. However, this leaves you open to potential data loss if a new commit that reuses existing data is written just as a garbage collection runs (thanks to the comment below for pointing this out).
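
A safer route that leaves Git untouched is the one suggested in the comments on the question: have the backup side ignore timestamps on pack files, since their contents cannot change without the name changing. A rough Python sketch of that idea follows. The function name and the notion of a list of already backed-up names are assumptions, and a real backup tool would have its own way of expressing this.

```python
import os
from typing import Iterable, List


def packs_to_copy(pack_dir: str, backed_up: Iterable[str]) -> List[str]:
    """Return pack files that are new since the last backup.

    Files are compared by name only. Modification times are ignored on
    purpose, because a pack file's name is derived from its contents.
    """
    known = set(backed_up)
    return sorted(
        name for name in os.listdir(pack_dir)
        if name.startswith("pack-") and name not in known
    )
```

Called as `packs_to_copy(".git/objects/pack", names_in_last_backup)`, it returns only the files whose names were not seen before, however often their timestamps were refreshed.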

Bojan Nikolic
  • That's not going to be safe. See the description in [the commit he found](https://github.com/git/git/commit/33d4221c79c89844bed6b9558cc2bc497251ef70): not touching the pack files can leave old objects referenced by new commits unprotected from gc. It's not just for performance. – jthill Jan 03 '15 at 21:15