19

I've seen `git gc --aggressive --prune` and `git repack -a -d --depth=250 --window=250` recommended for reducing the size of a local .git folder when a long local history is not needed. From my reading, `git repack` seems to be preferred. Can anyone comment on this?

What I really want to know is how to decide on values for depth and window. I use git to commit, push, pull and merge, but I have no idea what a delta chain or an object window is.

Jake
  • 1
    `git gc` should be sufficient and is the easy way – CharlesB Feb 12 '13 at 21:27
  • 2
    For reference, here is a synopsis of the email thread from Linus Torvalds that explains the reasoning for using git repack over git gc http://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/ – spuder Jan 07 '14 at 16:23

2 Answers

24

I ran some tests with different values. This is too long to be a comment on twalberg's answer.

My company has a code base that has been in svn, mercurial, and now git. It is 10 years old, with 21,000 commits.

Before the repack it was 3.1 GB. After the repack, it shrank to the following sizes
(each repack was run on a fresh clone of the 3.1 GB folder):

git repack -a -d --depth=50 --window=10 -f
141.584 MB

git repack -a -d --depth=250 --window=1000 -f
110.484 MB

git repack -a -d --depth=500 --window=1000 -f
110.204 MB

They took about 5, 15 and 30 minutes respectively on my quad-core Mac.
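
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how it could be scripted. The function name `repack_size` is made up for this example, and copying the directory with `cp -R` is an assumption about what "fresh clone" means here; only `git repack` and `git count-objects -v` are real git commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): repack a throwaway copy of a
# repository with the given depth and window, then print the pack size in KiB.
# Usage: repack_size <repo-dir> <depth> <window>
repack_size() {
    src=$1; depth=$2; window=$3
    work=$(mktemp -d) || return 1
    # Assumption: a plain directory copy stands in for the "fresh clone",
    # so every run starts from the same state.
    cp -R "$src" "$work/repo" || return 1
    # -f recomputes all deltas instead of reusing existing ones.
    git -C "$work/repo" repack -a -d -f -q \
        --depth="$depth" --window="$window" || return 1
    # count-objects -v reports "size-pack: <KiB>" for all packs combined.
    size=$(git -C "$work/repo" count-objects -v | awk '/^size-pack:/ {print $2}')
    rm -rf "$work"
    echo "$size"
}

# Example: compare two settings on the repository in the current directory.
# for setting in "50 10" "250 1000"; do
#     set -- $setting
#     echo "depth=$1 window=$2 -> $(repack_size "$PWD" "$1" "$2") KiB"
# done
```

Working on a copy keeps the original repository untouched, which matters because `-f` throws away the existing deltas.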


Update:

I took the result of the second repack (depth 250, window 1000) and repacked it again with depth 500 and window 1000, to see whether there is any difference between starting from a fresh 3.1 GB repo and starting from an already repacked 110 MB repo.

git repack -a -d --depth=250 --window=1000 -f
110.484 MB
git repack -a -d --depth=500 --window=1000 -f
110.212 MB

Verdict: repacking with depth 500 and window 1000 resulted in about 110.2 MB regardless of whether the repo had already been repacked.

Update2:

I was also curious whether running a repack with lower values on an already repacked repo would cause the size to increase.

git repack -a -d --depth=500 --window=1000 -f
110.204 MB
git repack -a -d --depth=50 --window=10 -f  
142.056 MB

Verdict: repacking with the lower values caused the repo to balloon back up from 110 MB to ~140 MB.

spuder
  • 1
    Awesome research! If we are talking about going from 3 GB to ~100 MB, though, I'd always recommend the fastest repack. Taking 25 more minutes to save another 50 MB, well, that seems inefficient. Also, I ran the 30-minute one, and my computer was mostly unusable for 30 minutes because git was using 8 threads for what seems like a write-heavy operation. – Parris Jan 15 '15 at 18:57
  • 1
    About "Update2": re-running `git repack` with the `-f` flag will *always* throw all existing work out and redo the whole packing. Never use `-f` if you have already used lots of CPU time to pack the whole thing. – Mikko Rantalainen Nov 17 '15 at 07:35
  • An interesting question would be related to [a comment elsewhere](https://stackoverflow.com/questions/28720151/git-gc-aggressive-vs-git-repack/28720432#comment72726206_28721047) about the `depth` affecting the checkout-time for (mostly) old objects. Did you happen to compare that as well? – Tobias Kienzler Jul 06 '17 at 06:50
16

"Object window" - when repacking, git compares each object (every version of every file, every directory tree object, every commit, every tag...) against a certain number of other similar-ish objects to find the one that yields the smallest delta - roughly speaking, the smallest patch that can create this object from that base object. The window setting is that number of candidates.

"Delta chain" - in order to re-create object A, you may first have to reconstruct object B and apply a delta to it, but in order to reconstruct B you need object C, which requires D, and so on. The depth setting limits how long such a chain can get.

Up to a point, increasing both depth and window can give you smaller packs. However, there are tradeoffs.

For `window`, a higher setting means that `git repack` will compare each object with more objects while it is running, resulting in a (potentially significantly) longer running time for `git repack`. However, once the pack is generated, `window` has no effect on further operations (outside of other repacks, anyway).

`depth`, on the other hand, has less impact on the run time of `git repack` itself (although it still affects it somewhat), but the deeper your delta chains get, the longer it takes to rebuild an old object from the sequence of base objects required to create it. That means longer times for things like checkout when you're referencing older commits, so it can have a significant impact on the perceived efficiency of git if you do a lot of digging through your history. And, since git doesn't create deltas only against older objects, you can on occasion find a recent object that is slow to extract because it sits a number of levels down the chain - it's not as common as with older objects, but it does happen.
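
To see how long the delta chains in an existing pack actually are, `git verify-pack -v` prints a histogram at the end of its output. The wrapper below is only a sketch; the function name `chain_histogram` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: summarise delta-chain lengths for each pack in a repo.
# Usage: chain_histogram [repo-dir]
chain_histogram() (
    cd "${1:-.}" || exit 1
    gitdir=$(git rev-parse --git-dir) || exit 1
    for idx in "$gitdir"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue   # repository has no packs yet
        # verify-pack -v ends with summary lines such as
        # "non delta: N objects" and "chain length = K: M objects".
        # LC_ALL=C keeps those lines untranslated so grep can match them.
        LC_ALL=C git verify-pack -v "$idx" |
            grep -E '^(non delta|chain length)'
    done
)
```

Running it before and after a repack with a different `--depth` shows whether the longer chains are really being used; objects listed under the larger chain lengths are the ones that cost the most to reconstruct.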

I personally use window=1024 and depth=256 on all my repos except for a couple of clones of very large projects (e.g. Linux kernel).
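
If you settle on values like these, one option is to store them in the repository's config so that a plain `git repack -a -d` picks them up without the command-line flags. This is a sketch under that assumption: `pack.window` and `pack.depth` are real config keys, while `set_pack_defaults` is just a name chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: store default window/depth values in a repo's config.
# Usage: set_pack_defaults <repo-dir> <window> <depth>
set_pack_defaults() {
    # pack.window / pack.depth are used when no --window / --depth is given.
    git -C "$1" config pack.window "$2" &&
    git -C "$1" config pack.depth "$3"
}

# Example with the values mentioned above:
# set_pack_defaults . 1024 256
# git repack -a -d
```

Setting the values per repository rather than globally makes it easy to use a lower window on very large clones.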

twalberg
  • On large projects (like the linux kernel) do you set the window and depth higher or lower? – spuder Jan 07 '14 at 16:33
  • @spuder I generally go with a lower window for larger projects, otherwise the repack time goes through the roof... – twalberg Jan 07 '14 at 16:35