
I have a 2 GB bare Git repo with 30000 commits. Unfortunately, some large files were added early on and over the years, which I have now cleaned up using git filter-branch - see e.g. this link. For identifying the culprits, see e.g. this link. I have also cleaned away many old "test try" branches using git branch -D BRANCH-NAME. Now the fun starts :-)

In my bare repo, git gc --aggressive --prune=yes does not seem to reduce the directory size, but if I clone the bare repo and run git gc --aggressive --prune=yes in the clone, the directory size drops to 50 MB (which is REALLY nice).

I have also tried this nice idea, but it does not show SHAs or any other lead on where to cut the knife.

I could really use your input on safely shrinking the storage of my bare repo in a systematic way.

Further relevant links that I read and tried, but that did not help: this, this, this

Peter Toft
  • Did you try https://stackoverflow.com/a/28720432/6309? (gc+repack+prune) – VonC Jan 20 '18 at 09:13
  • Did you remove the refs/original folder that a filter-branch might have left? (if you did it on the bare repo) – VonC Jan 20 '18 at 09:14
  • @VonC thanx - refs/original is clean – Peter Toft Jan 20 '18 at 09:22
  • @VonC I tried gc+repack+prune (thanx), but no dice. I still have a HUGE file objects/pack/pack-a3bd6008bef2cd518bb95496cb8435ed65b9ddcd.pack – Peter Toft Jan 20 '18 at 09:49
  • @axiac tried - did not help though. Thanx anyway – Peter Toft Jan 20 '18 at 09:50
  • I also tried git reflog expire --expire-unreachable=all --all no dice – Peter Toft Jan 20 '18 at 10:00
  • Can you try https://rtyley.github.io/bfg-repo-cleaner/ on that bare repo? (make a copy first, just for safekeeping) – VonC Jan 20 '18 at 10:44
  • @VonC I have during yesterday experimented a lot with BFG (thanx). And I like it a lot, but it is not as fine grained as "git filter-branch" is. – Peter Toft Jan 21 '18 at 09:34
  • Sure: the point was just to see if that would help reducing the size (providing you delete the extra folder it creates) – VonC Jan 21 '18 at 09:35
  • @VonC I have during yesterday experimented a lot with BFG (thanx). And I like it a lot, but it is not as fine grained as "git filter-branch" is. However a great thing was to run "bfg-1.12.16.jar -B 20" which prints the top list of bloat files. I learned that "Joe Average" committed a core file, several pdf files etc on "personal" work branches. Lovely :) Those I have nuked now with "git filter-branch". – Peter Toft Jan 21 '18 at 09:41
  • Status: I am down from 2GB to 450MB using harder git filter-branch + git gc. Similar work with bfg + git gc gives me 1100MB. I clearly have leftover SHAs with bloat that I cannot clean up with gc after deleting all relevant branches with "bloat" commits. – Peter Toft Jan 21 '18 at 09:42
  • Even (in the case of bfg) after a `git reflog expire --expire=now --all && git gc --prune=now --aggressive`? – VonC Jan 21 '18 at 09:50
  • git reflog expire --expire=now --all && git gc --prune=now --aggressive -> does not reduce directory size of the bare repo - however cloning from bare to a normal git repo and running this command reduces directory size from 450 MB to 55 MB (all branches + all tags are intact) – Peter Toft Jan 21 '18 at 12:19
  • Any filter-branch or bfg would leave behind some extra folder to be removed. (like `refs/original`). Is it possible the size of the bare repo is still big because of those original elements? – VonC Jan 21 '18 at 12:32
  • I doubt it - however I tried http://www.ducea.com/2012/02/07/howto-completely-remove-a-file-from-git-history/ and see that
    $ git verify-pack -v objects/pack/*.idx | sort -k 3 -n | tail -3
    gives
    2a31e09d92ad07ef7eb4fd4f35f3cdfa6534f26d blob 824965 742807 10737178
    b618d754d3fe6859cc93bd82ee0c62d4bfa734c7 blob 849168 116649 8739945 2 a930f65043cc1e1bbb7e5d5f2fd15d327ff9e37f
    97041233c36c3d4c2d3bc133840c35be7a978684 blob 1152730 80523 139905523
    How do I use this? – Peter Toft Jan 21 '18 at 12:37
  • You are supposed to use `git rev-list --objects --all | grep ` to find the filenames behind those revisions. – VonC Jan 21 '18 at 12:50
  • @PeterToft `git gc --aggressive --prune=yes` should also work for bare repo. Can you show the detail `objects/` size after executing `git gc --aggressive --prune=yes`? And what's the `.git/objects/` size if you execute `git gc --aggressive --prune=yes` in cloned repo? – Marina Liu Jan 22 '18 at 02:42
  • @MarinaLiu-MSFT See https://pastebin.com/jPTz8XwJ Very fishy that in the bare repo I cannot reduce the size of huge .git/objects/pack/pack-cf4df31d6c14f939915003a0e4da76a3e19b2b3f.pack I have git 2.7.4 – Peter Toft Jan 22 '18 at 11:27
  • Retested with git 2.16.1 -> same results – Peter Toft Jan 22 '18 at 13:34
  • @PeterToft Based on the screenshot, all the loose objects have already been packed into `.git/objects/pack/*.pack`. If no new changes have been committed since then, the size of `.git/objects/` won't change even if you execute `git gc` again. And the git gc documentation https://git-scm.com/docs/git-gc also says "git-gc - Cleanup unnecessary files and optimize the **local repository**". – Marina Liu Jan 24 '18 at 03:06
  • I am starting to suspect that the culprit is centered around submodule handling – Peter Toft Jan 24 '18 at 21:18
  • @PeterToft do you have any .gitmodules in your repo? – VonC Jan 25 '18 at 21:11
  • If the directory size drops to an acceptable level on a cloned repo, what is the problem with copying the `.git` directory from the cloned repo to the original repo? – Jacob Lambert Jan 29 '18 at 22:48
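
The two steps discussed in the comments (find the largest objects, then map their SHAs to file names with `git rev-list --objects --all`) can be combined into one rough sketch. The helper name `largest_blobs` is hypothetical and not from the thread. It assumes it is run inside the repository, and it only reports blobs that are still reachable from some ref, so objects that are merely sitting unreferenced in the pack will not show up.

```shell
#!/bin/sh
# largest_blobs [N]: print the N largest blobs reachable from any ref
# as "blob <size-in-bytes> <sha> <path>", smallest first (default N=10).
# Hypothetical helper name; run it from inside the (bare) repository.
largest_blobs() {
    n="${1:-10}"
    # rev-list prints "<sha> <path>"; cat-file adds type and size,
    # and passes the path through unchanged via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2n |
        tail -n "$n"
}
```

For example, `largest_blobs 20` lists the twenty biggest files that some branch or tag still keeps alive, which are the ones a further `git filter-branch` pass would have to target.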

1 Answer


This is probably what is missing from your command:

git reflog expire --expire=now --all
git gc --aggressive --prune=now

Also try: BFG Repo-Cleaner

https://rtyley.github.io/bfg-repo-cleaner/
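
One further thing that may be worth checking (an assumption, not something confirmed in this thread): a plain `git clone` copies only branches and tags, while `git gc` keeps every object that any ref points to. If the bare repo holds refs outside `refs/heads/` and `refs/tags/`, that could explain why a clone shrinks and the bare repo does not. A minimal sketch to list such refs, with the hypothetical helper name `extra_refs`:

```shell
#!/bin/sh
# extra_refs: list refs that a plain "git clone" would not copy,
# i.e. everything outside refs/heads/ and refs/tags/.
# Hypothetical helper name; run it from inside the bare repository.
# Exit status is non-zero when no such refs exist (grep found nothing).
extra_refs() {
    git for-each-ref --format='%(refname)' |
        grep -v -e '^refs/heads/' -e '^refs/tags/'
}
```

Anything it prints can be inspected and, if it is not needed, deleted with `git update-ref -d <ref>` before running the reflog expire and gc commands above. Try this on a copy of the bare repo first.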
eNeF