
After a coder's anarchic addition of binary files, how do I slim down a git repository, removing not only the problematic files but also their history in the tree?

I tried using bfg, but as it works on a mirrored bare repository I had difficulty working out the whole workflow and needed to gather answers from different places on the web.

cmbarbu

1 Answer


What finally worked for me was going back and forth between mirrored bare repositories and normal repositories. It might seem long, but it really covers every step from a huge repository to a small one, and it is actually fast (about a 10-minute job).

First get a local mirrored repository of the latest version, with all the mess (over the internet this can take a long time; it is the only step that potentially does):

git clone --mirror http://myservice.org/myrepo

Then copy the result for backup purposes (I'm not kidding, we will use it at the end):

cp -r myrepo.git myrepo.git.bak

Then create a normal (non-bare) version from the bare version so you can clean up:

mkdir myrepo.small 
cd myrepo.small 
mkdir .git
cd .git
cp -r ../../myrepo.git/* .
cd ..
git config --local --bool core.bare false

And assuming the clean up is to be done on the master branch:

git checkout master
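
The conversion and checkout steps above can be wrapped in a small helper. This is only a sketch: the function name mirror_to_worktree and its arguments are hypothetical, and it copies with "/." instead of "/*" so that hidden files in the mirror, if any, are included.

```shell
# Turn a bare mirror into a normal repository with a checked-out work tree.
# Usage: mirror_to_worktree <mirror.git> <target-dir> [branch]
# (hypothetical helper; the branch defaults to master as in the steps above)
mirror_to_worktree() {
    mirror=$1
    target=$2
    branch=${3:-master}
    # The mirror's contents become the .git directory of the new repository.
    mkdir -p "$target/.git" || return 1
    cp -r "$mirror"/. "$target/.git/" || return 1
    # Tell git this repository now has a work tree, then populate it.
    git -C "$target" config --local --bool core.bare false || return 1
    git -C "$target" checkout -q "$branch"
}
```

With the names used here, the call would be: mirror_to_worktree myrepo.git myrepo.small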

To clean up, spot the big directories with:

du -sh *
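
Note that du only sees what is currently checked out. To also spot big files that exist only in older commits, something like the following may help. It is a sketch built on git rev-list and git cat-file; the function name list_big_blobs and the byte threshold are my own choices, not part of the original workflow.

```shell
# List blobs anywhere in history that are larger than a byte threshold,
# biggest first, as "<size-in-bytes> <path>".
# Usage: list_big_blobs <repo-path> <min-bytes>   (hypothetical helper)
list_big_blobs() {
    repo=$1
    min_bytes=$2
    # rev-list prints "<sha> <path>" for every object reachable from any ref;
    # cat-file adds the type and size and passes the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" '$1 == "blob" && $2 + 0 > min + 0 {
            path = $0
            sub(/^[^ ]+ [^ ]+ /, "", path)   # strip type and size, keep the path
            print $2, path
        }' |
        sort -rn
}
```

For example, list_big_blobs . 5000000 would list everything above roughly 5 MB, including files already deleted from the current tree.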

And eliminate them (even if you want to keep them on disk without versioning them; we will put them back later) with the following, where -r is needed because bigThings is a directory:

git rm -r bigThings

If you only want to remove some files from git history while keeping them on disk, you can use git rm --cached bigFile, but I found it easier to remove everything, clean up the history and then put the files back.

As often as you want and at least when you are done cleaning up:

git commit -m "big clean up"

You could try to push the result to the mirrored repository, but I found it easier to make a new one (run from the parent folder of the repository; the new mirror is created as myrepo.small.git):

rm -rf myrepo.git
git clone --mirror myrepo.small 

Finally download the bfg (you need java installed) and run it on the new mirrored clone. By default the bfg leaves the contents of the latest commit untouched, which is why the big files had to be removed and committed first. I wanted to remove the history for everything above 5M and my downloaded version of the bfg was at ~/Downloads/bfg-1.12.5.jar, so I used:

java -jar ~/Downloads/bfg-1.12.5.jar --strip-blobs-bigger-than 5M myrepo.small.git

Confirm the clean-up (with check of the size before and after):

cd myrepo.small.git
du -sh 
git reflog expire --expire=now --all && git gc --prune=now --aggressive
du -sh 
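
If you repeat this on several repositories, the before/after check can be bundled up. A minimal sketch, assuming a hypothetical helper name shrink_and_report and sizes reported in kilobytes as given by du -sk:

```shell
# Expire reflogs, garbage-collect, and print the repository size before and
# after, in kilobytes. Works on bare and normal repositories.
# Usage: shrink_and_report <repo-path>   (hypothetical helper)
shrink_and_report() {
    repo=$1
    before=$(du -sk "$repo" | cut -f1)
    # Drop reflog entries so the old, rewritten commits become unreachable...
    git -C "$repo" reflog expire --expire=now --all || return 1
    # ...then actually delete the unreachable objects and repack.
    git -C "$repo" gc --quiet --prune=now --aggressive || return 1
    after=$(du -sk "$repo" | cut -f1)
    echo "before=${before}K after=${after}K"
}
```

Here it would be called from the parent folder as: shrink_and_report myrepo.small.git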

It should feel lighter. At this point I found:

git push ../myrepo

not to be working, so I just recreated the working repository from the clean mirror (going back to the parent folder first):

cd ..
rm -rf myrepo.small
mkdir myrepo 
cd myrepo 
mkdir .git
cd .git
cp -r ../../myrepo.small.git/* .
cd ..
git config --local --bool core.bare false
git checkout master

I actually also found it easier to delete my repo (on bitbucket) and recreate it empty. When everything looks right, set the right central repo in .git/config and git push.

To put the troublemaker items back in the folder, I used unison on a normal version of the repository backed up at the beginning.

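First make a normal version of the backup (from the parent folder; note that the backup is myrepo.git.bak, since myrepo.git was deleted earlier):

mkdir myrepo.bak
cd myrepo.bak
mkdir .git
cd .git
cp -r ../../myrepo.git.bak/* .
cd ..
git config --local --bool core.bare false
git checkout master
cd ..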

then run unison on the two

unison myrepo myrepo.bak

And put back what I needed. A zip of that on filesender, or another substitute for a USB stick, to send to all the contributors and we are up and running again.

cmbarbu