I've never used BFG before. It sounds useful if you're in this situation of having large files that you need to remove. However, I'll try to explain the overall process, as I understand it.
Before we begin, note that BFG rewrites the repository's history. Pushing that rewritten history to the remote will require everyone on your team to re-clone the repository and transfer their local-only branches over.
According to git's documentation, git clone --mirror
Set up a mirror of the source repository. This implies --bare. Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository.
This means that the clone will create an exact copy of the remote repository on your machine. As the BFG docs say, you should create a backup of this clone in case you need it later.
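As a rough sketch of the clone-and-backup step, here is a throwaway demo that uses a local repository as a stand-in for the remote. The directory names (origin-repo, some-big-repo.git.bak) and the demo identity are assumptions for illustration only; with a real project you would pass the remote URL to git clone --mirror instead.

```shell
set -eu
work=$(mktemp -d)          # scratch directory for the demo
cd "$work"

# Stand-in for the remote: a small repo with a single commit.
git init -q origin-repo
git -C origin-repo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Mirror clone: a bare repository that copies every ref of the source.
git clone -q --mirror origin-repo some-big-repo.git

# Keep an untouched copy before rewriting anything.
cp -R some-big-repo.git some-big-repo.git.bak

is_bare=$(git -C some-big-repo.git rev-parse --is-bare-repository)
is_mirror=$(git -C some-big-repo.git config --get remote.origin.mirror)
echo "bare=$is_bare mirror=$is_mirror"
```

The two values printed at the end are just a sanity check that the clone really is a bare mirror.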
java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
Will target the clone you made with git clone --mirror
and will strip files larger than 100M from every commit except the most recent one (as mentioned in the BFG docs). BFG won't delete the old data automatically. It will stop, let you confirm everything looks good and then leave you to clean up the rest.
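If you want to see which files would be over the threshold before running BFG, something like the following may help. It is a sketch using plain git plumbing, not part of BFG itself; the demo repo, the file names and the tiny 1024-byte threshold are all made up so it runs quickly.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

# One small file and one 2 KiB file standing in for a "large" file.
echo "small" > small.txt
dd if=/dev/zero of=big.bin bs=1024 count=2 2>/dev/null
git add small.txt big.bin
git commit -q -m "add files"

threshold=1024   # bytes; demo value, use 104857600 for 100M

# Walk every object reachable from any ref, ask git for its type and size,
# and keep only blobs above the threshold.
# Note: printing $3 only shows the first word of a path containing spaces.
large=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v max="$threshold" '$1 == "blob" && $2 > max { print $3 }')
echo "$large"
```

Because this walks the whole history, it also finds files that were deleted in later commits but still take up space in the repository.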
cd /Users/me/myrepomirror.git
Will navigate to the bare repository. You may have to change the path accordingly.
git reflog expire --expire=now --all && git gc --prune=now --aggressive
Let's break this command up into its two logical parts:
git reflog expire --expire=now --all
- The expire subcommand prunes older reflog entries. The reflog records when the tips of branches and other references (including HEAD) were updated in the local repository.
--expire=now
tells git to expire every reflog entry older than the current time, which in practice means all of them.
--all
means the reflogs of all references are processed. Without --all, git reflog expire only touches the refs you name explicitly.
git gc --prune=now --aggressive
- git gc handles garbage collection for git. Normally, it'll be triggered automatically by other git commands, but it is sometimes useful to run it manually.
--prune=now
tells git gc to prune unreachable loose objects regardless of their age, rather than only those older than the default grace period of two weeks.
--aggressive
will cause git gc to spend more time cleaning the repository of unnecessary files and provide greater optimization. The git gc
docs have some additional info on it.
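To see why both halves of that command are needed, here is a small demo sketch. It creates an object that is only reachable through the reflog, then shows that git gc alone keeps it while expiring the reflog first lets git gc remove it. The repo, the scratch branch and the file names are invented for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# Commit a file on a scratch branch, then delete that branch.
git checkout -q -b scratch
dd if=/dev/zero of=big.bin bs=1024 count=2 2>/dev/null
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q "$base_branch"
git branch -q -D scratch

# The blob is now referenced only by HEAD's reflog, so gc alone keeps it.
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# After expiring the reflog nothing references the blob and gc drops it.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "before expiring reflog: $before, after: $after"
```

The same thing happens after a BFG run: the old commits are no longer on any branch, but the reflog still points at them until it is expired.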
Once all of that is done, git push
will overwrite the remote version of every ref (branches and tags) with the newly cleaned ones. A plain git push is enough here because a --mirror clone is configured to push all refs.
You would now have to re-clone the repository in a different directory with git clone
to obtain a non-bare version.
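The push-and-re-clone step can be tried out locally with a sketch like this one. A local bare repository (remote.git) plays the remote, and moving the branch back one commit stands in for the BFG rewrite; both are assumptions made for the demo, not what BFG actually does to your history.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in remote with two commits.
git init -q seed
git -C seed -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C seed -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"
git clone -q --bare seed remote.git

# Mirror clone, then a fake "rewrite": move the branch back one commit.
git clone -q --mirror remote.git mirror.git
cd mirror.git
branch=$(git symbolic-ref HEAD)
new=$(git rev-parse "$branch~1")
git update-ref "$branch" "$new"
rewritten=$(git rev-parse "$branch")

# In a mirror clone, a plain push force-updates every ref on the remote.
git push -q
cd ..

remote_tip=$(git -C remote.git rev-parse "$branch")

# Fresh non-bare clone to keep working in.
git clone -q remote.git fresh
fresh_tip=$(git -C fresh rev-parse HEAD)
echo "remote=$remote_tip fresh=$fresh_tip"
```

Note that the push succeeds even though it is not a fast-forward, which is exactly why everyone else has to re-clone afterwards.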
Essentially, what we've done with this process is create a copy of the remote repository, remove the offending files (rewriting the commit history in the process), push the rewritten history to overwrite what was on the remote, and clone a fresh copy of that repository to continue working in.
Preventative measures
I'd suggest some preventative measures to avoid having to constantly remove these files. BFG
shouldn't be run frequently, since it rewrites the repository's history.
Unfortunately, .gitignore doesn't support ignoring files larger than a given size. However, there may be some options available to you, regardless.
- If all of these large files have a particular file extension or are in a specific directory, add a pattern for that extension or directory to the .gitignore file to prevent git from tracking new files that match. Files that are already tracked aren't affected.
- Create a pre-commit hook which will prevent files above a certain size from being added. There seems to be a script (I haven't tested it) in response to this SO post.
- This is a client-side git hook, meaning it isn't copied by git clone and will need to be distributed to the other developers on your team.
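For the pre-commit hook idea, here is a minimal sketch of what such a hook could look like, together with a demo that installs it in a throwaway repo. This is not the script from the linked post. The config key hooks.maxfilesize is something I made up for the sketch, and the demo lowers the limit to 1024 bytes so it runs quickly.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

# Install the hook. hooks.maxfilesize is a hypothetical config key.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
limit=$(git config --get hooks.maxfilesize || echo 104857600)  # default 100 MiB
# Check every added, copied or modified path in the index.
# Paths that git quotes (unusual characters) are skipped by this sketch.
git diff --cached --name-only --diff-filter=ACM |
{
  status=0
  while IFS= read -r path; do
    # Size of the staged blob, not of the working-tree file.
    size=$(git cat-file -s ":$path" 2>/dev/null) || continue
    if [ "$size" -gt "$limit" ]; then
      echo "pre-commit: $path is $size bytes, limit is $limit" >&2
      status=1
    fi
  done
  exit "$status"
}
EOF
chmod +x .git/hooks/pre-commit

git config hooks.maxfilesize 1024   # tiny limit for the demo

echo "ok" > small.txt
git add small.txt
if git commit -q -m "small file" 2>/dev/null; then small=accepted; else small=rejected; fi

dd if=/dev/zero of=big.bin bs=1024 count=2 2>/dev/null
git add big.bin
if git commit -q -m "big file" 2>/dev/null; then big=accepted; else big=rejected; fi

echo "small.txt: $small, big.bin: $big"
```

Keep in mind that anyone can bypass a client-side hook with git commit --no-verify, so this is a guard rail rather than an enforcement mechanism.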