I have a large binary file in a git repository, which has been changed in a few commits. These commits also included changes to other files. I would like to have only the most recent version of the binary file in the repository, but would like to keep the history of the other files that were changed in these commits.

All of the commits in question have already been pushed to GitHub and pulled from there by other members of the team.

How can I do this?

EDIT: I don't believe this is a duplicate of the other referenced question. As noted in the comments below, I've looked at that question, but I want to remove every version of the file except the most recent one. This criterion is not addressed in the answers to the other question.

Greg
  • @Andrew Neitsch, I looked at that question, but I don't want to purge the entire history of the file; I just want to keep the most recent version. Would the approach be to remove the entire history of the file, and then add the current version of the file back to the repository? – Greg Apr 25 '13 at 17:43
  • Yes, that would work. But if this large file has changed before, it is probably going to change again. Given the complexity of purging files from git, you may want to keep it outside this source repository entirely. Also see [Github help: What is my disk quota?](https://help.github.com/articles/what-is-my-disk-quota) – andrewdotn Apr 25 '13 at 17:47
  • I disagree that this question duplicates http://stackoverflow.com/q/2100907/438886. @greg specifically notes that he has looked at that question and that he wants to remove every version of the file _except_ the most recent one, a criterion not addressed in the answers to the referenced question. – Roberto Tyley Apr 26 '13 at 16:19

2 Answers

The simplest way is to use The BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch designed specifically for removing large files from Git repos.

You should follow the usage instructions carefully, but the main step is simple: download the Java jar (requires Java 7 or above) and run this command:

$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any blob over 100MB in size will be removed from your repository's history, unless it is the version present in the file tree of your latest commit. Your latest version will therefore be untouched, as you required.
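
For context, the command above is only one step of a longer procedure. The following is a minimal sketch of the surrounding workflow as I understand it from the BFG's usage instructions, not a substitute for reading them. The function name `bfg_strip_large_blobs`, its arguments, and the fixed `my-repo.git` directory are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: wraps the BFG run in the clone / clean / push steps around it.
# Usage: bfg_strip_large_blobs <repo-url> <path-to-bfg.jar> [size]
bfg_strip_large_blobs() {
    repo_url=$1
    bfg_jar=$2
    size=${3:-100M}   # single-letter unit, e.g. 100M

    if [ -z "$repo_url" ] || [ -z "$bfg_jar" ]; then
        echo "usage: bfg_strip_large_blobs <repo-url> <bfg.jar> [size]" >&2
        return 2
    fi

    # 1. Take a fresh bare mirror of the repository (all refs, no working tree).
    git clone --mirror "$repo_url" my-repo.git || return 1

    # 2. Rewrite history, dropping large blobs that are not in the latest commit.
    java -jar "$bfg_jar" --strip-blobs-bigger-than "$size" my-repo.git || return 1

    # 3. Expire reflogs and garbage-collect so the stripped blobs are really deleted.
    ( cd my-repo.git &&
      git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive ) || return 1

    # 4. Push the rewritten refs back. This replaces history on the remote.
    ( cd my-repo.git && git push )
}
```

Because the commits have already been pulled by teammates, everyone would most likely need to re-clone (or hard-reset onto the rewritten branches) afterwards; otherwise an old clone can push the large blobs straight back.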

The BFG is also 10-50x faster than git-filter-branch.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley

Rather than trying to filter all but the latest version, just nuke the file from the history of your repo and re-add the most recent version.
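
One way this could look with plain git is sketched below, using `git filter-branch` with an index filter. This is an assumption about how to carry out the step, not necessarily the exact commands the answer had in mind. The function name `purge_and_readd` is hypothetical, and the sketch assumes a clean working tree, a path relative to the repository root, and a path that contains no single quotes.

```shell
#!/bin/sh
# Sketch only: remove every historical version of one file, then commit the
# current version again as a new file. Run from the top of the working tree.
purge_and_readd() {
    file=$1
    if [ -z "$file" ] || [ ! -f "$file" ]; then
        echo "usage: purge_and_readd <tracked-file>" >&2
        return 2
    fi

    # Keep a copy of the latest version outside the repository.
    backup=$(mktemp) || return 1
    cp -- "$file" "$backup" || return 1

    # Rewrite all branches and tags, dropping the file from every commit.
    # Commits that also touched other files keep those changes.
    # FILTER_BRANCH_SQUELCH_WARNING skips the deprecation pause in newer git.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm --cached --ignore-unmatch --quiet -- '$file'" \
        --prune-empty --tag-name-filter cat -- --all || return 1

    # The rewrite removes the file from the working tree, so restore it.
    mkdir -p -- "$(dirname -- "$file")" || return 1
    cp -- "$backup" "$file" || return 1
    rm -f -- "$backup"

    # Commit the latest version as the only version in history.
    git add -- "$file" && git commit --quiet -m "Re-add latest version of $file"
}
```

After a rewrite like this the branch has to be force-pushed, and the old objects remain in the local repository (under `refs/original/` and the reflog) until they are expired and garbage-collected.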

Consider not tracking this file. Git isn't meant for large binary blobs.
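
If you go that route, a minimal sketch of how to stop tracking the file while keeping it on disk might look like this. The function name `untrack_and_ignore` is hypothetical, and the sketch assumes the path is given relative to the repository root.

```shell
#!/bin/sh
# Sketch only: stop tracking a file going forward, but leave it in the
# working tree. This does not remove old versions from existing history.
untrack_and_ignore() {
    file=$1
    if [ -z "$file" ]; then
        echo "usage: untrack_and_ignore <tracked-file>" >&2
        return 2
    fi

    # Remove the file from the index only; the copy on disk is kept.
    git rm --cached --quiet -- "$file" || return 1

    # Ignore it from now on (leading slash anchors the pattern at the repo root).
    printf '/%s\n' "$file" >> .gitignore

    git add .gitignore && git commit --quiet -m "Stop tracking $file"
}
```

The file would then need to be distributed some other way, for example from a shared drive or an artifact store.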

user229044