I have read a few threads on removing large binary files from git commit history, but my problem is slightly different, so I am asking here to understand and confirm the steps.

My git repo is ~/foo. I want to remove all *.jpg, *.png, *.mp4, *.ogv (and so on) from one of the directories inside the repo, specifically from ~/foo/public/data.

Step 1. Remove the files

~/foo/data > find -E . -regex ".*\.(jpg|png|mp4|m4v|ogv|webm)" \
    -exec git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch {}' \
    --prune-empty --tag-name-filter cat -- --all \;

Step 2. Add the binary file extensions to .gitignore and commit .gitignore

~/foo/data > cd ..
~/foo > git add .gitignore
~/foo > git commit -m "added binary files to .gitignore"
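
Step 2 does not show the ignore patterns themselves. A minimal sketch of adding them, assuming the media lives under public/data (relative to the repo root) and that the command is run from the top level of the working tree:

```shell
# Assumption: media files live under public/data; adjust the path as needed.
# Appends one ignore pattern per extension to .gitignore in the current directory.
for ext in jpg png mp4 m4v ogv webm; do
    # "**/" also covers subdirectories of public/data
    printf 'public/data/**/*.%s\n' "$ext"
done >> .gitignore
```

Scoping the patterns to the directory keeps images elsewhere in the repo trackable; a bare `*.jpg` line would ignore them everywhere.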

Step 3. Push everything

~/foo > git push origin master --force

Am I on the right track above? I want to measure twice before I cut once, so to speak.

Update: Well, the above gives me the error

You need to run this command from the toplevel of the working tree.
You need to run this command from the toplevel of the working tree.
..

So I went up the tree to the top level and re-ran the command, and it all worked.
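
Because the `find -exec` form runs a complete `git filter-branch` once per matching file, a single pass with quoted pathspec globs may be much faster. The following is only a sketch: `purge_media` is a hypothetical helper name, the directory is passed as an argument, and it assumes the path contains no single quotes.

```shell
# Hypothetical helper: strip media files under one directory from all of
# history in a single filter-branch pass.
# Run it from the top level of the working tree, on a backup or fresh clone.
purge_media() {
    dir=$1   # path relative to the repo root, e.g. public/data
    # The globs are quoted so git (not the shell) expands them against the
    # index of every rewritten commit; git's "*" also matches across "/".
    git filter-branch --force --index-filter "
        git rm --cached --quiet --ignore-unmatch -- \
            '$dir/*.jpg' '$dir/*.png' '$dir/*.mp4' \
            '$dir/*.m4v' '$dir/*.ogv' '$dir/*.webm'
    " --prune-empty --tag-name-filter cat -- --all
}
```

Usage would be `purge_media public/data`. The old commits stay reachable through `refs/original/` until those refs are deleted, so the repository does not shrink immediately.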

punkish
  • I was going to use this method with `find`, but it re-runs the `filter-branch` on every commit and branch for every file. In my case, that would've been over 16,000 times! What worked for me was `git rm -r` and just specifying the name of the directory containing the offending files... ``git filter-branch --force --prune-empty --index-filter 'git rm -r --cached --ignore-unmatch path/to/image/files' -d /cygdrive/r/git-rewrite_`date +"%Y%m%d_%H%M%S%z"` --tag-name-filter cat -- --all`` – Vince Jun 04 '14 at 12:10
  • Thanks, this worked like a charm for me. – Tobias Oct 18 '19 at 05:49
  • I have updated my 2013 answer with a 2020 tool. – VonC Jul 09 '20 at 04:05

1 Answer

The process seems right.

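You can also test your clean process with a tool like BFG Repo-Cleaner, as in this answer:

java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' path/to/bare-repo.git

The glob is quoted so that the shell passes it to BFG unexpanded, and path/to/bare-repo.git stands in for the bare repository directory.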

(Except BFG makes sure it doesn't delete anything in your latest commit, so you need to remove those files in the current index and make a "clean" commit. All other previous commits will be cleaned by BFG)

Update 2020: to remove files, you would now use git filter-repo (Git 2.22+, Q4 2019), since git filter-branch and BFG are now, 7 years later, obsolete.

git filter-repo --path fileToRemove --invert-paths
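
The command above removes a single path. For the case in the question (several extensions under one directory), a regex-based variant might look like the sketch below. The `public/data` prefix, the variable name, and both function names are assumptions for illustration; `--path-regex` and `--invert-paths` are existing git filter-repo options.

```shell
# Assumption: media lives under public/data (path relative to the repo root).
media_regex='^public/data/.*\.(jpg|png|mp4|m4v|ogv|webm)$'

# Hypothetical preview helper: list every path ever committed on any ref
# that the pattern would remove.
list_media_in_history() {
    git log --all --pretty=format: --name-only | grep -E "$media_regex" | sort -u
}

# Hypothetical wrapper: drop all matching paths from the whole history.
# Assumes git-filter-repo is installed and this is a fresh clone.
purge_media_filter_repo() {
    git filter-repo --path-regex "$media_regex" --invert-paths
}
```

Running `list_media_in_history` first gives a chance to check the pattern before anything is rewritten.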
VonC
  • The BFG is likely a good tool for the job (disclaimer: I'm the creator of The BFG) - I just want to clarify that it does a similar job to `git-filter-branch`, so it would probably _replace_ the script in Step 1 (rather than 'test' it). The BFG acts over the entire repo however, and currently cannot be restricted to a single folder path like `~/foo/public/data`. If files with those extensions don't exist elsewhere in the repo, then that's not a problem. Alternatively, if they *do* exist, but are in protected commits (eg your `HEAD` commit) then they won't be deleted either. – Roberto Tyley Jul 02 '13 at 12:55
  • @RobertoTyley thank you for your comment, and for BFG :) Great tool. – VonC Jul 02 '13 at 12:57
  • You're welcome - it's great to hear about people using The BFG! – Roberto Tyley Jul 02 '13 at 15:38
  • Shouldn't it be `git filter-repo --path fileToRemove --invert-paths`? – Michel Jung Oct 30 '20 at 09:22
  • @MichelJung Thank you. That was a typo. I have edited the answer accordingly. – VonC Oct 30 '20 at 09:36