108

I accidentally added, committed and pushed a huge binary file with my very latest commit to a Git repository.

How can I make Git remove the object(s) that was/were created for that commit so my .git directory shrinks to a sane size again?

Edit: Thanks for your answers; I tried several solutions. None worked. For example the one from GitHub removed the files from the history, but the .git directory size hasn't decreased:

$ BADFILES=$(find test_data -type f -exec echo -n "'{}' " \;)

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $BADFILES" HEAD
Rewrite 14ed3f41474f0a2f624a440e5a106c2768edb67b (66/66)
rm 'test_data/images/001.jpg'
[...snip...]
rm 'test_data/images/281.jpg'
Ref 'refs/heads/master' was rewritten

$ git log -p # looks nice

$ rm -rf .git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
Counting objects: 625, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (598/598), done.
Writing objects: 100% (625/625), done.
Total 625 (delta 351), reused 0 (delta 0)

$ du -hs .git
174M    .git
$ # still 175 MB :-(
Ricardo Sanchez-Saez
Jonas H.
  • 15
    Just a reminder for moderators, this question 100% belongs on SO, not superuser. – VonC Sep 26 '10 at 13:55
  • See also http://stackoverflow.com/questions/2116778/reduce-git-repository-size/2116892#2116892 and http://stackoverflow.com/questions/685319/git-pull-error-unable-to-create-temporary-sha1-filename/685422#685422 – VonC Sep 26 '10 at 13:56
  • As mentioned here (http://stackoverflow.com/questions/685319/git-pull-error-unable-to-create-temporary-sha1-filename/685422#685422), did you try a repack after your gc? `git-repack -a` followed by `git-prune-packed`, for instance. See http://blog.felipebalbi.com/2007/12/19/housekeeping-your-git-repository/ – VonC Sep 26 '10 at 15:23
  • 2
    @Jonas: and what if, after you did all that, you clone your repo? Would you *then* get a clone with the desired reduced size? – VonC Sep 26 '10 at 17:24
  • @VonC: No. Same size. btw, I still see the "bad" commit in `git reflog` -- I'm not supposed to, am I? – Jonas H. Sep 26 '10 at 17:55
  • 1
    @Jonas: after all that you did (`filter-branch`, `gc`, `repack`, ...), no, you shouldn't see any bad commit at all. This is a sign that the cleaning didn't take place as expected. – VonC Sep 26 '10 at 18:46
  • Possible duplicate of [How to remove unreferenced blobs from my git repo](https://stackoverflow.com/questions/1904860/how-to-remove-unreferenced-blobs-from-my-git-repo) – JojOatXGME Sep 28 '17 at 11:05

9 Answers

154

I answered this elsewhere, and will copy here since I'm proud of it!

... and without further ado, may I present to you this useful script, git-gc-all, guaranteed to remove all your git garbage until they might come up with extra config variables:

git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 \
  -c gc.rerereresolved=0 -c gc.rerereunresolved=0 \
  -c gc.pruneExpire=now gc "$@"

The --aggressive option might be helpful.

NOTE: this will remove ALL unreferenced thingies, so don't come crying to me if you decide later that you wanted to keep some of them!

You might also need to run something like these first, oh dear, git is complicated!!

git remote rm origin
rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
git for-each-ref --format="%(refname)" refs/original/ |
  xargs -n1 --no-run-if-empty git update-ref -d

I put all this in a script, here:

https://ucm.dev/t/bin.git/git-gc-all-ferocious

Sam Watkins
  • 2
    As in http://stackoverflow.com/questions/1904860/how-to-remove-unreferenced-blobs-from-my-git-repo/14728706#comment20614863_14728706, +1 to you again. – VonC Feb 06 '13 at 17:57
  • 24
    excellent :D my evil plan to get more points by cloning answers has worked!!1 ;) – Sam Watkins Feb 08 '13 at 02:39
  • Yes! This worked, but I had to run the full script. Running only the gc command (with config options) was not enough. – Daniel Dec 14 '13 at 08:39
  • 5
    102m to 160k.. effective and destructive – prusswan Mar 15 '16 at 03:30
  • Thanks! That did just what I was looking for! – MarcusJ Aug 08 '16 at 06:44
  • 5
    Thanks so much for the script! Bonus info: The `xargs` command produces an error on OS X because of an unrecognized option. Simplest solution: Install GNU xargs via homebrew `brew install findutils` and replace `xargs` by `gxargs`. – qqilihq Sep 27 '17 at 20:21
  • I assume the idea after that is to add remote origin again, then push it so that the remote rep also gets all this destructiveness, right? – Bernardo SOUSA Aug 30 '19 at 23:36
  • what does `"$@"` do? – AaA Sep 02 '20 at 07:14
  • @AaA, about "$@", see here: https://stackoverflow.com/a/9994328/ – Sam Watkins Sep 03 '20 at 18:03
  • 1
    @SamWatkins The URL you added http://sam.nipl.net/b/git-gc-all-ferocious is now old, outdated & 404, can you update the URL so it doesn't go 404 when someone looking for assist clicks on it ? – Vicky Dev Aug 10 '22 at 13:25
  • @VickyDev ok I've updated the link – Sam Watkins Aug 11 '22 at 18:53
31

Your git reflog expire --all is incorrect. It removes reflog entries that are older than the expire time, which defaults to 90 days. Use git reflog expire --all --expire=now.

My answer to a similar question deals with the problem of really scrubbing unused objects from a repository.
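A minimal sketch of that sequence, run in a throwaway repository so nothing real is at risk. The names big.bin and notes.txt and the temporary directory are made up for illustration, and the "bad" commit is dropped with an amend rather than filter-branch to keep the example short.

```shell
set -e

# Throwaway repository; big.bin and notes.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "initial"

# The "bad" commit: a legitimate change plus a large binary.
echo "second" >> notes.txt
head -c 1048576 /dev/urandom > big.bin
git add notes.txt big.bin
git commit -q -m "update notes"
blob=$(git rev-parse HEAD:big.bin)

# Rewrite the last commit without the binary (the file stays on disk).
git rm -q --cached big.bin
git commit -q --amend -m "update notes"

# The old commit is now reachable only from the reflog.
# Expire every reflog entry immediately, then prune without a grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

# Check whether the blob survived in the object database.
if git cat-file -e "$blob" 2>/dev/null; then
  blob_present=yes
else
  blob_present=no
fi
echo "blob still present: $blob_present"
```

Without --expire=now the old commit would stay referenced by the reflog, and gc would keep the blob.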

Community
Josh Lee
23

1) Remove the file from the Git repo (but not from the filesystem):

  • git rm --cached path/to/file

2) Shrink the repo using:

  • git gc,

  • or git gc --aggressive

  • or git prune

or a combination of the above as suggested in this question: Reduce git repository size

Community
Jamie
10

This guide on removing sensitive data applies here as well, since the method is the same. You'll be rewriting history to remove that file from every revision it was present in. This is destructive and will cause conflicts with any other checkouts of the repo, so warn any collaborators first.

If you want to keep the binary available in the repo for other people, then there's no real way to do what you want. It's pretty much all or none.

Daenyth
9

The key for me turned out to be running git repack -A -d -f and then git gc to reduce the size of the single git pack I had.
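Roughly what that looks like, as a sketch in a scratch repository (notes.txt and the temp directory are placeholders, not anything from the question). git count-objects is used only to show the state before and after.

```shell
set -e

# Scratch repository with a few commits; notes.txt is a hypothetical file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

for i in 1 2 3; do
  echo "revision $i" >> notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

echo "before:"
git count-objects -v

# -A: pack everything reachable into a single pack
# -d: delete packs and loose objects made redundant by the new pack
# -f: recompute deltas instead of reusing existing ones
git repack -A -d -f -q
git gc -q

echo "after:"
git count-objects -v

# "count" is the number of loose objects left outside of packs.
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
```

Note that repack and gc only drop objects that nothing references any more, so refs/original and the reflog still have to be cleared first if they point at the unwanted commit.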

7

Hi!

Git only transfers the objects it actually needs when cloning a repository (if I understand it correctly).

So you can amend the last commit to remove the file added by mistake, then push your changes to the remote repository (with the -f option, to overwrite the old commit on the server too).

Then, when you make a new clone of that repo, its .git directory should be as small as it was before the big file(s) were committed.

Optionally, if you want to remove the unnecessary files from the server too, you can delete the repository on the server and push your newly cloned copy (which has the full history).
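Here is a sketch of those steps against a local bare repository standing in for the server. All paths and file names (remote.git, local, fresh, big.bin, main.txt) are invented for the example. The clone uses --no-local on purpose: a plain clone from a local path copies the whole object directory, which would hide the effect.

```shell
set -e

# A bare repository plays the role of the server; all names are hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$work/remote.git"

echo "source" > main.txt
git add main.txt
git commit -q -m "initial"

# The mistake: a big binary slips into the latest commit and gets pushed.
echo "more" >> main.txt
head -c 1048576 /dev/urandom > big.bin
git add main.txt big.bin
git commit -q -m "update"
git push -q origin HEAD
blob=$(git rev-parse HEAD:big.bin)

# Untrack the file (it stays on disk), amend, and overwrite the remote commit.
git rm -q --cached big.bin
git commit -q --amend -m "update"
git push -q -f origin HEAD

# Clone through the normal transport so only reachable objects are sent.
git clone -q --no-local "$work/remote.git" "$work/fresh"

if git -C "$work/fresh" cat-file -e "$blob" 2>/dev/null; then
  in_clone=yes
else
  in_clone=no
fi
echo "big blob in fresh clone: $in_clone"
```

The old blob can still sit in the server-side repository until it is garbage collected there, which is why the answer suggests recreating the repository on the server if that matters.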

u-foka
5
git filter-branch --index-filter 'git rm --cached --ignore-unmatch Filename' --prune-empty -- --all

Remember to replace Filename with the path of the file you want to remove from the repository.

Martin
5

See "Removing Objects" in the Pro Git book:

http://git-scm.com/book/en/Git-Internals-Maintenance-and-Data-Recovery#Removing-Objects

Update: see also BFG repo cleaner: http://rtyley.github.io/bfg-repo-cleaner/
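Before removing anything, it helps to know which blobs are actually the big ones. The following is only a sketch and not necessarily the command the book uses: it lists every object reachable from any ref together with its size and picks the largest blob. The repository and the files small.txt and big.bin are set up purely for the demonstration.

```shell
set -e

# Demo repository; small.txt and big.bin are hypothetical files.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "small" > small.txt
head -c 1048576 /dev/urandom > big.bin
git add small.txt big.bin
git commit -q -m "initial"

# rev-list prints "<id> <path>" for every reachable object;
# cat-file adds type and size, and %(rest) carries the path through.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k2,2n |
  tail -n 1)
echo "largest blob: $largest"
```

In a real repository, replacing tail -n 1 with tail -n 10 gives a short list of candidates to remove.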

Czarek Tomczak
0

As of 2020, the documentation for git-filter-branch discourages its use and recommends an alternative such as git-filter-repo, which can also be used instead of BFG.

Note that the chapter on Rewriting History in the git book hasn't been updated. Neither has GitHub's recommendation on removing sensitive data.
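For reference, a sketch of what the git-filter-repo route could look like. It assumes git-filter-repo is installed separately (it does not ship with Git), so the example skips itself when the command is missing. The repository and the files keep.txt and big.bin are hypothetical.

```shell
set -e

# Demo repository; keep.txt and big.bin are hypothetical files.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "keep" > keep.txt
head -c 1048576 /dev/urandom > big.bin
git add keep.txt big.bin
git commit -q -m "initial (includes big.bin by mistake)"

if git filter-repo --version >/dev/null 2>&1; then
  # --invert-paths keeps everything except the listed path.
  # --force is needed here because this is not a fresh clone.
  git filter-repo --force --invert-paths --path big.bin
  if [ -z "$(git log --all --format=%H -- big.bin)" ]; then
    result=clean
  else
    result=dirty
  fi
else
  # Assumption: the tool is optional, so the demo does nothing without it.
  result=skipped
fi
echo "result: $result"
```

As with filter-branch, this rewrites history, so collaborators need to re-clone or rebase onto the rewritten branches afterwards.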

user2465896