I've rewritten the history of my repository to remove some large FLV files using git filter-branch. I primarily followed the GitHub article on removing sensitive data, along with similar instructions found elsewhere on the Internet:

Removing the large FLVs:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch public/video/*.flv' --prune-empty -- --all

Removing the original refs:

rm -rf .git/refs/original/

Clearing the reflog:

git reflog expire --expire=now --all

Pruning unreachable objects:

git gc --prune=now

Aggressively pruning unreachable objects:

git gc --aggressive --prune=now

Repacking things:

git repack -A -d

And my gitdir is still 205 MB, contained almost entirely in a single packfile:

$ du -h .git/objects/pack/*
284K    .git/objects/pack/pack-f72ed7cee1206aae9a7a3eaf75741a9137e5a2fe.idx
204M    .git/objects/pack/pack-f72ed7cee1206aae9a7a3eaf75741a9137e5a2fe.pack

Using this script, I can see that the FLVs I've removed are still contained in the pack:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size   pack   SHA                                       location
17503  17416  1be4132fa8d91e6ce5c45caaa2757b7ea87d87b0  public/video/XXX_FINAL.flv
17348  17261  b7aa83e187112a9cfaccae9206fc356798213c06  public/video/YYY_FINAL.flv
....
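
The linked script is not reproduced here. As a rough sketch of the same idea, and not the script used above, the largest blobs in a pack can be listed with git verify-pack. The function name largest_packed_blobs is hypothetical, and sizes are printed in bytes rather than kB:

```shell
# Hypothetical helper: list the largest blobs stored in the repository's packfiles.
# Output columns: size (bytes), size inside the pack (bytes), SHA.
largest_packed_blobs() {
    count=${1:-10}
    git_dir=$(git rev-parse --git-dir) || return 1
    # verify-pack -v prints: SHA type size size-in-pack offset [depth base-SHA]
    git verify-pack -v "$git_dir"/objects/pack/pack-*.idx |
        awk '$2 == "blob" { print $3, $4, $1 }' |
        sort -k1,1nr |
        head -n "$count"
}
```

A SHA from the output can be mapped back to a path with git rev-list --objects --all | grep <SHA>, which only finds objects that are still reachable from some ref.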

Cloning the repository via git clone --bare my-repo yields my-repo.git which is also 205MB in size.

What can I do to remove these (presumably) unreferenced objects from the pack and shrink my repository back to the size it would be if they'd never been committed? If they are still referenced somehow, is there a way to tell where?

Update

Upon attempting to re-run git filter-branch, I received this notice:

Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

I verified that there were no refs in .git/refs/original; indeed, the directory didn't exist at all. Is there some other way that git stores refs, that I don't know about?

user229044
  • Cloning the repository via `git clone --bare my-repo` yields `my-repo.git` which is also 205MB in size, so no; the packfile and its huge contents come with the clone. – user229044 May 18 '12 at 17:40
  • Your deleted answer is interesting and may be useful to others - would you consider editing your question to describe the real order of commands that you did, and then putting back an answer explaining about the `refs/original` refs being packed? (It's a subtle point that you can have refs which just exist in pack files, and not a file under `refs`.) – Mark Longair May 18 '12 at 17:43
  • @MarkLongair I'm still playing around, trying to reproduce the results from my deleted answer. I've cloned the repo, and found that running `git repack -a` *before* running `rm -rf .git/refs/original` does **not** seem to affect the outcome. It doesn't seem to affect the contents of `.git/refs/original`. – user229044 May 18 '12 at 17:47
  • I have exactly the same issue (size didn't go down like it should, can't create a new backup), and I *didn't* run the git repack command. Will try to clone and re-filter and see if that helps. – yoyo Nov 24 '12 at 06:04
  • Clone and re-filter worked. I'm on git 1.7.10, running on Windows 7. – yoyo Nov 26 '12 at 04:58
  • I read a half dozen other solutions on stackoverflow trying to remove an egregiously large backup file from a packfile. This is the only set of commands that actually worked, and I can only assume the additional arguments to filter-branch: '-- --all' did the trick. Thank you thank you! – Brian Slezak Jun 17 '13 at 21:09

1 Answer

Upon cloning a fresh copy of the repository, I was able to run the commands exactly as above, and achieve the desired result: My .git directory was reduced from 205 MB down to 20 MB, and the large FLV files were removed cleanly from the packfile.

The first attempt was also performed on a fresh clone to which I had made no modifications, so I do not have a satisfying explanation for why the FLV files continued to linger inside the packfile.
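
One way to investigate a lingering object like this is to ask which refs can still reach it. The following is a minimal sketch, assuming the blob's full SHA is already known (for example, from the listing in the question). refs_reaching_object is a hypothetical helper name, and it only checks refs, not reflog entries, which can also keep objects alive until they are expired:

```shell
# Hypothetical helper: print every ref from which the given object is reachable.
# $1: full SHA of the object (e.g. one of the large FLV blobs)
refs_reaching_object() {
    target=$1
    # for-each-ref lists loose and packed refs alike, including refs/original/*
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        # rev-list --objects walks commits, trees and blobs reachable from the ref
        if git rev-list --objects "$ref" | grep -q "^$target"; then
            echo "$ref"
        fi
    done
}
```

If this prints nothing for a blob that is still in the pack, the object is probably being kept by something other than a ref, and the reflog is the next place to look.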

I originally submitted the below answer, thinking that I'd caused a problem by running git repack -a before removing .git/refs/original, causing the original refs to become packed so that when I did remove .git/refs/original there was no effect; my original refs would still be referencing the large FLV files. This doesn't seem to hold up, however. Running the above commands on a freshly cloned copy of the repository with the addition of git repack -a immediately after git filter-branch doesn't seem to affect the outcome - the FLV files are still purged from the packfile. I have no reason to believe this is relevant to the original problem.


Is there some other way that git stores refs, that I don't know about?

There is. It turns out I wasn't entirely truthful about the order of commands as listed above. I had run git repack -a before running rm -rf .git/refs/original, and Git had packed the refs away (to be determined where; experimenting now). When I then ran rm -rf .git/refs/original, nothing was removed. git gc was unable to shrink my packfile because I still had lingering references to the old files through the packed refs/original refs.
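
For what it's worth, when Git packs refs (git pack-refs, which git gc runs by default), they are written to the single file .git/packed-refs, and the matching files under .git/refs may no longer exist, so rm -rf on that directory cannot remove them. Below is a sketch of deleting the backup refs through Git itself, which works for loose and packed refs alike. The function names are hypothetical:

```shell
# Hypothetical helper: list the filter-branch backup refs, wherever they are stored.
list_original_refs() {
    git for-each-ref --format='%(refname)' refs/original/
}

# Hypothetical helper: delete those refs via Git rather than via the filesystem.
delete_original_refs() {
    list_original_refs |
    while read -r ref; do
        # update-ref -d removes the ref whether it is a loose file or a packed-refs entry
        git update-ref -d "$ref"
    done
}
```

Running list_original_refs before and after delete_original_refs shows whether any backup refs remain, regardless of what the .git/refs directory looks like.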

user229044