Just a ready-to-use full version, based on @lucanLepus's accepted answer.
Let's say I am userA, and I want to completely remove the folder media/1 Juno-Trumpet/ (which no longer exists in the latest commits, only in commits far in the past) from the history of the repo on GitHub.
NB: this particular repository has original branches master, sfz, and wifi, and tag v1.0. To avoid needing to know this, I use a mirror clone here (which makes a bare repository, which is fine since I will use an index filter). Then, since this is GitHub, I toss all the refs/pull/ refs first.
As it turns out, the files are also named media/Juno-Trumpet/ and media/Juno/, so we need to remove all three path names.
git clone --mirror https://github.com/alexmacrae/SamplerBox.git
cd SamplerBox.git
git for-each-ref --format="git update-ref -d %(refname)" refs/pull | sh
git for-each-ref # to check that we have only wanted refs left
git count-objects -vH # size-pack: 54.40 MiB
git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch "media/1 Juno-Trumpet" media/Juno-Trumpet media/Juno' --prune-empty --tag-name-filter cat -- --all
The filter-branch step takes a short while and ends with:
Ref 'refs/heads/master' was rewritten
Ref 'refs/heads/sfz' was rewritten
Ref 'refs/heads/wifi' was rewritten
WARNING: Ref 'refs/tags/v1.0' is unchanged
v1.0 -> v1.0 (7ec3254d08b65fd3ca8a048cef60b5b2c75f7e11 -> 7ec3254d08b65fd3ca8a048cef60b5b2c75f7e11)
(This last line indicates that the one tag in the repository comes before any of the rewritten commits, i.e., we did not need --tag-name-filter cat after all.)
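One way to sanity-check the rewrite at this point is to ask Git whether any commit reachable from a branch or tag still touches the removed paths. This is only a sketch: paths_in_history is a hypothetical helper name, and it deliberately looks at branches and tags only, since the refs/original/ backups still point at the old history.

```shell
# Hypothetical helper (not part of Git): list commits reachable from any
# branch or tag that touch the given paths. No output means the paths are
# gone from that history.
paths_in_history() {
    git log --branches --tags --oneline -- "$@"
}

# Possible usage inside SamplerBox.git, right after filter-branch:
#   paths_in_history "media/1 Juno-Trumpet" media/Juno-Trumpet media/Juno
```

If this prints any commits, the index filter missed a path name and the filter-branch step would need to be repeated on a fresh clone.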
Now we must remove the refs/original/ names. Since this is a fresh clone, there are no reflogs to expire, but we'll do that anyway, and then repack with git gc:
git for-each-ref --format="git update-ref -d %(refname)" refs/original | sh
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git count-objects -vH # size-pack: 1.41 MiB
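As an alternative to generating update-ref commands and piping them into sh, the same ref deletion can probably be done by feeding "delete" instructions to git update-ref --stdin. A minimal sketch, where delete_refs_under is a hypothetical helper name and the prefix argument is whatever namespace you want emptied:

```shell
# Hypothetical helper (not part of Git): delete every ref under a prefix
# such as refs/original or refs/pull, using update-ref's stdin protocol
# instead of piping generated commands into sh.
delete_refs_under() {
    git for-each-ref --format='delete %(refname)' "$1" | git update-ref --stdin
}

# Possible usage:
#   delete_refs_under refs/original
#   git for-each-ref    # confirm only the wanted refs remain
```

This is a different technique from the one used above, not a correction of it; the end result should be the same set of deleted refs.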
I have not done this last step:
git push origin '+refs/*:refs/*'
(and if you're really sure you want all the media files totally gone, you might want to clean out all the pull requests as well, since they will retain them for a while otherwise).
Incidentally, I found the files under the three names using:
git cat-file --batch-all-objects --batch-check | sort -k3 -rn | head
to find relatively large files, followed by:
git rev-list --all | while read ref; do
    git ls-tree -r "$ref" | grep 477145c7d0190f4e0aeea0f7bfb9accbf2c1ba48
done | sort -u
(477145c7d0190f4e0aeea0f7bfb9accbf2c1ba48 is one of the big .wav files. I did not check whether all the removed files are .wav files or whether any other .wav files remain.)
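The first of those two commands, the search for large objects, can be narrowed to blobs only by giving --batch-check an explicit format. The following is a sketch under that assumption; largest_blobs is a hypothetical helper name, and its argument is how many entries to show.

```shell
# Hypothetical helper (not part of Git): print the N largest blobs in the
# object database as "<size> blob <object-id>", biggest first.
largest_blobs() {
    git cat-file --batch-all-objects \
        --batch-check='%(objectsize) %(objecttype) %(objectname)' |
        awk '$2 == "blob"' |   # drop commits, trees, and tags
        sort -rn |             # size is the first field
        head -n "${1:-10}"
}

# Possible usage:
#   largest_blobs 5
```

Each object ID printed this way can then be fed to the ls-tree loop above to discover the path names it was stored under.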