Our current repo has tens of thousands of commits, and a fresh clone transfers nearly a gigabyte of data (the history contains lots of jar files that have since been deleted). We'd like to cut this size down, either by making a new repo that keeps the full history for just the files that are currently active, or by modifying the current repo to clear the history of deleted files. But I'm not sure how to do this in a practical manner.
I've tried the script from "Remove deleted files from git history":
    for del in `cat deleted.txt`
    do
        git filter-branch --index-filter "git rm --cached --ignore-unmatch $del" --prune-empty -- --all
        # The following seems to be necessary every time
        # because otherwise git won't overwrite refs/original
        git reset --hard
        git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
        git reflog expire --expire=now --all
        git gc --aggressive --prune=now
    done;
But given that we have tens of thousands of deleted files in the history and tens of thousands of commits, running the script would take an eternity. I started running it for just ONE deleted file 2 hours ago and the filter-branch command is still running. It is going through each of the 40,000+ commits one at a time, and this is on a new MacBook Pro with an SSD.
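For context, the script reads its paths from deleted.txt. A list like that could be built roughly as follows. This is only a sketch: the function name list_deleted_paths is made up for illustration, and it assumes plain file names (git quotes unusual characters in its output, which this does not undo). Depending on rename detection settings, the old name of a renamed file may or may not show up in the result.

```shell
#!/bin/sh
# Sketch: print every path that was deleted somewhere in history
# and is not tracked at HEAD. list_deleted_paths is a hypothetical name.
list_deleted_paths() {
    tmp=$(mktemp -d) || return 1

    # Paths removed by any commit reachable from any ref.
    # --pretty=format: suppresses the commit headers, leaving names and blank lines.
    git log --all --diff-filter=D --name-only --pretty=format: \
        | sed '/^$/d' | LC_ALL=C sort -u > "$tmp/deleted"

    # Paths that exist in the current index.
    git ls-files | LC_ALL=C sort -u > "$tmp/tracked"

    # Keep only deleted paths that are not tracked now
    # (a file that was deleted and later re-added is dropped from the list).
    LC_ALL=C comm -23 "$tmp/deleted" "$tmp/tracked"

    rm -rf "$tmp"
}
```

Usage would be something like `list_deleted_paths > deleted.txt`, run from the root of the repo.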
I've also read the page https://help.github.com/articles/remove-sensitive-data but the approach described there only works for removing single files.
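One variation I can imagine is handing the whole list to a single filter-branch run, so the history is rewritten once rather than once per file. The following is an untested-at-scale sketch, not a known-good recipe: purge_paths_single_pass and PURGE_LIST are hypothetical names, and it assumes one path per line with no whitespace or quoting in the paths (the same limitation the loop above has).

```shell
#!/bin/sh
# Sketch: remove every path listed in a file using ONE filter-branch run.
# purge_paths_single_pass and PURGE_LIST are hypothetical names.
purge_paths_single_pass() {
    # The index filter runs in a temporary directory, so resolve the
    # list file to an absolute path first.
    list_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    PURGE_LIST="$list_dir/$(basename "$1")"
    export PURGE_LIST

    # Nothing to do for an empty list (also avoids git rm with no paths).
    [ -s "$PURGE_LIST" ] || return 0

    # xargs splits a long list into as many git rm calls as needed;
    # --ignore-unmatch makes commits that lack a path succeed anyway.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter 'xargs git rm --cached --quiet --ignore-unmatch -- < "$PURGE_LIST"' \
        --prune-empty -- --all
}
```

It would be called as `purge_paths_single_pass deleted.txt`. The refs/original cleanup, reflog expiry, and gc steps from the script above would presumably still be needed afterwards, but only once.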
Has anyone been able to do this? I really want to preserve the history of currently tracked files. I'm not sure the space savings would be worth creating a new repo if we can't keep the history.