My situation is that many bulky JPGs have made it into our repo, adding hundreds of MBs, far more than the source code itself.
I have since optimized these JPGs to less than 1/20 of their original file size, with no other perceivable change, and committed and pushed the result.
However, local clones still have this space used up in .git (whose object store contains all previous versions of all files), and anyone who clones the repo also downloads this wasted space.
Our origin master is on Bitbucket.
I have spent considerable time trying to figure this out from good guides, such as the documentation for
git gc
or http://linux.yyz.us/git-howto.html
and the question "How to remove local (untracked) files from the current Git working tree?", which suggests
git clean -n
What is a way to purge only these huge JPG files, from only one particular commit, from the local archives and also from the online Bitbucket repo, so that no one has to pull them again? Of course, we want:
- The current versions of all files to be kept
- As much as possible, the revision history before and after preserved, or at least the record that a commit happened (because other, non-JPG files were changed in it too)
- There are 200+ JPG files. Can this be done in one fell swoop, using a wildcard like *.jpg in some parameter, or a for loop?
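One way to handle all 200+ files at once without naming them is to work with blob IDs rather than paths: every *.jpg blob that appears somewhere in history but not in the current HEAD tree is, by construction, a superseded copy, while the optimized versions in HEAD are excluded automatically. A minimal sketch, assuming a POSIX shell, ASCII paths without newlines, and that HEAD holds the optimized files; the function name old_jpg_blobs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the blob IDs of every *.jpg (any case) that is
# reachable from some ref but NOT used by the current HEAD tree, i.e. the
# superseded, pre-optimization copies. Assumes ASCII paths without newlines.
old_jpg_blobs() {
  tmp=$(mktemp -d) || return 1

  # Every object reachable from any ref; blobs and trees are printed as
  # "<id> <path>", so keep the IDs whose path ends in .jpg.
  git rev-list --objects --all |
    awk 'NF >= 2 && tolower($NF) ~ /\.jpg$/ { print $1 }' |
    sort -u > "$tmp/all"

  # Blobs the current commit still uses ("<mode> blob <id>\t<path>").
  git ls-tree -r HEAD |
    awk 'tolower($NF) ~ /\.jpg$/ { print $3 }' |
    sort -u > "$tmp/head"

  # Lines present only in the first list: old copies that are safe to strip.
  comm -23 "$tmp/all" "$tmp/head"
  rm -rf "$tmp"
}
```

For example, `old_jpg_blobs > old-jpg-blobs.txt` writes one ID per line, which is the kind of list a blob-based history rewrite can consume.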
The large JPGs we want to get rid of have no earlier versions in the repo.
Among the things I tried:
- Before anything, how much disk space is .git using?
du
72195 ./.git
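du walks every directory under the current one, and its numbers are disk blocks rather than Git's own accounting. To track just the Git directory and, separately, the packfiles (the size-pack figure from git count-objects -v, which Git reports in KiB), a small sketch; both helper names are hypothetical:

```shell
#!/bin/sh
# Total size of the repository's Git directory, in KiB
# (du -s: summary only, -k: 1024-byte units).
git_dir_kib() {
  du -sk "$(git rev-parse --git-dir)" | awk '{ print $1 }'
}

# Size of all packfiles, in KiB, as reported by "git count-objects -v".
pack_kib() {
  git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

Running both before and after each experiment makes it easy to see whether anything was actually freed.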
- Find heavyweight blobs:
git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -n | tail -39
...
03bcb7d79c1e0a4328420bf00647319465d5d3df blob 2446210 2430913 46915147
52ea2d848645463e01d3dd143dd8d7fd24019335 blob 2467254 2443333 27573576
12d63348c0e87f9602d395e694df6a94601c12f7 blob 2506409 2485495 49346060
645fe7bfaf6ecd0140d144b4c40c19e78f103bd6 blob 2581349 2554398 10567725
72672204aa3c7aec431cba02b32ac012e52e601d blob 3084793 3041294 13122123
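verify-pack only sees packed objects and prints IDs without paths, so finding out what a big blob is takes a second lookup. As a sketch, assuming a Git recent enough for cat-file's --batch-check format atoms (including %(rest), which echoes the path that rev-list printed), the largest blobs can be listed together with their paths in one pipeline; the helper name biggest_blobs is hypothetical:

```shell
#!/bin/sh
# List the N largest blobs reachable from any ref (default N=10), smallest
# first, as "blob <id> <size-in-bytes> <path>". The size is the uncompressed
# size, like the third column of verify-pack -v.
biggest_blobs() {
  n=${1:-10}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3n |
    tail -n "$n"
}
```

For example, `biggest_blobs 39` mirrors the verify-pack query above but also shows paths such as images/2.jpg.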
- What did that last big blob contain?
git rev-list --objects --all | grep 72672204
72672204aa3c7aec431cba02b32ac012e52e601d images/2.jpg
- Which commits affected this particular file, images/2.jpg (one of the many whose unneeded old copy I hope to remove)?
git log --pretty=oneline --branches -- images/2.jpg
98dc75de48a63c2ab9661eb62895ac39ef331aaa MAPSDH-10 #time 30m #comment Grab live copy of Simon's source and push it onto Bitbucket repo; master@gordito,2014-04-10_13-55-02
3e7f36f0b1a913feaf43547bca4ad3a5a08957a6 MAPSDH-10 #time 30m #comment Grab live copy of Simon's source and push it onto Bitbucket repo; master@gordito,2014-04-10_13-31-49
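The same question can be asked for every JPG at once: Git pathspecs accept glob patterns, and in a plain pathspec * and ? also match directory separators, so '*.jpg' matches images/2.jpg as well. The quotes stop the shell from expanding the pattern itself. A sketch (hypothetical helper name) that lists the commits on all refs that touched any JPG:

```shell
#!/bin/sh
# One line per commit (on any branch or tag) that added, changed or deleted
# a .jpg or .JPG file anywhere in the tree: "<abbreviated id> <subject>".
jpg_commits() {
  git log --all --format='%h %s' -- '*.jpg' '*.JPG'
}
```

Adding --name-only to the git log call would also show which JPG paths each commit touched.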
- Okay then, try to remove only the copy of images/2.jpg from commit 3e7f36f0 and earlier:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch images/2.jpg' -- 3e7f36f0^..
Cannot rewrite branches: You have unstaged changes.
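filter-branch refuses to start while the working tree or index has uncommitted changes, so those have to be committed or stashed first. Two further points about that command: the range 3e7f36f0^.. selects 3e7f36f0 and every commit after it (not the ones before), and it rewrites only the current branch, so other branches and tags keep the old commits. A sketch of an alternative that rewrites all branches and tags but removes only specific blob IDs, so files whose blob is not listed (such as the optimized JPGs in HEAD) stay, and no commit is dropped because --prune-empty is not used. Assumptions: a hypothetical list file with one full 40-character blob ID per line (for example every *.jpg blob in history that is not in HEAD), ASCII paths, and a Git recent enough to honour FILTER_BRANCH_SQUELCH_WARNING (older versions simply print their warning). The function name strip_blobs is hypothetical; back up the repository before trying it.

```shell
#!/bin/sh
# Hypothetical helper: remove every index entry whose blob ID is listed in
# the file "$1" from every commit on every branch and tag. Unlisted files
# are untouched, and commits are kept even if a removal leaves them empty.
strip_blobs() {
  # The index filter runs inside a temporary directory, so pass it an
  # absolute path to the list.
  STRIP_LIST=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
  export STRIP_LIST

  # filter-branch refuses to start with unstaged or uncommitted changes.
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "commit or stash your changes first" >&2
    return 1
  fi

  # For each commit: list the index entries ("<mode> <id> <stage>\t<path>")
  # and drop those whose blob ID appears in the list.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter '
    git ls-files -s | while read -r mode sha stage path; do
      if grep -qxF "$sha" "$STRIP_LIST"; then
        git update-index --force-remove -- "$path"
      fi
    done
  ' --tag-name-filter cat -- --branches --tags
}
```

Using --branches --tags rather than --all keeps refs/stash and remote-tracking refs out of the rewrite; the old commits stay reachable through refs/original/ until those backup refs are deleted.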
- Since it refused, I removed it from the index (the cache) altogether:
git rm --cached --ignore-unmatch images/2.jpg
rm 'images/2.jpg'
However, I hope the CURRENT version of
images/2.jpg
will still be in the repo!
- Count the disk space used by the local Git archives:
git count-objects -v
count: 0
size: 0
in-pack: 284
packs: 1
size-pack: 72101
prune-packable: 0
garbage: 0
size-garbage: 0
- size-pack is still 72101 (in KiB, essentially the same as the original du figure), so the removal did not free up the 3084793 bytes (about 3 MB) I expected.
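git rm --cached only changes the index: it stages the deletion of the current images/2.jpg (which `git reset -- images/2.jpg` undoes) while every existing commit still references the old blob, so gc has to keep it. Even after a history rewrite, the old objects stay reachable through the refs/original/ backups that filter-branch creates and through the reflogs. A sketch of the cleanup after a rewrite, with a hypothetical helper name; it is irreversible, so it should only run once the rewritten history has been checked:

```shell
#!/bin/sh
# After a history rewrite: drop the backup refs and reflog entries that still
# point at the old commits, then let gc delete the unreachable objects.
# Irreversible; check the rewritten history first.
reclaim_space() {
  # filter-branch keeps the pre-rewrite tips under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done

  # Reflogs (HEAD@{1} and friends) also keep old commits reachable.
  git reflog expire --expire=now --all

  # Repack what is still reachable and prune everything else right away.
  git gc --prune=now --quiet
}
```

Afterwards git count-objects -v should report a smaller size-pack if the rewrite removed the large blobs. To update Bitbucket, the rewritten branches and tags have to be force-pushed (git push origin --force --all, then git push origin --force --tags), and everyone else should re-clone rather than pull, since pushing from an old clone could bring the large objects back.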