I imported a very old SVN project with git svn clone. The problem is that I imported the root folder of that repo, after I had already imported all the other sub-projects (into new git repos) and deleted each of them from SVN. So, when I imported the root folder (with the final 8 subfolders) into one single git repo, the history of the full repo came along too (including the history of the deleted sub-projects).
I ran several commands to shrink the pack file, with no success. It always stays at 571 MB. The only command that reduced it a bit was:
git repack -a -d --depth=500 --window=1000 -f
Googling, I found plenty of help on deleting files or purging big blobs from history, but nothing about files that have already vanished.
I created a list of all the deleted folders I need to purge (top-level folders only) with this command:
git log --diff-filter=D --summary | grep delete | cut -d" " -f5 | cut -d"/" -f1 | grep -v "\"" | sort | uniq > /tmp/tokill.txt
Then I ran this (after a small edit to the list, to keep 2 folders out of the history deletion):
git filter-branch --index-filter 'cat /tmp/tokill.txt | xargs git rm --cached --ignore-unmatch -r'
At this point, the log was more or less rewritten, and I was no longer able to list the deleted files. But the pack was still 571 MB, even after repack, gc and/or prune.
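For reference, the "repack, gc and/or prune" step after a filter-branch is usually written along the lines below. This is only a sketch of the commonly documented cleanup, not necessarily the exact commands run here; the function name purge_old_history is made up for illustration. It assumes the old objects are kept alive by the refs/original/ backups that filter-branch creates and by the reflog. Remote-tracking refs created by git svn (under refs/remotes/) are not touched by this sketch and, if they still point at the old commits, would keep the old objects alive as well.

```shell
#!/bin/sh
# Sketch: remove the usual leftovers that keep pre-rewrite history reachable,
# then repack. Meant to be run inside the rewritten repository.
# purge_old_history is a hypothetical helper name, not a git command.
purge_old_history() {
    # filter-branch stores backups of the rewritten refs under refs/original/
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

    # reflog entries also reference the old commits
    git reflog expire --expire=now --all

    # repack and drop unreachable objects right away
    git gc --prune=now --aggressive

    # print object statistics; the size-pack line shows the pack size
    git count-objects -vH
}
```

Calling purge_old_history from the top of the working copy and comparing the size-pack line before and after shows whether anything was actually released.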
What am I missing? Any help is appreciated.
Best, Lovato
ADDED on 2014-08-05:
Just to clarify a bit more: I have already preserved the history of the individual sub-projects, because I already migrated them to git. After that, these folders were wiped out of SVN. So I really do want to get rid of that history, because it does not belong in this repo's scope. I understand that this is unusual for git, but I would like to know whether it can be done or not.
I split one huge SVN repo into several git repos to make everyone's life easier. The original SVN repo is 6 years old and has tons^2 of SVN commits, so I cannot go through them one by one to check whether each should be removed or not.
As for size: without that history (which includes the history of big blobs) the repo is less than 1 MB. It's just a bunch of Java code, docs and a few images.
The (perhaps) correct way would have been to first move all those root folders into a folder called "last_project" and then git svn clone this "last_project"; all the history belonging to "/" (which means ALL history) would then have stayed in SVN.
ADDED on 2014-08-05 - II: partial solution
While I was reviewing my question, Stack Overflow started to suggest other similar questions that I had not found earlier, because they are only loosely related. One of them is about the BFG tool. BFG does not clear "history for files that no longer exist on disk", but it did a pretty good job of erasing all history for files that were (at some point) bigger than X kB. My total repo size is now 20 MB, and Jenkins (and everyone else) can download it in 2 seconds from now on.
http://rtyley.github.io/bfg-repo-cleaner/
I still have a bare copy of my original repo, so I can try any solution that may be suggested.
ADDED on 2014-08-06:
I had to completely wipe out my old git repo, create a new one, and then push the newly rewritten repo. It's working now. Not the way I wanted, but working.