
I imported a very old SVN project with git svn clone. The problem is that I cloned the root folder of that repo after I had already imported all the other sub-projects (into new git repos) and deleted each of them from SVN. So, when importing the root folder (with the final 8 subfolders) into a single git repo, the history of the full repo was imported too, including the history of the deleted sub-projects.

I ran several commands to shrink the pack file, with no success: it always stays at 571 MB. The only command that reduced it a bit was:

git repack -a -d --depth=500 --window=1000 -f
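A quick way to see what is actually occupying the pack is to list the largest blobs still reachable from any ref, including blobs of files deleted long ago. This is only a minimal sketch: the function name largest_blobs is hypothetical, and it assumes a git recent enough to support a format string for cat-file --batch-check.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "<size-in-bytes> <path>", biggest first (default N = 10).
largest_blobs() {
    # rev-list prints "<sha> <path>"; batch-check resolves the sha and
    # passes the remainder of the line through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |   # strip the "blob " prefix
        sort -k1,1nr |
        head -n "${1:-10}"
}
```

Calling largest_blobs 20 inside the repository shows the twenty biggest blobs; paths that no longer exist in the working tree are the ones kept alive only by history.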

Googling, I found plenty of help on deleting files or purging big blobs from history, but nothing about files that have already vanished.

I created a list of all the deleted folders I want to purge (top-level folders only) with this command:

git log --diff-filter=D --summary | grep delete | cut -d" " -f5 | cut -d"/" -f1 | grep -v "\"" | sort | uniq > /tmp/tokill.txt

Then, after a small edit to the list to keep 2 folders out of the purge, I ran:

git filter-branch --index-filter 'cat /tmp/tokill.txt | xargs git rm --cached --ignore-unmatch -r'

At this point the log had been rewritten, and I could no longer list the deleted files. But the pack was still 571 MB, even after repack, gc and/or prune.

What am I missing? Any help is appreciated.
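One possible explanation, offered as an assumption rather than a diagnosis of this repository: git filter-branch keeps a backup of every rewritten ref under refs/original/, and the reflog still points at the old commits, so the old objects stay reachable and no repack can drop them. A minimal sketch of the usual cleanup, meant to be run on a throwaway copy; the function name purge_rewritten_history is hypothetical.

```shell
# Hypothetical helper: drop what still pins the pre-rewrite commits, then
# repack. Run it inside a copy of the repository, never the only copy.
purge_rewritten_history() {
    # filter-branch saves the old tips under refs/original/; delete them.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

    # Reflog entries also keep the old commits reachable; expire them all.
    git reflog expire --expire=now --all

    # Prune unreachable objects now instead of after the grace period.
    git gc --prune=now --aggressive
}
```

Without extra arguments, filter-branch rewrites only the current branch, so any other ref that still points at the old history (for example the git-svn remote-tracking refs under refs/remotes/) would keep the old objects alive even after this cleanup. git count-objects -vH reports the resulting pack size.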

Best, Lovato


ADDED on 2014-08-05:

Just to clarify a bit more: I have already preserved the history of the individual sub-projects, because I migrated them to git earlier. After that, these folders were wiped out from SVN. So I really do want to get rid of that history, because it does not belong in this repo. I understand that this is unusual for git, but I would like to know whether I can do it or not.

I split one huge SVN repo into several git repos to make everyone's life easier. The original SVN repo is 6 years old and has tons^2 of SVN commits, so I cannot go through them one by one to check whether each should be removed or not.

About size: without that history (which includes the history of big blobs) the repo is less than 1 MB. It's just a bunch of Java code, docs and a few images.

The (perhaps) correct way would have been to first move all those root folders into a folder called "last_project" and then git svn clone only "last_project"; all history belonging to "/" (which means ALL history) would then have stayed in SVN.


ADDED on 2014-08-05 - II: partial solution

While I was reviewing my question, Stack Overflow started suggesting similar questions that I had not found earlier, because they are only loosely related. One of them is about the BFG tool. BFG does not clear "history for files that no longer exist on disk", but it did a pretty good job of erasing all history for files that were at some point bigger than X KB. My total repo size is now 20 MB, and Jenkins (and everyone else) can download it in 2 seconds from now on.

http://rtyley.github.io/bfg-repo-cleaner/

I still have a bare copy of my original repo, to apply any solution that may be suggested.


ADDED on 2014-08-06:

I had to completely wipe out my old git repo, create a new one, and then push the newly rewritten repo. It's working now. Not the way I wanted, but working.

Lovato

1 Answer

It seems like you want items that were present in the past but are no longer part of the repository to be deleted from git.

Unfortunately, git doesn't work like that. Because these items are part of the history (that is, there are still branches/refs/tags kicking around that refer to these commits in their history), they will stick around and so will objects related to those commits.

The only way to remove them completely would be to remove them from your git history. If you have a branch that refers to them, you could either delete that branch or rebase it so that it doesn't include those commits. Either way, once nothing refers to those commits any more (reflog entries included), git's garbage collection will get rid of them.

However, why do you want to do this? 571MB is not particularly large and you will be removing history completely.

Another way to do this is:

  1. Create an empty repository somewhere else
  2. Create an empty root commit in this new repository (git commit --allow-empty -m 'root commit')
  3. Add the git-svn repository as a remote (they will have nothing in common)
  4. Add a new local branch that tracks the remote branch you want
  5. Rebase this local branch onto your new empty root commit.
  6. When it's done, interactive rebase (rebase -i) one more time and fixup the commits you don't want (this will essentially combine all of them into one commit with the effect that all deleted files will get removed, but any changes to files that do exist will persist through history).
  7. Solve any conflicts. When that's done, you will have a new, pure git repository with only the history you need.
  8. Remove the remote.
  9. Run git gc

Your new repository should now be a lot smaller and your original git-svn repository should be untouched.
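The mechanical steps above (everything except the interactive fixup and conflict resolution in steps 6 and 7) can be sketched roughly as follows. The function name rebuild_on_empty_root, the remote name old-import and the branch name imported are all hypothetical, and the reflog expiry before git gc is an added assumption: without it, the reflog of the new branch would still reference the fetched commits and gc would keep them.

```shell
# Hypothetical helper: replay one branch of an existing repository onto an
# empty root commit in a brand-new repository.
# Usage: rebuild_on_empty_root <abs-path-to-source-repo> <branch> <new-repo-dir>
# The body runs in a subshell, so the caller's working directory is untouched.
rebuild_on_empty_root() (
    src=$1 branch=$2 dest=$3

    git init -q "$dest" &&                                  # step 1
    cd "$dest" &&
    git commit -q --allow-empty -m 'root commit' &&         # step 2
    root=$(git rev-parse HEAD) &&
    git remote add old-import "$src" &&                     # step 3
    git fetch -q old-import &&
    git checkout -q -b imported "old-import/$branch" &&     # step 4
    git rebase -q --root --onto "$root" &&                  # step 5
    # Steps 6 and 7 are interactive and left to the reader:
    #   git rebase -i "$root"
    git remote remove old-import &&                         # step 8
    git reflog expire --expire=now --all &&                 # assumption, see above
    git gc -q --prune=now                                   # step 9
)
```

A plain fetch only brings over the source repository's local branches, so the branch passed in would normally be the local one that git svn clone created (often master), not a git-svn remote-tracking ref.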

There is one gotcha: You should be aware that git-svn will not honor svn externals in your original svn repository and so you can only trust the git-svn repo if your svn repository does not use externals.

UPDATE

Separating out sub-projects is fine as long as you preserve the inter-dependencies. For example:

In order to build Parent project version 45, you need:
    version 2 of sub-project A
    version 10 of sub-project B
    ...
    version 30 of sub-project Z
Carl
  • I added more details to my original question. – Lovato Aug 05 '14 at 12:17
  • I also found a different approach, which gave me a "works for me" solution. But I am still interested in your comments after the first edit to my question. – Lovato Aug 05 '14 at 14:35