
Five years ago Niklas asked this quite similar question - I'll give it another try with different wording.

I want to migrate an existing Subversion repository to git and use the chance to get rid of all history that doesn't affect my current trunk/master HEAD before I share it with my colleagues (all other history should stay intact of course).

My idea was to first `git svn clone` the repository (intentionally without the branches):

git svn clone http://my/old/svn/repo/trunk new-git-repo

... and then to remove all files I no longer need with some magic like this:

for f in $(all_deleted_files)
do
    # double quotes so ${f} expands here; -f lets later runs overwrite the refs/original backup
    git filter-branch -f --tree-filter "rm -f '${f}'" HEAD
done

Of course the big question is now: how do I get all_deleted_files?

I could write a nice Python script and collect all files in all commits and subtract those which still exist on HEAD. But is this the only possible way?
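For illustration, the collect-and-subtract idea can also be sketched in plain shell. This is a minimal sketch, not a tested migration tool: the function name `all_deleted_files` simply fills in the placeholder used above, and it assumes a linear history (as a trunk-only `git svn clone` produces) and file names without newlines.

```shell
#!/bin/sh
# Sketch: print every path that was added somewhere in the history of HEAD
# but is no longer present in the tree at HEAD.
# Assumptions: linear history, no newlines in file names.
all_deleted_files() {
    tmp_all=$(mktemp)
    tmp_head=$(mktemp)

    # Every path that was ever added in a commit reachable from HEAD.
    git log --pretty=format: --name-only --diff-filter=A --no-renames HEAD |
        sed '/^$/d' | LC_ALL=C sort -u > "$tmp_all"

    # Every path that exists in the tree at HEAD.
    git ls-tree -r --name-only HEAD | LC_ALL=C sort -u > "$tmp_head"

    # Lines found only in the first list: added at some point, gone at HEAD.
    LC_ALL=C comm -23 "$tmp_all" "$tmp_head"

    rm -f "$tmp_all" "$tmp_head"
}
```

Because `--no-renames` is used, a renamed file shows up under its old name as deleted, so its history before the rename would be dropped as well.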

Has someone done this before and wants to impress me with his/her script?

With a different (Subversion-specific) approach: might it be possible to not clone files that were later deleted in the first place?

frans
  • In case you get an answer, I want to point out that your code would run for ages, since it loops over the entire commit history once for every file. Better to put your loop inside the tree-filter. – lucanLepus Jun 07 '17 at 17:09
  • Actually it would run quite quickly. You don't have to inspect all history but only iterate each commit from `HEAD` to the very beginning, fetch the `tree` objects and collect file names (and renamings). I did something similar before but for a different purpose, so it would take me quite a while to implement this I guess.. – frans Jun 07 '17 at 17:43
  • See the [accepted answer](https://stackoverflow.com/a/17909526/1256452) at the other linked question https://stackoverflow.com/q/17901588/1256452 from the similar, 5-year-ago question. Replace `--all` with filtering just `HEAD` to do just the current branch, and of course substitute in the appropriate list of files to keep. – torek Jun 07 '17 at 17:50
  • One addendum: I see you mention renamed files. Remember that Git doesn't track file names; it attempts to detect, dynamically, "similar content" when comparing any two commits, adjacent or not. I'm not sure how SVN identifies files under the hood but you'd need something fancy if trying to detect renames in Git. – torek Jun 07 '17 at 17:53

1 Answer


I don't think anyone (or at least not many) has done this before, as it doesn't make much sense for most repos. Usually the different files in a repo form one unit, so leaving out the history of files that were later deleted corrupts the whole history for most repos. And if a file was renamed, you will also lose its history from before the renaming.

If you really want to do this anyway, something like the following should do. In each commit it removes every file that does not exist on master:

git filter-branch --prune-empty --tag-filter cat --tree-filter 'files="$(git diff master --no-renames --diff-filter=A --name-only)" && if [ -n "$files" ]; then rm -f $files; fi' master
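
If the list of unwanted paths is already known, a variant based on `--index-filter` may be considerably faster, because it edits the index directly instead of checking out every commit. The following is only a sketch under assumptions: `purge_paths` and `PURGE_LIST` are hypothetical names, the list file holds one path per line, and `xargs -0` must be available (it is in GNU and BSD userlands).

```shell
#!/bin/sh
# Sketch: rewrite the history of HEAD in a single pass, dropping every path
# named in the list file given as $1 (one path per line, no newlines in names).
purge_paths() {
    # Use an absolute path, because the filter runs in a temporary directory.
    PURGE_LIST=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    export PURGE_LIST

    # Nothing to do for a missing or empty list.
    [ -s "$PURGE_LIST" ] || return 0

    # git rm --cached only touches the index; --ignore-unmatch keeps it from
    # failing in commits where a listed path does not exist.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
        --index-filter 'tr "\n" "\0" < "$PURGE_LIST" | xargs -0 git rm -q --cached --ignore-unmatch --' \
        HEAD
}
```

Called as `purge_paths /tmp/deleted.txt` from inside a clean working tree, this rewrites only the current branch. The listed paths are treated as pathspecs, so names containing glob characters would need extra care.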
Vampire