14

I was searching through SO for an answer to this and came across this older thread, which didn't seem to give any answers. Reviving it in the hope that someone knows!

Can someone tell me the difference between git subtree and git filter-branch? I'll use the same example as in the original question:

git subtree split --prefix=some_subdir -b some_branch

git filter-branch --subdirectory-filter some_subdir some_branch
approxiblue

2 Answers

7

2016: Yes, git subtree (a contrib/ shell script) can be used to split repos, as described in "Using Git subtrees for repository separation" by Stu Campbell.

You need to remove the code that you have duplicated in your split folder, though (see also theamk's answer):

git subtree split --prefix=path/to/code -b split
git push ~/shared/ split:master
git rm -r path/to/code
git commit -am "Remove split code."

That differs from git filter-branch (a native Git command), which rewrites the repo history, keeping only those commits that actually affect the content of a specific subdirectory.

Meaning: there is no code to git rm once the filter-branch has been run.
git filter-branch does not duplicate commits like git subtree split does: it deletes ("filters out") everything that does not match a certain criterion (here a subfolder path).
Again, see theamk's answer for updates: there is no duplication when using a new branch: git subtree split --prefix=some_subdir -b some_branch.


Update 2021:

git filter-repo can extract wanted paths and their history (stripping everything else)

 git switch -c some_branch
 git filter-repo --path some_subdir/ --refs some_branch
VonC
  • 2
    This is inaccurate nowadays as git subtree is now an official part of git – Shane Gannon Jan 23 '17 at 20:07
  • 2
    The difference also means that if you have several (n) sub-folders and want to make each one its own repo, with git filter-branch you need to clone n times first, while with subtree you need to git rm n times. – Qiulang Mar 08 '18 at 08:46
  • That answer is incorrect! If you make a copy of the branch with "git branch" or "git checkout -b", then you don't need to "clone n times" and you have a branch that you can "git rm" in. Just do "git checkout" to the original branch after you push. – theamk Sep 24 '21 at 01:07
  • I'd be hesitant to recommend a third-party tool ("git filter-repo") when there is an included one available, especially since most of the caveats for `git filter-branch` do not apply to this use case, because we do not supply any shell commands nor do we touch the tags. – theamk Sep 24 '21 at 15:41
  • @theamk the included one available is officially obsolete. "we do not supply any shell commands": not sure I understand that one though. – VonC Sep 24 '21 at 15:43
1

When executed as written, the differences are pretty minor:

  • your "subtree split" command will start from HEAD and put the result in some_branch, which must not exist beforehand
  • your "filter-branch" command will start from some_branch and put the result back in some_branch, overwriting some_branch with the new content
  • In my tests, "git filter-branch" was ~50x faster (on a very old repo with only a few commits touching the selected path)

In other words, the two snippets below are exactly equivalent, as long as special subtree rejoin commits are not found.

git subtree split --prefix=some_subdir -b some_branch
git checkout some_branch

and

git checkout -b some_branch
git filter-branch --subdirectory-filter some_subdir some_branch
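
If you want to check this yourself, here is a rough sketch on a throwaway repo. The repo layout and the branch names via_filter and via_subtree are made up for illustration. It only compares the resulting top-level trees, not the full rewritten history, and it skips the subtree half when git-subtree is not installed:

```shell
#!/bin/sh
# Sketch only: the repo, file names and branch names (via_filter, via_subtree)
# are hypothetical and exist just for this demo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip the deprecation notice on newer Git

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Two commits: one touching everything, one touching only the subdirectory.
mkdir some_subdir
echo one > some_subdir/a.txt
echo root > README
git add .
git commit -q -m "Add subdir and README"
echo two >> some_subdir/a.txt
git commit -q -am "Touch subdir only"
start=$(git rev-parse --abbrev-ref HEAD)  # default branch name varies by setup

# filter-branch rewrites the branch it is given, so work on a copy.
git branch via_filter
git filter-branch --subdirectory-filter some_subdir via_filter >/dev/null 2>&1

filter_files=$(git ls-tree -r --name-only via_filter)
start_files=$(git ls-tree -r --name-only "$start")
echo "via_filter: $filter_files"
echo "$start:" $start_files

# git subtree is not installed everywhere, so only compare when it is present.
same=skipped
if [ -x "$(git --exec-path)/git-subtree" ]; then
  git subtree split --prefix=some_subdir -b via_subtree >/dev/null 2>&1
  if [ "$(git rev-parse 'via_subtree^{tree}')" = "$(git rev-parse 'via_filter^{tree}')" ]; then
    same=yes
  else
    same=no
  fi
fi
echo "same tree as subtree split: $same"
```

The starting branch keeps some_subdir after both commands, which is why the copy-the-branch approach avoids cloning.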

Why bother with "git subtree" then, you may ask? For the --rejoin and --onto options: they support a very specific workflow that the original author was using.

theamk