
I have a subfolder of a repo that I'm trying to split into a subtree. To start, I followed this procedure (https://stackoverflow.com/a/43985326/136785) to create a branch containing just the commits related to the subfolder (including renames). I confirmed that the branch's commit log looks as expected.

Next, I create a new repo for that sub-project:

git init --bare \\nas\git\FPF.git
git push ssh://myserver.com/~/FPF.git branch-fpf:master 

Then I remove the subfolder from the parent repo & re-add it as a subtree:

git rm -r htdocs/wp-content/plugins/fpf
git add -A
git commit -am "Removing folder to re-add as subtree"
git remote add fpf ssh://myserver.com/~/FPF.git
git subtree add --prefix=htdocs/wp-content/plugins/fpf fpf master --squash

Now as a quick sanity check, I'll grab a copy of the remote subtree repo (in another folder, of course):

git clone ssh://myserver.com/~/FPF.git

And:

git subtree push --prefix=htdocs/wp-content/plugins/fpf fpf master

Because I've not committed any changes between adding the subtree & pushing, I expect there to be nothing new to push. But as it turns out, if I clone FPF.git once more, I find that it now has a TON of extra commits - FPF has grown many times larger, with a commit log that now reflects many commits that only apply to files outside of the subtree.

Why would git subtree push be pushing commits that don't apply to the subtree?

Edit 1: The extra commits are all the commits from the main (parent) repo starting before the first FPF commit & going back to the beginning of time. In other words: if I compare the logs of the FPF subtree repo before & after doing the git subtree push, they are identical, until I get to the bottom of the "pre-push" clone's log. From there, the "post-push" clone's log continues all the way back through the first commit of the parent project. Git subtree push effectively appended the parent's full prior history.

Edit 2: I've decided to give up on git-subtree. I discovered https://github.com/ingydotnet/git-subrepo, which not only works properly, but solves a number of subtree's shortcomings (most notably the VERY slow pushes). Leaving this question here in case anyone else comes up with an answer or is struggling with the same, but to simplify a bit, here's a full start-to-finish set of commands that exhibits the problem. Difference from above: this doesn't start with a branch made by grafting together multiple filter-branches; it just does the simplest case of a single subtree split:

cd MainProjectRepo
git subtree split --prefix=htdocs/wp-content/plugins/fpf --branch=branch-new
git init --bare \\nas\git\FPF.git
git remote add fpf ssh://myserver.com/~/FPF.git
git push fpf branch-new:master
git rm -r htdocs/wp-content/plugins/fpf
git add -A
git commit -am "Removing folder to re-add as subtree"
git subtree add --prefix=htdocs/wp-content/plugins/fpf fpf master --squash

git clone ssh://myserver.com/~/FPF.git /tmp/fpf1
git subtree push --prefix=htdocs/wp-content/plugins/fpf fpf master
git clone ssh://myserver.com/~/FPF.git /tmp/fpf2

As described above, fpf2 ends up with the entire history of commits from the source repo.
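
One way to quantify the difference between the two clones is to compare commit counts and root commits. Below is a minimal sketch using plain `git rev-list`; the helper names (`commit_count`, `root_commits`, `extra_commits`) are hypothetical and not part of Git, and the paths are the `/tmp/fpf1` and `/tmp/fpf2` clones from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the "pre-push" and "post-push" clones.

# Number of commits reachable from HEAD in the repo at $1.
commit_count() {
    git -C "$1" rev-list --count HEAD
}

# Root commit(s), i.e. commits without parents, reachable from HEAD in the repo at $1.
root_commits() {
    git -C "$1" rev-list --max-parents=0 HEAD
}

# Hashes reachable from HEAD in repo $2 but not from HEAD in repo $1.
extra_commits() {
    before=$(mktemp)
    after=$(mktemp)
    git -C "$1" rev-list HEAD | LC_ALL=C sort > "$before"
    git -C "$2" rev-list HEAD | LC_ALL=C sort > "$after"
    LC_ALL=C comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Only runs when both clones from the question exist.
if [ -d /tmp/fpf1 ] && [ -d /tmp/fpf2 ]; then
    echo "commits before push: $(commit_count /tmp/fpf1)"
    echo "commits after push:  $(commit_count /tmp/fpf2)"
    echo "roots before push:   $(root_commits /tmp/fpf1)"
    echo "roots after push:    $(root_commits /tmp/fpf2)"
    echo "extra commits:       $(extra_commits /tmp/fpf1 /tmp/fpf2 | wc -l)"
fi
```

If the push really appended the parent's history, the root commit of fpf2 should differ from that of fpf1, and the extra hashes should also exist in MainProjectRepo.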

J23
  • Possibly a bug. The filtered subtree repository should have a history that ends with a root commit specific to the filtered subtree, and not reach back into the main history. Perhaps a recent change to something broke something? This would be particularly bad because the hash IDs would stop matching up. I'd have to dig in to an example (and don't have time to do that) to say for sure. – torek Apr 11 '20 at 01:06
  • "a recent change" - do you mean a recent change to Git itself, or a recent change I might have had something to do with...? – J23 Apr 11 '20 at 01:17
  • Also, I'm not sure if it's relevant: prior to the above, the FPF subdir was in the main repo. Per my other question you recently answered, I subtree-split that folder out into its own repo (FPF.git), then removed it from the main repo & committed the removal. Then the above. I wouldn't think it having previously been in the main repo should affect what's going on above though, right? – J23 Apr 11 '20 at 01:20
  • Recent change to Git - all this stuff about deprecating `git filter-branch` seems suspicious to me, and I recall seeing a fix for something fly by (I keep a somewhat up to date read-only copy for inspecting Git innards). – torek Apr 11 '20 at 01:20
  • Oh, hm: that's not how `git subtree` is intended to be used. The way one uses it is: you have program P with library L. It turns out that nobody wants P; everyone wants L. You make a subtree distribution of L. You *do not* change P (that includes L), but as people add fixes to L, you yank them back *into* your P repository. – torek Apr 11 '20 at 01:22
  • `git subtree` is the management tool that does the "make distribution, then take updates from L and put them back into P". The goal is that a new split from P, after you've incorporated updates *from* L into P, simply adds on to the L that everyone else fetches from. Remember, Git is designed to add new commits forever, without ever removing any old ones at all. – torek Apr 11 '20 at 01:24
  • re: that's not how it's intended to be used - it sounds like you're describing pretty much what I did, but perhaps my explanation was unclear (it's a bit hard in these comments, which don't allow formatting). I expanded upon the original question above, to add all steps taken prior to the subtree add & subtree push. Does that provide more clarity? :) – J23 Apr 11 '20 at 02:00
  • I have not actually _used_ the script but my understanding is that you should not `git rm -r` and then `git subtree add`. You would do the `git subtree add` only in a repository Q that never had library L, where you'd like to use L while maintaining it over in repository P. In particular, if you remove then add, the new split says: create this library, then remove it, then create it again. That doubles the number of commits. (From here on it should stay stable, unless you remove it *again*.) – torek Apr 11 '20 at 02:22
  • They do show that. I think this is wrong. Remember, Git is always, *always* about *commits* (which in general, you keep forever). Splitting the subtree to a new repository makes all-new commits. Your original repository (the one I called "program P" above) still has the original commits. If you add the subtree to it, you've added new commits. If you had 1000 subtree-ish commits and 5000 total commits before, now you have 6000 total commits (plus the squash). There's nothing inherently wrong with doing this either, if you don't mind your repository inflating by roughly 20%, but [continued] – torek Apr 11 '20 at 15:53
  • ... but that's not how `git subtree` was designed to work. The subtree split documentation notes that if you re-split the subtree from `P` again later, you get the *same commit hash IDs*. Hence if you develop the library more within P, you can re-split to a new L that has the same 1000 split-off commits, plus some new split-off commits that can be merged into the original split-off L. – torek Apr 11 '20 at 15:55
  • Splitting the subtree to a new repository makes all-new commits: true. Your original repository still has the original commits: true. If you add the subtree to it, you've added new commits: Do you? If you just add the subtree w/o squashing, then you'd add all those new subtree commits to your main repo - but if you add --squash, you should get just 2 commits: squash & merge. Thus you do still have the 5k original commits in your main repo, & the 1k subtree commits in the separate subtree repo (for 6k total across both)...but your main repo should only have 5002. That's the behavior I'm seeing. – J23 Apr 11 '20 at 18:23
  • But in any case, it seems like this is still a step before the issue I'm actually trying to unravel: all of this so far works as I'd expect. I end up with a main repo of 5002 commits (in your example), and an FPF repo with 1000 commits. It's only after I do that first PUSH that things get weird, per above. Note: I added a clarification/edit at the end - the "extra" commits are all the prior history from the main repo, starting from before the subtree was created. – J23 Apr 11 '20 at 18:24
  • I had another thought: I wondered if git subtree push's odd behavior could be related to the renames (see 1st link above, for how I extracted the subrepo across different renames, then grafted them back together). However, based on the fact that all the extra commits are from BEFORE the original creation of the subrepo's code in the main repo (per my recent edit)...that doesn't seem like it. It's adding commits that only took place before any of the subrepo's folders ever even existed. – J23 Apr 11 '20 at 18:27
  • Remember that `git push` just pushes *commits* (that already exist). So if things look weird *after* `git push`, they were weird *before* that too. However, `git subtree push` isn't the same as `git push`: it might do something extra, perhaps. – torek Apr 11 '20 at 18:35
  • Ok, so I was almost sure the odd behavior was because of how the FPF repo was grafted together across path-moves of that subfolder in the original repo. My thought was, if git subtree push --prefix=xxx is going back through the whole history & looking for commits that just affect the path "xxx" specifically, and some of FPF's history was not actually in xxx, it makes sense that it'd get confused. But no. [continued...] – J23 Apr 12 '20 at 03:22
  • ...I re-ran the whole procedure again but ONLY took the current path of FPF (no grafting), and git subtree push still pushed every commit from the original repo prior to the commits extracted to the subtree. – J23 Apr 12 '20 at 03:22
  • I just had a morning-long fight with git about this same issue, and my conclusion is that I misunderstood how subtree is supposed to work. I think the `rejoin` flag should be used *instead* of removing and subtree-adding the sub-project. This seems to be its purpose. It does add a bunch of duplicate commits, but I think that's unavoidable given that original commits may have touched files in both the project and subproject. – Mitchell Kline Jan 04 '21 at 19:59
  • Needed to edit my own past comment (04/11/2020) to remove a bad link, but SO wouldn't let me - only option was to delete. Re-posting that comment w/o the link for posterity: – J23 Nov 16 '21 at 01:00
  • Well, I've looked through quite a few tutorials about how to do this (i.e. you have a repo with a subfolder that you want to extract to its own repo, & keep using it as a subtree) - and all say to git rm -r, git subtree add. Example: (link removed) . I'm a little confused what you mean by "the new split says?" The way I see it, I'm removing the dir, then with git subtree add --SQUASH, it puts back the code as two commits: a squash and a merge. Then all the same content is there, but it's as a subtree. git subtree push should find no changes to push. No? – J23 Nov 16 '21 at 01:00
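
For reference, the `--rejoin` approach suggested in the comments above might look roughly like the sketch below. It assumes `git subtree` is installed and is run from the top level of the parent repo; the helper name `split_with_rejoin` is hypothetical. Nobody in the thread confirmed that this avoids the extra history on a later `git subtree push`, so treat it as a starting point rather than a fix.

```shell
#!/bin/sh
# Hypothetical helper: split the history of <prefix> onto <branch>, then merge
# the split result back into the current branch (--rejoin) so that later
# splits can start from that point. This replaces the "git rm -r" followed by
# "git subtree add" step; the subfolder stays in place.
split_with_rejoin() {
    prefix=$1
    branch=$2
    git subtree split --prefix="$prefix" --branch="$branch" --rejoin
}

# Usage with the names from the question:
#   split_with_rejoin htdocs/wp-content/plugins/fpf branch-new
#   git push fpf branch-new:master
```

As the commenter notes, this still leaves the split-off commits in the parent repo alongside the originals.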

3 Answers


For others who may come across this:

My conclusion, after many, many hours of struggling (reading, discussing, retrying in different ways, etc.) was that git-subtree just doesn't work properly. Instead, I discovered a much better alternative: git-subrepo. Not only does it work properly, it also solves a number of subtree's other shortcomings - most notably the VERY slow pushes.

Thus my "answer" to how to solve this: abandon git-subtree & use git-subrepo instead :)

J23

git subtree is indeed buggy. However, you can use this patch to make it work properly and it will be much faster as well.

antoyo

I struggled with this for a bit (and didn't see another answer on Stack Overflow), but realized my issue was that the commit history was getting confused between the subtree and my main repo. So it would try to push up the entire commit history when I just wanted it to grab the subtree. The fix for me was basically to set up the external library again (in my case called components). This is fine because it's pinned to the external library anyway (and my local changes were relatively small).

Here is the general code I used:

git rm -r components
rm -rf components
git commit -am "remove components to move to separate repo"
git remote add --fetch components <component-git-url>
git subtree add --prefix=components/ components master
git commit -am "add your changes back in"
git subtree push --prefix=components/ components my/branch