Is there a git equivalent to hg rebase -s source -d newparent?

That is, 'prune' a branch at source and 'graft' it at newparent. Or reparent source on newparent (merging where appropriate).

Or, for example, how do I go from this:

A - B - C
 \
  D - E
   \
    F

To this:

A - B - C
     \
      D'- E'
       \
        F'

In this case, source is D, and newparent is B. Doing hg rebase -s D -d B produces the desired result. Is there a git equivalent?

I've tried git rebase --onto B D, but it apparently did nothing apart from moving branch labels around.

Edited for clarification: The goal is not to reparent a commit in a tree exactly like the above. The above is an example. The goal is to let me reparent a commit on top of any other commit, as long as there are no weird situations like trying to reparent a merge commit or similar. I've created a couple of scripts that recreate the tree above, one for hg:

#!/bin/sh
set -e
rm -rf .hg
hg init
cat > .hg/hgrc <<'EOF'
[ui]
username = Rebase tester <no@email>
[extensions]
rebase =
EOF
echo A > file.txt
hg add file.txt
hg commit -m A_msg
hg bookmark A_bm
hg bookmark dummy # to stop A from tracking us
echo B > file.txt
hg commit -m B_msg
hg bookmark B_bm
hg bookmark graftpoint
hg bookmark -f dummy
echo C > file.txt
hg commit -m C_msg
hg bookmark C_bm
hg checkout A_bm
hg bookmark -f dummy
echo D > file.txt
hg commit -m D_msg
hg bookmark D_bm
hg bookmark prunepoint
hg bookmark -f dummy
echo E > file.txt
hg commit -m E_msg
hg bookmark E_bm
hg checkout D_bm
hg bookmark -f dummy
echo F > file.txt
hg commit -m F_msg
hg bookmark F_bm
hg bookmark -d dummy
hg log -G -T '{desc} {bookmarks} {rev}:{node|short}'
hg rebase -s D_bm -d B_bm -t internal:other
hg log -G -T '{desc} {bookmarks} {rev}:{node|short}'

and one for git:

#!/bin/sh
set -e
rm -rf .git
git init
git config user.name 'Rebase tester'
git config user.email 'no@email'
echo A > file.txt
git add file.txt
git commit -m A_msg
git branch A_br
echo B > file.txt
git commit -a -m B_msg
git branch B_br
git branch graftpoint
echo C > file.txt
git commit -a -m C_msg
git branch C_br
git checkout A_br
git checkout -b D_br
echo D > file.txt
git commit -a -m D_msg
git branch prunepoint
git checkout -b E_br
echo E > file.txt
git commit -a -m E_msg
git checkout D_br
git checkout -b F_br
echo F > file.txt
git commit -a -m F_msg
git log --graph --all --format=format:'%s %d %h%n'

#insert command(s) here

git log --graph --all --format=format:'%s %d %h%n'

The trees output by the first script are:

@  F_msg F_bm 5:5ffe9c283d51
|
| o  E_msg E_bm 4:9f83c609d7b2
|/
o  D_msg D_bm prunepoint 3:c3561e22f394
|
| o  C_msg C_bm 2:e7dd832a739b
| |
| o  B_msg B_bm graftpoint 1:c3d6803dba3e
|/
o  A_msg A_bm 0:f52f4706cef0

and

@  F_msg F_bm 5:efe4fde4dcdf
|
| o  E_msg E_bm 4:b2402cb25f70
|/
o  D_msg D_bm prunepoint 3:5849595efdde
|
| o  C_msg C_bm 2:e7dd832a739b
|/
o  B_msg B_bm graftpoint 1:c3d6803dba3e
|
o  A_msg A_bm 0:f52f4706cef0

exactly as expected. The output of the second script is

* C_msg  (master, C_br) de97063
|  
* B_msg  (B_br) 6053c6b
|    
| * E_msg  (E_br) 13d4fac
| |     
| | * F_msg  (HEAD, F_br) b9ce3c4
| |/  
| |   
| * D_msg  (D_br) ed2ba19
|/  
|  
* A_msg  (A_br) 2cf9476

(twice). The second one should be something like:

* C_msg  (master, C_br) de97063
|    
| * E_msg  (E_br) 1398dc5
| |     
| | * F_msg  (HEAD, F_br) 8ee34ad
| |/  
| |   
| * D_msg  (D_br) ed873f7
|/  
|  
* B_msg  (B_br) 6053c6b
|  
* A_msg  (A_br) 2cf9476

My problem is that hg rebase -s source -d destination works in any situation, but I haven't found a way to do the same with git. I've found a couple of third-party programs, but they don't seem to address this use case. One is git reparent and the other is git-reparent-branch. I've also found a solution using grafts and filter-branch, but it's not clear to me that it would handle conflicts correctly.

Pedro Gimeno
  • BTW I just took a quick look at `git reparent` and `git-reparent-branch`. The former is trivial and does not do what Mercurial does at all; the latter seems more complex but probably does not do what Mercurial does either. This is one of those cases where Mercurial is simply much more capable than Git (although I now wonder what hg does if there is a merge commit in the set that would be grafted, since those are technically impossible to rebase/graft correctly). – torek Oct 30 '16 at 05:51

3 Answers

This can be done in Git, but it's more complicated. To understand why, and therefore get to how, we need to review a key difference between Mercurial and Git.

[Edit, a day or two later: I hate to make this longer, but I think I can summarize the problem in two key points now. It boils down to:

  1. Mercurial allows multiple heads within a single branch (a head is Mercurial's term for what Git calls a branch tip). When this situation occurs, Mercurial just deals with it, because it can and must.

  2. Git's design makes it impossible, by definition, to have multiple tip commits on the same branch (a tip is the single commit that a branch name points to). This means Git can't have the equivalent, does not have to deal with it, and simply doesn't try. But we can do what we might want, using Git's built-in tools; it just gets messy.

The remainder is the detailed explanation, along with a rather clumsy way to do the job with existing Git tools. Doing what hg rebase does with a single command really needs a better Git tool, but as far as I know none exists. I wanted one for a while, and started writing it, but then the use case itself went away and I left it as a prototype that did only what I needed at the time.]

Branches vs commits

In Git, a branch (name) is merely a pointer to a single commit:

...--A--B--C--D   <-- branch1
         \
          E--F    <-- branch2

The name branch1 is just a pointer, remembering the raw hash ID of commit D. The name branch2 is also just a pointer, remembering the raw hash ID of commit F.

All commits have their own identity, but commits A and B are on both branches. Commits C and D are reachable only via branch1, commits E and F are reachable only via branch2, and commits A and B are reachable from both names. In Git, that's what it means for a commit to be "on a branch".
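As a minimal sketch of that reachability rule, the script below rebuilds the two-branch drawing in a throwaway repository and asks Git which commits are "on" which branch. The `commit` and `on` helpers, the tags named after each commit, and the demo identity are assumptions made for illustration only.

```sh
#!/bin/sh
# Sketch: in Git, "on a branch" just means "reachable from the branch's tip".
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1    # name the initial branch explicitly
git config user.name 'Demo'
git config user.email 'demo@example.invalid'

# Hypothetical helper: one commit per call, tagged with its letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C; commit D      # branch1 -> D
git checkout -q -b branch2 B
commit E; commit F                          # branch2 -> F

# Each branch name resolves to exactly one commit.
tip1=$(git rev-parse branch1)
tip2=$(git rev-parse branch2)

# Hypothetical helper: is commit $1 reachable from the tip named $2?
on() { if git merge-base --is-ancestor "$1" "$2"; then echo yes; else echo no; fi; }

b_on_branch1=$(on B branch1)
b_on_branch2=$(on B branch2)
d_on_branch2=$(on D branch2)
f_on_branch1=$(on F branch1)
echo "B on branch1: $b_on_branch1, B on branch2: $b_on_branch2"
echo "D on branch2: $d_on_branch2, F on branch1: $f_on_branch1"
```

Moving or deleting a branch name changes these answers without touching any commit, which is the volatility described in the sections that follow.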

In Mercurial, things are very different. A branch (name) is a very solid entity, and we can draw this graph like this instead:

branch1: ...--A--B--C--D
                  \
branch2:           E--F

Here, commits A through D are on branch1. None of them are on branch2. They can never be on branch2. Commits E and F are on branch2 and they are forever stuck on branch2 and will never be on branch1. It's true that we can merge branch2 back into branch1, making commits E and F reachable—but in Mercurial, reachability has no effect on the grouping of commits into branches. Commits are made on a branch, and are forevermore glued to that branch.

This, of course, means that pruning and re-grafting commits in Mercurial makes it obvious that the commits are copied. The new copies are on the other branch: they're clearly different from the originals. The meaning of the phrase "commit X is on branch Y" is permanent and unchanging. A commit's identity depends on its branch.

In Git, however, the meaning of "commit X is on branch Y" is volatile. The commit is on the branch only as long as the branch label, which is a temporary and moveable thing, makes the commit reachable. Commits can be on many branches simultaneously, or even on no branch at all. A commit has identity independent of any branch label.

This enables Mercurial to have "multiple heads" within a branch

What Git calls a tip commit, Mercurial calls a head. Let me redraw your example, but Git-ified:

A--B--C   <-- tip1
 \
  D--E    <-- tip2
   \
    F     <-- tip3

There are three tip commits here, hence three branches, named tip1 through tip3. These identify commits C, E, and F.

Mercurial sticks commits into branches. This allows us to have a fork within a branch. I can't "do" color, but assume the first line A--B--C is in, say, yellow and the remaining lines are in green, denoting which commits are on which branches:

branch1:  A--B--C
           \
branch2:    D--E
  ...        \
branch2:      F

Here, branch2 contains both commits E and F, even though they are different heads (what Git would call "tips"). This situation is impossible in Git, because two different tip commits need two different branch-names to point to them. You can't draw one single arrow, coming from the right, that points to both E and F, which are both in the "green zone" (branch2).

Addendum (per edit): even "intra-branch", Hg has more information than Git

Even if all the commits are on one branch, Mercurial's internals give it a direct ability that Git lacks. We can point to a commit (by hash ID, sequential number, or Mercurial bookmark) and ask for "all descendant heads that are in this branch". Those are all the head commits whose branch is the current branch (and whose sequence number is greater, although that's an optimization) for which the given commit is an ancestor. (Usually we consider a commit its own descendant and ancestor, and we would here too.) This gives us (or hg) a (fast) way to find all "interesting" heads, and hence all the commits to rebase.

Git's commits have no equivalent: a commit records only its parents, never its children, so there is no direct way to ask which commits are descendants of some commit. Instead, we can only tell which commits are ancestors, by following internal commit IDs backwards (from commits to their parents). The closest we can get to Mercurial's ability is to say "starting from some given branch tip(s) and working backwards, see which branch-tips have this commit as their ancestor; use all those branch-tips." (Of course, --branches would suffice here, but that is something git rebase doesn't do. It's also pretty slow.)
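The "work backwards from the branch tips" search can be sketched with existing plumbing. The script below is an illustration under assumptions (a throwaway repository shaped like the question's example, a hypothetical `commit` helper, and tags named after each commit): it lists the local branches whose tips descend from D, then counts the commits a multi-branch rebase would have to copy.

```sh
#!/bin/sh
# Sketch: find the branch tips that descend from a given commit.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Demo'
git config user.email 'demo@example.invalid'

# Hypothetical helper: one commit per call, tagged with its letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C                # tip1 -> C
git checkout -q -b tip2 A
commit D; commit E                          # tip2 -> E
git checkout -q -b tip3 D
commit F                                    # tip3 -> F

# Walk back from every local branch tip and keep the ones that reach D.
tips=$(git for-each-ref --format='%(refname:short)' --contains=D refs/heads | xargs)
echo "branch tips descending from D: $tips"

# Commits reachable from those tips but not from D's parent: D, E and F.
to_copy=$(git rev-list --count $tips --not 'D^')
echo "commits to copy: $to_copy"
```

This only discovers the tips; it does not rebase anything, and it considers local branches only.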

How to get what you want in Git

Because Git does not have multiple heads, and Git's branches are so ephemeral, we must start with our Git-specific drawing with three branches named tip1 through tip3. We can then rebase either tip2 or tip3: the choice is arbitrary.

Just as in Mercurial, rebasing means copying. Let's rebase tip2 to get D' and E'. We start with this, which I've redrawn a bit to leave some more room:

A--B--C    <-- tip1
 \
  \
   \
    D--E   <-- tip2
     \
      F    <-- tip3

Now we run:

$ git checkout tip2 && git rebase tip1

This first gets us on branch tip2, as git status would say, so that our rebase will affect branch-pointer tip2. Then, it instructs Git to find commits that are reachable from the current branch (tip2) but not reachable from the given branch tip1. These are commits D and E. Then, Git should copy these two commits, with the copies placed after the --onto argument.

We didn't give an --onto argument but it defaults to the argument we did give, which is tip1; and tip1 points to commit C. So the copies are placed after C. The last step of the rebase is to abandon the original chain of commits (though ORIG_HEAD and the reflog for tip2 will remember them for a while) and make the current branch, i.e., tip2, point to the final copied commit, i.e., E':

A--B--C        <-- tip1
 \     \
  \     D'-E'  <-- tip2
   \
    D--E       [ORIG_HEAD]
     \
      F        <-- tip3

We're halfway done. Now we hit the hard part: we need to rebase tip3 as well. We want our new copy of F' to come after commit D'. This means we must find the ID of D'.

Finding this ID can be tricky in general. In this case, it's easy enough: it's the parent commit of the new E', and tip2 points to E', so we just need to name the parent of tip2, for which any of these syntaxes work:

tip2^     # equivalent to tip2^1
tip2^1    # the first (and only) parent of the commit found via tip2
tip2~     # equivalent to tip2~1
tip2~1    # the commit found by moving one first-parent step back

(The difference between the ^ and ~ syntax is useful when you are crossing merge commits, which have more than one parent. If we wanted to move more commits back in a longer chain, we could repeat ^ many times, e.g., foo^^^^^, or use the ~ syntax: foo~5. For cases like this, use whichever one you find easier to type.)
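To make the difference concrete, here is a small sketch that builds a merge commit in a throwaway repository and resolves a few of these expressions. The branch names main and side, the `commit` helper, and the per-commit tags are assumptions for the demo.

```sh
#!/bin/sh
# Sketch: ^N selects the Nth parent; ~N takes N first-parent steps back.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Demo'
git config user.email 'demo@example.invalid'

# Hypothetical helper: one commit per call, tagged with its letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B
git checkout -q -b side
commit S
git checkout -q main
commit C
git merge -q --no-ff -m M side              # M has parents C (first) and S (second)

first=$(git rev-parse 'HEAD^1')             # C
second=$(git rev-parse 'HEAD^2')            # S, reachable only with ^2
two_back=$(git rev-parse 'HEAD~2')          # B, via first parents M -> C -> B
echo "HEAD^1=$first HEAD^2=$second HEAD~2=$two_back"
```

On a linear chain such as the tip2 example there is only ever a first parent, so `tip2^` and `tip2~1` name the same commit.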

The naïve attempt here—which usually works—is simply to run:

$ git checkout tip3 && git rebase tip2^   # I find ^ easier to type

This finds commits reachable from tip3 but not from tip2^. Those are, of course, commit F itself, and (uh oh) commit D. Note that when we start from tip2^ and work backwards, we go from D' to C to B to A, never passing through the original D, so D is not excluded. So this rebase will copy both D and F, rather than just copying commit F. The copies will go after D', which is the commit we identified by writing tip2^.

This seems like a disaster: Won't we get a new copy D''? And sometimes we will, and that is a (small) disaster. But when git rebase is doing its copies, it first checks to see if the commit it's copying—which, remember, is D—has a copy in the list of commits it should skip.

The list of commits it should skip is D' (tip2^) and C (tip2^^ or tip2~2). And, what do you know, D' is a copy of D. As long as git rebase can figure this out, it skips copying D after all. The result is:

A--B--C        <-- tip1
 \     \
  \     D'-E'  <-- tip2
   \     \
    D     F'   <-- tip3
     \
      F        [ORIG_HEAD]

(What happened to E here? The answer is: I'm not drawing in any of the reflog entries. Normally git log skips them, so I am skipping them too. I am only including ORIG_HEAD, the special name that git rebase leaves behind. The old ORIG_HEAD pointed to E, but the new rebase over-wrote it, so now we only see commit F—and even then, only if we use git log --all.)
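The copy detection can be observed directly. The sketch below assumes the check is equivalent to comparing patch IDs (what `git patch-id` computes), and assumes a conflict-free history in which every commit adds its own file. The `commit` and `patch_id` helpers and the tags are illustrative; the tags keep the original commits nameable after the rebase.

```sh
#!/bin/sh
# Sketch: the naive second rebase skips D because D' carries the same patch.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Demo'
git config user.email 'demo@example.invalid'

# Hypothetical helper: one commit per call, tagged with its letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C                # tip1 -> C
git checkout -q -b tip2 A
commit D; commit E                          # tip2 -> E
git checkout -q -b tip3 D
commit F                                    # tip3 -> F

git checkout -q tip2
git rebase -q tip1                          # copies D and E to D' and E' after C

# Hypothetical helper: the patch ID of a single commit.
patch_id() { git show "$1" | git patch-id | cut -d' ' -f1; }
old_id=$(patch_id D)                        # tag D still names the original commit
new_id=$(patch_id 'tip2^')                  # D'
echo "patch-id of D:  $old_id"
echo "patch-id of D': $new_id"

git checkout -q tip3
git rebase -q 'tip2^'                       # the naive form; D is skipped as already applied
git log --oneline --graph tip1 tip2 tip3
```

Recent Git versions may print a note about the skipped commit during the second rebase.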

Now, there are cases when git rebase can't figure out that D got copied to D'. Specifically, these occur if the first rebase—the one that made tip2 point to E'—had a conflict that you had to manually resolve while copying D.

In this case, you need a smarter git rebase command, instead of the naïve version. This is when you need the --onto argument:

$ git checkout tip3 && git rebase --onto tip2^ tip3^

This git rebase takes two parameters:

  • A set of commits to exclude: that's tip3^, i.e., commit D and everything earlier.
  • A place to start copying after: that's tip2^, i.e., commit D'.

This tells git rebase to copy commit F, but not D or anything before D, and place the copies after D'.
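Here is a sketch of the case where the explicit form matters. To stay non-interactive, it simulates a hand-resolved conflict by building D' with `git cherry-pick` plus an amended change instead of a real conflicting rebase; that substitution, the `commit` helper, and the tags are assumptions made for the demo.

```sh
#!/bin/sh
# Sketch: when D' no longer matches D, exclude D explicitly with --onto.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Demo'
git config user.email 'demo@example.invalid'

# Hypothetical helper: one commit per call, tagged with its letter.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C                # tip1 -> C
git checkout -q -b tip2 A
commit D; commit E                          # tip2 -> E
git checkout -q -b tip3 D
commit F                                    # tip3 -> F

# Stand-in for "rebase tip2 onto tip1 with a manual fix while copying D".
git checkout -q -B tip2 tip1
git cherry-pick D >/dev/null
echo resolved >> D.txt
git commit -q -a --amend --no-edit          # D' now differs from D
git cherry-pick E >/dev/null                # E'

# The naive range still contains D; the explicit range contains only F.
naive=$(git rev-list --count 'tip2^'..tip3)
explicit=$(git rev-list --count 'tip3^'..tip3)
echo "naive range: $naive commits, explicit range: $explicit commit"

git checkout -q tip3
git rebase -q --onto 'tip2^' 'tip3^'        # copy F only, placing it after D'
git log --oneline --graph tip1 tip2 tip3
```

Because F touches only its own file here, the copy applies cleanly; in a real history the second rebase can of course hit conflicts of its own.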

It would be nice if we could tell git rebase to do multiple branch-tips, but we can't. It would be nice if we could tell Git to figure out where D and D' are automatically, and here we have a bit more luck, but it's still tricky. A few years ago, I started to write some code along these lines, but I abandoned the effort when I was getting too little gain for too much pain. The cases I really cared about were already being handled by the copy-detection during a naïve-style git rebase.

torek
  • This is a really thoughtful and well written answer. I started something earlier but in hindsight I'm glad I didn't finish it :). – larsks Oct 30 '16 at 02:00
  • Thanks for your detailed answer. I've edited my question to add an `sh` script that handles hg bookmarks (akin to git branches) instead of hg branches, which kind of rebuts your argument about being easier for hg because of its branch model. Your proposed solution has two problems. One, it's not immediately clear how to apply it to generic cases; two, it requires rebasing on a different commit first, and that can cause further conflicts, to a degree which can become unmanageable. It still contains helpful information, and I'm upvoting it for that reason, but I'm not accepting it. – Pedro Gimeno Oct 30 '16 at 02:03
  • Hm, I see your `hg rebase` uses `-t other`, which is (strictly speaking) completely missing in Git (it would be `-s theirs` which does not exist, but would be nice for rebase). We *still* can't do this in Git, though, as `hg rebase`'s `-s` is, in this case, using the internal sequential numbering of commits to select the commits to rebase. Git's commits are simply not numbered this way: you must still name all the tip commits, as well as the cut-off point. (And then we need a multirebase script, like the one I started on but never finished, to really make it work right.) – torek Oct 30 '16 at 02:19
  • The `-t other` is not really important for this. It was an automated way to resolve the conflict, so that the script wasn't interrupted with a prompt. It's OK if the conflict is resolved by hand and that's what I expect. In my first version I was going to say to resolve the conflict by picking a certain commit, then I remembered there was that possibility. About your other comment, it's still a DAG, so there's still an order. But, well, it's also good to know when something can't be done. Thanks. – Pedro Gimeno Oct 30 '16 at 02:27
  • Yes, I figured as much re `-t`, I just wanted to mention that it would be nice if Git had it. (It has `-s ours`; `-s theirs` would be useful in rebase.) For Git, using `--all` or `--branches=` would be a good way to name all the branch-tips of interest: branches that do not contain the specified cut-off commit would simply be excluded from the multi-rebase. These two also have the advantage that `git rev-list` already understands them, so it only takes a lot of code to make it work. :-) – torek Oct 30 '16 at 02:32
  • Well, git has `git checkout --theirs/--ours` which isn't too different and could probably be used in the above script. – Pedro Gimeno Oct 30 '16 at 02:37
  • Unfortunately those can only be used manually, during conflicts. `-s theirs` would disregard entire *files*, as `-s ours` (and hg's `-t local` or `-t other`) does. Git does have `-X theirs` which is equivalent to hg's `-t merge-other`. BTW, if you wanted to work on a multirebase, I have some ideas for how to achieve it. The tricky part is recording intermediate states, for which we must choose a reference name-space. – torek Oct 30 '16 at 02:39

Judging by torek's answer, there's, as of this writing, no automated way to do it. Therefore, the next-best way is probably to reproduce the tree manually. I say "tree" because I don't know what will happen if there are merge commits involved.

So, assuming the commit to prune forms an independent subtree, the following should work:

  • Find all downstream tips of the commit to prune (e.g. with gitk --all).
  • Check out the grafting point.
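  • Run git rebase --onto HEAD HEAD tip1, where tip1 is the first of the tips found above.
  • Repeat for each of the splitting points (branching points, i.e. commits that have two or more children, possibly including the original splitting point) in the subtree: check out the rebased copy of the splitting point and rebase the next tip onto it in the same way.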
  • Fix the references that are not tips, if any.

This does it in the case of the example:

git checkout graftpoint
set +e
git rebase --onto HEAD HEAD E_br
  # Known conflict
  echo D > file.txt
  git add file.txt
  git rebase --continue
set -e
# Repeat for the branching point at E_br~1
git checkout E_br~1
git branch -f prunepoint # fix this reference while we're here
git branch -f D_br # fix this reference while we're here
git rebase --onto HEAD HEAD F_br

This produces the desired graph:

* C_msg  (master, C_br) 612ba5c
|    
| * E_msg  (E_br) 1b4fb2e
| |     
| | * F_msg  (HEAD, F_br) 78d5a1f
| |/  
| |   
| * D_msg  (prunepoint, D_br) c4975c0
|/  
|  
* B_msg  (graftpoint, B_br) cc630ea
|  
* A_msg  (A_br) 01084ac

(the hashes for A_br, B_br and C_br are of course the same as in the tree before the operation, which are different from those in the question because the tree was re-created).

Pedro Gimeno

Why should it be so complicated?

git rebase --onto B A D
git rebase --onto D D@{1} E   # or ... "--onto D{,@{1}} E" in bash
git rebase --onto D D@{1} F   # or ... "--onto D{,@{1}} F" in bash

should do the work (modulo resolving any merge conflicts).
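The three commands generalize to a small helper. What follows is a hypothetical sketch, not the script mentioned in this answer: the function name `reparent` is made up, it only considers local branches, and it assumes the history forks only at the source commit itself. Any conflict stops it (because of `set -e`) and leaves the rebase to be finished by hand.

```sh
#!/bin/sh
# Sketch: move SOURCE and every local branch containing it onto NEWPARENT.
set -e

# Hypothetical helper, not a Git command. Usage: reparent SOURCE NEWPARENT
reparent() {
    old=$(git rev-parse --verify "$1^{commit}")
    branches=$(git for-each-ref --format='%(refname:short)' --contains="$old" refs/heads)
    # Copy SOURCE alone onto NEWPARENT, on a detached HEAD.
    git checkout -q --detach "$old"
    git rebase -q --onto "$2" "$old^"
    new=$(git rev-parse HEAD)
    for b in $branches; do
        if [ "$(git rev-parse "$b")" = "$old" ]; then
            git branch -f "$b" "$new"       # branch sat on SOURCE: just move it
        else
            git rebase -q --onto "$new" "$old" "$b"   # replay what came after SOURCE
        fi
    done
}

# Demo on the question's shape; each commit adds its own file, so no conflicts.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Demo'
git config user.email 'demo@example.invalid'
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }
commit A; commit B; commit C                # tip1 -> C
git checkout -q -b tip2 A
commit D; commit E                          # tip2 -> E
git checkout -q -b tip3 D
commit F                                    # tip3 -> F

reparent D B
git log --oneline --graph tip1 tip2 tip3
```

The helper leaves HEAD on the last branch it rebased. Handling forks deeper in the subtree would need the old-to-new mapping for every copied commit, which this sketch does not attempt.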

I do this often enough that I have written a script that does it automatically. It's a pity that Git doesn't provide such a tool and makes its users write their own, but nothing inherent in Git's concepts prevents this; the concepts in Git are simply different from those in Mercurial.

musiphil