
I'm trying to piece together the appropriate git command for "Take branch_c, find all the commits where it varies from branch_b, base it on branch_a, and resolve any conflicts based on the existing changes from branch_b to branch_c."

The scenario

  • You and a colleague are both working on features slated to be reviewed and merged into master or some other upstream branch (we'll say branch_a).

  • A colleague of yours is working on a feature set to be introduced soon on branch_b. You are continually reviewing it as they go.

  • Contemporaneously, you're working on a feature slated to be introduced down the road, based on branch_b. You work toward this feature on branch_c, and continually rebase and tune it over branch_b.

  • Sometimes, there's a conflict between your feature branch (branch_c) and your buddy's branch (branch_b). You resolve them as they come in and keep working.

  • Your friend finishes her work and asks for review from the whole team. For any number of reasons, some commits are removed. Maybe there's a small squash for clarity, or maybe the team decides to yank some changes and introduce them in a later branch.

  • The changes are made and your colleague's branch is merged into branch_a.

  • Now, you want to rebase over the new history, but when you do, you suddenly need to re-resolve all of the conflicts you've ever resolved.

Typically in this situation, I just end up remembering and repeating each conflict resolution. But surely there's a better way?

I expected something like git rebase --strategy-option=ours <branch_c> --onto <branch_a> <divergent_ref> (edited to correct the command argument per the conversation below) to work - to fix the bases up while resolving conflicts based on the previous ("ours") history, but this doesn't do it - it appears instead to replay the history as if you had always ignored the other branch. (edit: I misinterpreted the result of this command - in fact it does exactly what I have described and produces exactly the history that I think is most desirable in this scenario.)

I also see this question and its top answer which is a similar situation, but not an exact fit because I want to examine the existing history and replay each resolution in turn.

Why not just use git merge?

(note: this section was added following the brief discussion in the comments below.)

My goal is to faithfully depict a linear progression to future readers of the history, whether in git log, git blame, git bisect, or any other way. I don't want future users to annotate one of these changed lines and be taken to a commit that appears to be based on the state of branch_c at some arbitrary time before branch_b was merged. Instead of an interspersed history, I want future readers to be able to cleanly annotate these changes back to the moment branch_b was merged.

Don't get me wrong: git merge, and merge commits, are wonderful tools. I think you'll find that the repository which sparked this question has an immaculate, easy-to-read history and uses merge commits judiciously, intentionally, and in all the right places. What I don't want are arbitrary merge commits to appear in the history at all the moments where one author got around to catching up with another author's work - that does not help readers and is not, in my opinion, an appropriate use of git merge (except as a tool for fast-forwarding, obviously).

Rather than debate the uses of merge vs rebase, I'd like to keep this question specifically aimed at the appropriate command to create the history I'm describing.

It will surprise me if this is not already an available feature of git, but I just can't figure out the right combination of things to make it happen.

jMyles
  • Why rebase over and over instead of merging? By merging, you resolve the conflict only once. Or just wait for the complete feature before rebasing. – Ôrel Feb 08 '20 at 01:32
  • I don't understand what you mean - merge when? At arbitrary moments, like each morning or something? In that case, the history will be polluted with untruthful merge commits and will cause future uses of `git blame` or `git bisect` to be useless for the conflicted lines. On the other hand, waiting until the feature is complete to begin working over it is unrealistic in many cases. – jMyles Feb 08 '20 at 01:34
  • Merging is the way to go. It effectively resolves the issues between branches as you go, without having to constantly rewrite history. – Mad Physicist Feb 08 '20 at 01:37
  • @jMyles. I don't think merging works the way you think it does. It doesn't mess with blame. It lets you decide how to handle conflicts, and remembers the resolution for you in a merge commit. – Mad Physicist Feb 08 '20 at 01:39
  • @Mad Physicist: Thanks for the suggestion. However, I don't agree that merging is the way to go here. I expressly want to rebase; this is a quintessential use case for rebase. And a merge in the situation I'm describing will certainly affect the way an annotated file appears later when using blame or bisect - I want the history to reflect that the line in question was changed subsequent to the history of my colleague's branch, not interspersed with it. And then also, the history will be polluted with merge commits that don't belong. – jMyles Feb 08 '20 at 01:42
  • Merging - and merge commits - are wonderful. But I'm not trying to merge two histories here, I'm trying to faithfully depict a linear progression to future readers of the history, whether in git log, git blame, git bisect, or any other way. – jMyles Feb 08 '20 at 01:46
  • Rather than debate the uses of merge vs rebase, I'd like to keep this question specifically aimed at the appropriate command to create the history I'm describing. – jMyles Feb 08 '20 at 01:47
  • Ok, that makes sense. I'm going to step back from pushing my opinion on you, read your question carefully, and figure out how to do it. I'm genuinely sorry for getting carried away here. – Mad Physicist Feb 08 '20 at 01:47
  • I don't mind debating the merits of merge vs rebase to create a readable history in the scenario I'm describing, but I think we might want to seek a different forum in which to do that. – jMyles Feb 08 '20 at 01:50
  • Why rebase instead of waiting for the end of the other feature branch? Rebasing is hiding the reality of what you have done: day after day you mix up the two lines of development. If you want to show two separate lines of development, don't rebase or merge. Git bisect handles merges very well. – Ôrel Feb 08 '20 at 02:34
  • It's a high-velocity project; I can't afford to wait until the end of the other feature branch to begin working on top of it. I respect your point, "rebasing is hiding the reality of what you have done" - this is a common and valid argument. But in this case, I think that *merge* serves to obscure what actually happened much more than rebase. The histories were never based separately in parallel on branch_a, but that's what merge will reflect. And yes, of course git bisect handles merge, but it will present readers with an incorrect history instead of one based on branch_b. – jMyles Feb 08 '20 at 02:36
  • `git rebase --ours --onto ` rebase doesn't take `--ours`. What did you try? – Schwern Feb 08 '20 at 03:22
  • In addition, what do you mean by "*it appears instead to replay the history as if you had always ignored the other branch*"? – Schwern Feb 08 '20 at 03:29
  • I’m having trouble following the description of what’s going on. A colleague a buddy a friend another colleague. Can’t you use names and diagrams to map this? – matt Feb 08 '20 at 09:37
  • It's all the same person, my apologies. – jMyles Feb 08 '20 at 09:44

2 Answers


(FWIW I agree with you about wanting to avoid "update" merges, updating branches via rebase leaves a much cleaner history and avoids many problems. But it does require a bit of understanding of Git geometry.)

Let's use master, feature and feature1 instead of branch_a, branch_b, and branch_c.

Before feature is reviewed and merged into master you have something like this. You're working on feature1 in anticipation of feature being merged. I'd refer to feature1 as an "anticipation branch".

A - B [master]
     \
      C - D - E [origin/feature]
               \
                F - G [feature1]

You can keep up to date easily with git rebase origin/feature.

feature is merged into master with alterations. Let's say D is removed and E is altered. feature is complete so it is deleted.

A - B ------ M [master]
     \      /
      C - E1
       \
        D - E - F - G [feature1]

git rebase master will resurrect the old D and E. Git has no way of knowing that D and E were once part of feature.

A - B ------ M [master]
     \      / \
      C - E1   D - E - F - G [feature1]

You need to use --onto. You want to rebase from the old tip of feature which is E. git rebase --onto master E feature1

A - B ------ M [master]
     \      / \
      C - E1   F - G [feature1]

You may also be able to get the old position of origin/feature from the reflog. git rebase --onto master origin/feature@{1} feature1.

See Recovering From Upstream Rebase in the git-rebase docs, specifically the hard case.
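The graphs above can be reproduced in a throwaway repository. The following is only a sketch under stated assumptions: the `commit` helper, the one-file-per-commit layout, and the local `feature` and `reviewed` branches (standing in for origin/feature and the reviewed copy) are hypothetical. It shows that `--onto` with the old tip E replays only F and G.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }

commit A; commit B
git branch -M master

# The colleague's branch (C - D - E) and the anticipation branch (F - G).
git checkout -q -b feature;  commit C; commit D; commit E
old_tip=$(git rev-parse feature)        # remember E, the old tip of feature
git checkout -q -b feature1; commit F; commit G

# Review outcome: D is dropped, E becomes E1, and the result is merged as M.
git checkout -q -b reviewed feature~2   # start again from C
commit E1
git checkout -q master
git merge -q --no-ff -m M reviewed
git branch -q -D feature

# A plain "git rebase master" would replay D, E, F and G.
before=$(git rev-list --count master..feature1)

# Replay only the commits after the old tip E onto the new master.
git rebase -q --onto master "$old_tip" feature1
after=$(git rev-list --count master..feature1)
subjects=$(git log --format=%s master..feature1 | tr '\n' ' ')

echo "commits not in master before: $before, after: $after ($subjects)"
```

If the old tip's hash was not saved, the reflog form of the same command (origin/feature@{1}) plays the role of `$old_tip` here.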

I expected something like git rebase --ours --onto to work - to fix the bases up while resolving conflicts based on the previous ("ours") history, but this doesn't do it - it appears instead to replay the history as if you had always ignored the other branch.

rebase doesn't take --ours. I can't say what went wrong without knowing what you did.

Schwern
  • OK, thanks for stepping in. So, I was already using `--onto`, as you can see from the command I attempted. This is what I typically do in this situation. But again, this (reasonably) replays all of the conflicts which I have already resolved in the old history. So, how can I apply those resolutions which are in the history that I'm rebasing? – jMyles Feb 08 '20 at 09:38
  • Haha, right - I meant `--strategy-option=ours`. And by "it appears instead to replay the history as if you had always ignored the other branch" what I mean is that it uses the "ours" strategy to resolve recursive merges between `branch` and `newbase`. What I want is instead to resolve the merges using the resolved hunks in the existing history. That's where the resolutions are. – jMyles Feb 08 '20 at 09:38
  • Oh don't I feel sheepish. So, after my `git rebase --strategy-option=ours ...`, there were broken tests which looked initially to my eyes to be VCS artifacts. But. They weren't. I was mistaken - they were actual divergences. So the answer to my question is indeed simply `git rebase --strategy-option=ours feature --onto master <divergent_ref>` - this bases feature on master and applies the conflict resolutions from the history of feature over divergent_ref. If you want to add this to the top of your answer, I'm happy to give you the green check. Thanks for jogging my mind. – jMyles Feb 08 '20 at 12:45
  • @jMyles That seems backwards. What is `feature` and what is `divergent_ref`? I'm also wary of a blanket `ours` merge strategy as it does not deal with conflicts in a nuanced way. – Schwern Feb 08 '20 at 16:42

There are those who argue that you should keep all the merges: this represents the actual development process. The method you're using, of "cleaning up" the set of commits so that the history is prettier and reflects an idealized development process, is supposed to make things easier for future people (perhaps even for future you) when you need to fix bugs that are in those future-people's past, which could be your own future, at the moment.

All of this is a fair amount of work, and the payoff is uncertain. And yet I myself prefer this "pre-clean the history" method, when I can afford it.

Now, let's add this as well:

git rebase --strategy-option=ours feature --onto master <divergent_ref>

You must be very careful with -X ours: Git is following simple text rules, which won't work in all situations. (If you have automated tests, consider adding --exec to run them.)
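To make the caution concrete, here is a small sketch in a scratch repository. The file names and the `test -f extra.txt` check are hypothetical stand-ins for a real test suite. One thing worth knowing, per the git-rebase documentation, is that the sides are swapped during a rebase: "ours" is the branch being rebased onto, and "theirs" is the work being replayed. So on a conflicting hunk, `-X ours` silently keeps the new base's text.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base > f.txt
git add f.txt
git commit -qm base
git branch -M master

# The feature commit edits f.txt and also adds an unrelated file.
git checkout -q -b feature
echo feature > f.txt
echo extra > extra.txt
git add f.txt extra.txt
git commit -qm "feature work"

# master edits the same line of f.txt, so replaying the feature commit conflicts.
git checkout -q master
echo master > f.txt
git commit -qam "master work"

# -X ours resolves the conflicting hunk automatically; --exec runs a check
# after each replayed commit (a stand-in for "run the tests").
git rebase -q -X ours --exec 'test -f extra.txt' master feature

kept=$(cat f.txt)
echo "f.txt now contains: $kept"
```

The non-conflicting part of the commit (extra.txt) survives, while the conflicting hunk from the feature side is discarded without any prompt, which is why a post-rebase test run is worth having.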

Typically in this situation, I just end up remembering and repeating each conflict resolution. But surely there's a better way?

You can use git rerere here, sometimes. Mostly --onto is what you need (see Schwern's answer).

There are a few things to know before you start on this path:

  • First, remember that git rebase essentially means copy some commits, as if by git cherry-pick. Some rebase operations really do use git cherry-pick. One—the oldest form of rebase—uses git format-patch and git am. This does not work as well in general, but does run faster. Because it doesn't use git cherry-pick, I think git rerere won't apply either. But adding --strategy-option forces the cherry-pick variant; so does adding -m, or -i or --exec.

  • A cherry-pick is a merge!

  • Having listed out some commit hash IDs, rebase begins its real work by detaching HEAD at the commit onto which the copies are to be appended. It then makes the copies. After the selected commits have been copied, the rebase operation writes the hash ID of the last-copied, or HEAD, commit, into the branch name of the branch you were on when you started the whole thing.

  • But we also have to look at the set of commits that are to be copied. The first sentence above says having listed out some commit hash IDs. Which hash IDs? This is where --onto comes in.

A plain git rebase gives you one control knob, which the git rebase documentation calls upstream. This one control knob both selects commits to copy and where to put the copies. Using git rebase --onto gives you a separate knob for where to put the copies, so that frees up the upstream argument to let you select what-to-copy more carefully.

The upstream argument is normally both where to put the copies and what not to copy, with the what to copy list being determined by the result of git rev-list upstream..HEAD, more or less. We'll see this in action below. But more or less is important here: the rebase documentation cites this A..B notation as how it determines what to copy and what not to copy, but it's actually less than that. Again, we'll see more about this below.

How to express what I think your actual problem here is

In any case, let's draw what I think is your actual problem-over-time as several different snapshots in time:

                   E--F--G   <-- feature1
                  /
...--o--*--o--o--*--o--o   <-- mainline
         \
          A--B--C--D   <-- feature2

This is how it all starts: there's some project with some main-line going on, and two features being developed. But now it turns out that feature2 depends on something in feature1. So now you'd like to rebase—i.e., copy some commits from—feature2 so that they appear in feature1. At this point, this rebase is easy to invoke:

git checkout feature2; git rebase feature1

already selects the correct commits and the correct copy-point. If this copying is done via cherry-pick, each copy is done by the merge machinery, which includes saving conflict resolutions with git rerere, if rerere.enabled is set. (If it's not set, Git saves neither the conflicts, nor their resolutions.)
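As a rough illustration of the rerere behaviour described above, the following sketch records one resolution during a rebase and then replays it when the same conflict comes up again. The repository contents and the `original-feature2` tag are hypothetical. `-m` is passed to make sure the merge-based rebase is used, and `git rerere` is run by hand after resolving so that the recording step is explicit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true          # record conflicts and their resolutions

echo base > f.txt
git add f.txt
git commit -qm base
git branch -M feature1

git checkout -q -b feature2
echo "feature2 change" > f.txt
git commit -qam "feature2 work"
git tag original-feature2               # keep a handle on the pre-rebase commit

git checkout -q feature1
echo "feature1 change" > f.txt
git commit -qam "feature1 work"

# First rebase: stops on a conflict in f.txt, which we resolve by hand.
git checkout -q feature2
git rebase -m feature1 >/dev/null 2>&1 || true
echo "resolved by hand" > f.txt
git rerere                              # record the resolution explicitly
git add f.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

# Put feature2 back on the original commit and rebase again: the same
# conflict appears, and rerere rewrites f.txt with the recorded resolution.
git checkout -q -B feature2 original-feature2
git rebase -m feature1 >/dev/null 2>&1 || true
replayed=$(cat f.txt)
git add f.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

echo "working tree after replay: $replayed"
```

The rebase still stops at the conflicting commit on the second run; rerere only fills in the file contents, so the `git add` and `--continue` steps remain.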

The end result is:

                           A'-B'-C'-D'  <-- feature2
                          /
                   E--F--G   <-- feature1
                  /
...--o--*--o--o--*--o--o   <-- mainline
         \
          A--B--C--D   [abandoned]

The new commit chain, A'-B'-C'-D', looks a lot like your original chain, but the hash IDs differ. Since no one actually looks at the hash IDs, and the original A-B-C-D chain is now invisible to normal git log operations, nobody ever really pays attention to the changeover—but it's real. And it is about to bite you in another way.

Now that you have rebased your feature2 atop feature1, someone else (or maybe even you) rebases feature1. The result is:

                           A'-B'-C'-D'  <-- feature2
                          /
                   E--F--G
                  /
...--o--*--o--o--*--o--o   <-- mainline
                        \
                         E'-F'-G'   <-- feature1

(I stopped drawing in A-B-C-D as they're not really useful any more.) Note how E-F-G are supposed to be abandoned, but actually are not.

A plain git checkout feature2; git rebase feature1 will choose to copy commits E-F-G-A'-B'-C'-D'. Using --onto, you can run git rebase --onto feature1 hash-of-commit-G to tell git rebase: do not copy commits E-F-G.

But in fact, there's a handy feature inside git rebase: it automatically excludes some commits from its list of commits. I already mentioned this above, in the more or less ... actually less part. The rebase documentation actually says this:

[The commits to be copied are] the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.

But this is not true! By default, git rebase also excludes:

  • any merge commit, and
  • any commit in upstream..HEAD for which its git patch-id matches the patch-ID of a commit that is in HEAD..upstream.

Suppose that whoever copied E-F-G to E'-F'-G' didn't have to resolve any merge conflicts or anything. In this case, the copies, E' through G', will have the same git patch-id as their corresponding original. So git rebase would drop those commits even without --onto.
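The patch-ID rule can be observed directly. This is a sketch in a scratch repository with hypothetical file and branch contents: a commit E is copied with `git cherry-pick` (giving it a new hash but the same patch ID), and a plain rebase then leaves the original E out of the commits it copies.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }

commit base
git branch -M mainline
git checkout -q -b feature1; commit E   # the original commit E
git checkout -q -b feature2; commit A   # our work on top of E

# Someone copies E onto a newer mainline: new hash, same change.
git checkout -q mainline
commit other
git cherry-pick feature1 >/dev/null

id_old=$(git show feature1 | git patch-id | cut -d' ' -f1)
id_new=$(git show mainline | git patch-id | cut -d' ' -f1)

before=$(git rev-list --count mainline..feature2)   # E and A
git rebase -q mainline feature2
after=$(git rev-list --count mainline..feature2)    # only the copy of A

echo "patch IDs match: $id_old = $id_new"
echo "commits to carry before: $before, copied by rebase: $after"
```

Had the person copying E needed to change its content, the two patch IDs would differ and the old E would be replayed, which is the case where --onto is needed.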

(The documentation also mentions fork-point, which uses your own Git's reflog for the upstream branch to pick a --onto value if you didn't. However, fork-point mode is (1) not always active and (2) not always right when it is active. I don't really like the fork-point selection trick myself: I think it buries too much magic. Also, since it uses reflogs, it fails if the crucial reflog entry has expired. But that's all an aside anyway.)

Where this particular sequence all goes wrong is when whoever was copying feature1 had to modify one of their copied commits such that the patch-ID trick fails. In this case, using --onto is the way to go: it fixes the problem without any additional mess.

But you may have another problem. In particular, suppose that while you work on feature2 and someone else works on feature1, they realize the same thing you did, or hear about or see one of your commits' changes, and they add a new commit that partly, but not completely, fixes something you're doing, but in a different way? Then maybe they have:

                           A'-B'-C'-D'  <-- feature2
                          /
                   E--F--G
                  /
...--o--*--o--o--*--o--o   <-- mainline
                        \
                         H-E'-F'-G'   <-- feature1

where H is more similar to one of your original A-B-C-D commits than your updated A'-B'-C' commits. In this case, you might want to bring your A-B-C-D series back. Commit D is almost certainly still in your Git, remembered as feature2@{number}. (The exact number depends on how many updates you have made to feature2 since then.) Or, of course, you can do what I do, which is to save the original feature2 pointer by creating feature2.0, feature2.1, and so on. Let's draw it back in, as feature2.0, and rename feature2 to feature2.1:

                           A'-B'-C'-D'  <-- feature2.1
                          /
                   E--F--G
                  /
...--o--*--o--o--*--o--o   <-- mainline
         \              \
          \              H-E'-F'-G'   <-- feature1
           \
            A--B--C--D   <-- feature2.0
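Both ways of getting back to the pre-rebase commits can be tried in a scratch repository. In this sketch (hypothetical contents, with only two commits on feature2 for brevity), a backup branch named feature2.0 is created before rebasing, and afterwards the reflog entry feature2@{1} is shown to name the same commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }

commit base
git branch -M mainline
git checkout -q -b feature2; commit A; commit B
git checkout -q mainline;    commit other

git checkout -q feature2
git branch feature2.0            # explicit backup of the pre-rebase tip
git rebase -q mainline           # feature2 now points at the copies

old_from_reflog=$(git rev-parse 'feature2@{1}')   # where feature2 pointed before
old_from_branch=$(git rev-parse feature2.0)

echo "reflog: $old_from_reflog"
echo "backup: $old_from_branch"
```

The reflog entry number grows with every later update to feature2 and eventually expires, whereas the backup branch keeps its name until it is deleted.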

If H is very close to one of your original four commits—so that it either has the same patch ID, or that you can use drop in an interactive rebase—you might want to use it as the source. If you had to resolve some conflicts earlier, git rerere will do that. We can now do:

git checkout -b feature2 feature2.0
git rebase -i feature1

(the interactive rebase allows doing a "drop", and forces cherry-picking; if you like, use git rebase -m to force cherry-picking without interactivity). If all goes well, and assuming H==C as it were, we end up with:

                           A'-B'-C'-D'  <-- feature2.1
                          /
                   E--F--G
                  /
...--o--*--o--o--*--o--o  mai...   A"-B"-D"   <-- feature2
         \              \         /
          \              H-E'-F'-G'   <-- feature1
           \
            A--B--C--D   <-- feature2.0

The rerere was perhaps useful in terms of saving a merge conflict resolution for D, and the auto-patch-ID-detection may have ejected commit C for you here.

Viewing rebase as a series of cherry-picks

A normal merge works by finding a merge base—a common, shared commit between two branches—and doing two diffs. The diff from merge base to each branch tip tells us who changed what:

          I--J   <-- ours (HEAD)
         /
...--G--H
         \
          K--L   <-- theirs

The comparison of commit H's snapshot vs J's tells us what we changed, on branch ours; comparing H vs L tells us what they changed; and git merge combines the changes. In cases of conflicts, the (eXtended) strategy-option -X ours or -X theirs tells Git to resolve the conflict automatically by choosing H-vs-J ("ours") or H-vs-L ("theirs").

For a normal merge, the final commit after resolving everything is a merge commit, with parents J and L, in that order (ours first, then theirs).
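The merge base and the two diffs can be inspected with ordinary commands. The sketch below rebuilds the small graph above in a scratch repository (branch names `ours` and `theirs` and the one-file-per-commit layout are hypothetical) and checks that the merge commit's parents come out in the stated order.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }

commit G; commit H
git branch -M ours
h=$(git rev-parse HEAD)                       # commit H, the future merge base

git checkout -q -b theirs; commit K; commit L
git checkout -q ours;      commit I; commit J
j=$(git rev-parse ours)
l=$(git rev-parse theirs)

base=$(git merge-base ours theirs)            # finds H
ours_changed=$(git diff --name-only "$base" ours | tr '\n' ' ')
theirs_changed=$(git diff --name-only "$base" theirs | tr '\n' ' ')

git merge -q -m "merge theirs" theirs
parents=$(git log -1 --format=%P)             # first parent J, second parent L

echo "we changed: $ours_changed"
echo "they changed: $theirs_changed"
```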

A cherry-pick takes a normal merge and subverts it. Instead of finding a common commit, the merge base is simply the to-be-picked commit's parent commit.

When we're rebasing and copying that first commit, this makes sense:

...--o--o--*--o--H   <-- mainline, HEAD (detached)
            \
             A--B--C   <-- feature

We are now copying commit A. Here "ours" is H: the tip commit of mainline, to which HEAD points directly (detached). The pseudo merge base is commit *: the point at which A first diverged. So we'll diff * vs H to see what "we" changed, and * vs A to see what "they" changed. Then we'll combine these differences: that gets us H back, plus whatever we did in A. We'll commit the result and make A', the copy of A:

                   A'  <-- HEAD
                  /
...--o--o--*--o--H   <-- mainline
            \
             A--B--C   <-- feature

(The final commit of a cherry-pick is an ordinary, non-merge commit.)

But now we'll copy B. Its parent is A, so we will use A for the merge base. We'll diff A vs A' to see what "we" changed, and A vs B to see what "they" changed. If we blindly take "ours"—i.e., A vs A'—in a conflict, we could lose important changes from B. Maybe we don't need them—maybe *-vs-H already contained them. But maybe we do.

In any case, when this is all done, we end up with:

                   A'-B'  <-- HEAD
                  /
...--o--o--*--o--H   <-- mainline
            \
             A--B--C   <-- feature

and we are ready to cherry-pick C as before. When that's done, Git will yank the name feature off C and make it point to C' instead (and re-attach HEAD):

                   A'-B'-C'  <-- feature (HEAD)
                  /
...--o--o--*--o--H   <-- mainline
            \
             A--B--C   [abandoned]

and that's our rebase.

Since all this is theoretical anyway, let's just draw conclusions now

The real points to remember here are:

  • rebase copies (some) commits, as if by, or actually by, git cherry-pick;
  • the real key to minimizing work is to pick the correct commits to copy;
  • sometimes this means --onto, and sometimes it might even mean going back to an earlier copy of your own work.

(I don't actually use git rerere myself and am not sure whether cherry-pick uses it automatically. If not, you can use it manually.)

torek