(Similar to this question, but with some context and demonstration of why rerere is not an answer.)

For the given history:

                /...o      origin/master
o...o...o...o...o...o...o  master
    \...o........../       topic

I have a topic branch which I've merged into master, and made one additional commit. Meanwhile, someone upstream has made another commit on origin/master, so I can no longer push my master as-is.

I want to rebase my master onto origin/master without altering the commit SHA on topic and without losing the conflict resolution already performed on master. (This is by far my most common case of wanting to preserve merge commits, so I'm surprised that this is apparently so difficult.)

With rerere enabled, git rebase -p almost works: for any conflicts in the original merge, it remembers what I did to fix them and reapplies the resolution. (It leaves each file marked as conflicted, though, so I have to remember to mark each one as resolved without restarting conflict resolution on the file, which is mildly annoying from the TortoiseGit front-end.) But if there were any other changes to files that were also fixed in the merge commit (e.g. lines purely added in the merge without conflicts, but still needed to be corrected due to changes elsewhere), these are lost.

Here's the thing though. In my (perhaps flawed) understanding of merge commits, they consist of two (or more) parents and a unique changeset (used to store the conflict resolutions, plus any other changes made before committing the merge or later amended to the merge commit). It appears that rebase -p re-creates the merge commit but completely discards this extra changeset.

Why doesn't it reapply the changeset from the original merge commit? That would make rerere redundant and avoid losing these additional changes. It could leave the affected files marked as conflicts if it wanted human confirmation, but in many cases this automatic resolution would be entirely sufficient.

To put it another way, to label some of the commits above:

                /...N      origin/master
o...o...o...o...B...M...A  master
    \...T........../       topic

T - the commit on topic
B - the merge-base of origin/master and master
N - the new commit on origin/master
M - the merge between B and T
A - the extra post-merge commit

M has parents B and T and a unique changeset Mc. When creating M', git performs a new merge between parents N and T, and discards Mc. Why can't git just reapply Mc instead of discarding it?

In the end, I want the history to look like this:

o...o...o...o...B...N...M'...A'  master
    \...T............../

Where M' and A' change SHA1 from the rebase, but M' includes the Mc changeset and T didn't change SHA1 or parent. And now I can fast-forward origin/master to A'.


I have also noticed that there's a new option, --rebase-merges, which sounded nice at first and does result in the right graph afterwards. But just like --preserve-merges, it still stops with conflicts on M' and loses any unique changes in Mc not otherwise saved by rerere.
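
The loss can be reproduced in a throwaway repository. The following is a sketch under assumptions: it needs a Git new enough to support --rebase-merges, the local branch `upstream` stands in for origin/master, and the file names and the old_api/new_api strings are invented for illustration. The merge has no textual conflict at all, so rerere never gets involved, yet the fix made inside M is gone after the rebase.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used below

# Root commit; topic forks here, before B, as in the graphs above
echo base > base.txt; git add base.txt; git commit -q -m "root"
git branch topic

# B: the future merge-base of master and upstream
echo b > b.txt; git add b.txt; git commit -q -m "B"
git branch upstream                       # stand-in for origin/master

# T: the commit on topic
git checkout -q topic
echo 'call old_api' > topic.txt; git add topic.txt; git commit -q -m "T"
T=$(git rev-parse HEAD)

# N: the new upstream commit
git checkout -q upstream
echo n > n.txt; git add n.txt; git commit -q -m "N"
N=$(git rev-parse HEAD)

# M: a conflict-free merge that also fixes topic.txt before committing
git checkout -q master
git merge -q --no-ff --no-commit topic
echo 'call new_api' > topic.txt; git add topic.txt
git commit -q -m "M: merge topic"

# A: the extra post-merge commit
echo a > a.txt; git add a.txt; git commit -q -m "A"

# Rebase master onto upstream, re-creating the merge
git rebase -q --rebase-merges upstream

git log --graph --oneline
cat topic.txt   # shows which version of the line survived
```

In this sketch the graph comes out as desired (M' has N and T as parents, and T keeps its SHA), but topic.txt is back to T's version because the merge was re-performed rather than copied.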


An alternate formulation of the question which might be more useful:

Given the initial state above, and having just started an interactive rebase that is now in either HEAD1 or HEAD2 states:

        /...........(T)
       /               \
      /             /...M'  HEAD2
     /              /...    HEAD1
    /           /...N       origin/master
o...o...o...o...B...M...A   master
    \...T........../        topic

(HEAD1 has checked out N but done nothing else yet; HEAD2 has created a new merge with N as parent 1 and T as parent 2 but hasn't committed yet due to unresolved conflicts.)

Is there some sequence of rebase commands and/or git commands which will:

  1. Calculate the diff Mc between M and B (choosing B because the other parent T is not changing)
  2. Apply this to the conflicted tree M' (which should completely resolve all conflicts, unless N introduces new ones), OR simply apply it on top of N without first doing any merge. These should be equivalent; the second might be easier.
  3. Pause for a human to resolve any remaining conflicts introduced by N, if any.
  4. Commit M' as a merge between N and T
  5. Continue as usual (in this case rebasing A to A' on top of M')

And why doesn't git do this by default?

Miral
  • Obviously, N can introduce new conflicts which still have to be resolved by a human. But any conflicts between B and T or any additional changes made in M should _just work_ when applied to M'. And if N does not introduce new conflicts (which is probably more common than not), then this can be entirely automatic. – Miral Jan 25 '19 at 05:28
  • `I want to rebase my master onto origin/master without altering the commit SHA on topic` I'm not sure I understand what you mean by this. *If* you mean that you want to rebase one branch on top of another and end up with the same SHA for the final revision as when you started, that's not going to happen. A different revision will always have a different SHA. For two revisions to have the same SHA, their `content` (tree object, author, author time, committer, committer time, comment and parents) has to be the same. – eftshift0 Jan 25 '19 at 06:15
  • @eftshift0 Read further; what I want is shown in the graphs. I want to rebase `master`, which will of course produce new SHAs of commits on master, including the merge commit itself. I do not want to alter the SHA of T (but there should be no need to, this is simply the second parent of the merge commit actually being rebased). – Miral Jan 25 '19 at 06:20

1 Answer

The fundamental reason that git rerere cannot record the non-conflicts is that git rerere is implemented in a cheap and dirty manner: Git takes each initial conflict, strips it of some data to make it more applicable (much as git patch-id strips line numbers and some white-space), hashes the result, and saves the conflicted text as a "preimage" under that hash ID in the rerere directory (.git/rr-cache). Later, when you git commit the result, Git pairs that one specific preimage with its resolution (the "postimage"). So it only "knows" the conflicts, not any other changes.

The later merge (with its conflicts) normalizes and hashes the conflicts again, gets the same hash ID, and finds the pairing, so it uses the saved postimage as the resolution. Since the non-conflicted changes aren't saved here, they never show up as part of this process.
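
To see what rerere actually keeps, here is a small sketch in a throwaway repository (the file names and contents are made up for illustration). It creates one conflict, resolves it while also adding an unrelated file in the same merge commit, and then looks inside .git/rr-cache.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config rerere.enabled true

# Two branches that change the same line differently
printf 'one\n' > file.txt; git add file.txt; git commit -q -m base
git checkout -q -b topic
printf 'topic\n' > file.txt; git commit -q -am topic
git checkout -q master
printf 'master\n' > file.txt; git commit -q -am master

# The merge conflicts; rerere records the conflict at this point
git merge topic >/dev/null 2>&1 || true
find .git/rr-cache -type f

# Resolve the conflict and add an unrelated change to the merge commit
printf 'resolved\n' > file.txt
printf 'extra\n' > other.txt
git add file.txt other.txt
git commit -q -m "merge topic"   # rerere records the resolution here

# Only the conflicted file's before/after text is in the cache
find .git/rr-cache -type f
pre=$(find .git/rr-cache -name preimage | wc -l)
post=$(find .git/rr-cache -name postimage | wc -l)
```

Nothing about other.txt ends up in the cache, which is why a replayed merge cannot bring it back.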

Git could perhaps save more, but it doesn't.

> In my (perhaps flawed) understanding of merge commits, they consist of two (or more) parents and a unique changeset (used to store the conflict resolutions, plus any other changes made before committing the merge or later amended to the merge commit).

This is incorrect. All commits are just snapshots of state. Merges are not special here—just like non-merge commits, they have a complete source tree. What is special about them is that they have two (or more) parents.
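
This can be checked directly with git cat-file. A minimal sketch in a throwaway repository (file names and contents are invented): the merge commit object lists one tree and two parents and stores no diff, while any "changeset" is computed on demand by comparing the merge's snapshot against one of its parents.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add base.txt; git commit -q -m "B"
git checkout -q -b topic
echo 'call old_api' > topic.txt; git add topic.txt; git commit -q -m "T"
git checkout -q master

# A merge that also edits a file before being committed
git merge -q --no-ff --no-commit topic
echo 'call new_api' > topic.txt; git add topic.txt
git commit -q -m "M: merge topic"

# The raw commit object: a tree, two parents, author, committer, message
git cat-file -p HEAD
parents=$(git cat-file -p HEAD | grep -c '^parent ')
trees=$(git cat-file -p HEAD | grep -c '^tree ')

# Diffs are derived by comparing snapshots, one parent at a time
git diff --stat HEAD^1 HEAD                      # versus first parent (B)
mc_files=$(git diff --name-only HEAD^1 HEAD)
evil=$(git diff HEAD^2 HEAD -- topic.txt | grep -c '^+call new_api')
```

The diff against the first parent mixes T's work with the fix made during the merge; the diff against the second parent isolates the fix here only because master had no commits of its own in this tiny example.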

Copying a non-merge, as git cherry-pick does (and as git rebase does, either by invoking git cherry-pick repeatedly or by doing something similar but not quite as good), works by using the commit's (one and only) parent as the merge base for the merge-as-a-verb operation. Copying a merge is not possible in general, and rebase doesn't try: it just re-performs the merge.

(On the other hand, git cherry-pick will let you cherry-pick a merge, using its -m option to select one particular parent. Git simply pretends that that is the lone parent for the duration of the three-way merge operation. In theory, the rebase code could do the same: -m 1 is almost always the correct parent, and one can always use the low-level git commit-tree to make the actual commit, so as to make it a merge commit. But git rebase does not do this.)
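
Done by hand, that idea might look like the sketch below. It is an illustration under assumptions, not something git rebase does for you: the local branch `upstream` stands in for origin/master, the file names and the old_api/new_api strings are made up, and nothing in this history conflicts. If the cherry-pick of M did stop with conflicts (for example because of N), they would have to be resolved and staged with git add before git write-tree.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Build the history: root, B, T (on topic), N (on upstream), M, A
echo base > base.txt; git add base.txt; git commit -q -m "root"
git branch topic
echo b > b.txt; git add b.txt; git commit -q -m "B"
git branch upstream
git checkout -q topic
echo 'call old_api' > topic.txt; git add topic.txt; git commit -q -m "T"
T=$(git rev-parse HEAD)
git checkout -q upstream
echo n > n.txt; git add n.txt; git commit -q -m "N"
N=$(git rev-parse HEAD)
git checkout -q master
git merge -q --no-ff --no-commit topic
echo 'call new_api' > topic.txt; git add topic.txt   # fix made inside the merge
git commit -q -m "M: merge topic"
M=$(git rev-parse HEAD)
echo a > a.txt; git add a.txt; git commit -q -m "A"
A=$(git rev-parse HEAD)

# 1. Start from N on a detached HEAD
git checkout -q --detach upstream

# 2. Apply the diff from M's first parent to M, without committing
git cherry-pick -m 1 --no-commit "$M"

# 3. Record the resulting tree as a merge of N and T, reusing M's message
tree=$(git write-tree)
Mp=$(git log -1 --format=%B "$M" |
     git commit-tree "$tree" -p "$(git rev-parse HEAD)" -p "$T")
git reset -q --hard "$Mp"

# 4. Copy A on top of M', then move master to the result
git cherry-pick "$A" >/dev/null
git checkout -q -B master

git log --graph --oneline
```

Here M' keeps the fix from M, and T is untouched because it is only ever named as a parent. As the comments below note, -m 1 can misfire when the same change appears on both parents, so the result is worth inspecting before pushing.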

> ... if there were any other changes to files that were also fixed in the merge commit (e.g. lines purely added in the merge without conflicts, but still needed to be corrected due to changes elsewhere), these are lost.

Yes (for the reason discussed above). That is perhaps one reason people refer to such a merge as an "evil merge" (another possible reason for the phrase is that such changes were, as far as anyone can tell later from the available evidence, not actually requested by anyone). While it does not help your goal with existing merges, I would advise not making such changes: instead, make those changes before or after the merge, in an ordinary non-merge commit that feeds into or out of the merge, so that a later rebase -p or rebase --rebase-merges can preserve them.

torek
  • The first part is a nitpick; a snapshot of state which is unique from both of its parents is equivalent to a unique changeset. It is entirely possible for git to calculate this and reapply it during the course of rebasing the merge commit. – Miral Jan 25 '19 at 06:10
  • As for the second part, avoiding "evil merges" is not possible; this either requires changing the SHA1 of T (which breaks "which branches is this commit on") or including a non-compiling commit in history (which breaks "git bisect"). – Miral Jan 25 '19 at 06:10
  • Yes, but it's an important nit: using `-m 1`, as with `git cherry-pick` or `git revert` of a merge, misfires a bit in the case where the *same* change appears across *both* parents. I agree that avoiding evil merges has its own problems. I should also have mentioned why rerere does not record the evil-merge changes, though; let me insert that. – torek Jan 25 '19 at 06:12
  • By the way, you can tell `git bisect` to *skip* some commits. Again, this is hardly a good situation, but you could insert, in the commit text for "fix something that merge did not detect" commits, a flag. Then, during bisect, before marking a commit bad, check to see if it has an adjacent marked commit, and instead of `git bisect bad`, use `git bisect skip`. Automating this is clearly both possible and, unfortunately, somewhat painful. – torek Jan 25 '19 at 06:22
  • Yes, I know, and I've done that in the past, but I'm trying to avoid that sort of thing and get it all to Just Work™. It seems tantalisingly close and I don't know why git seems to prefer discarding information rather than making use of it intelligently. Isn't the point of a source control system to avoid loss of work? (Granted thanks to the reflog it's not completely lost, but still.) – Miral Jan 25 '19 at 06:23