
To merge feature branches into master, we git rebase the feature branch onto master and then git merge --no-ff*. We now want to record the changes made while rebasing (that is, while applying each commit to the new HEAD) in a separate commit**.

Normally, git rebase rewrites the commits from the feature branch so that they apply cleanly to the new HEAD, combining the original changes of each commit with the conflict resolution.

Is it possible to both have a clean linear history, as results from git rebase followed by git merge, and a record (in a commit) of changes introduced by the rebase action?

Some of the rebase/merge problems might be caused by people losing track of which changes to merge, so "How to resolve merge conflicts in Git" might help improve the results of people rebasing/merging; it does not help us record and review the changes made for the merge.

I think I grasp the concepts of merging and rebasing in Git; https://www.atlassian.com/git/tutorials/merging-vs-rebasing and the Atlassian article it refers to (https://www.atlassian.com/git/tutorials/merging-vs-rebasing) do not help.


*: Actually, we often use git merge --squash, but I prefer answers that use --no-ff instead. --squash would probably squash the carefully separated feature and merge changes together, and thus not be that useful in this context.

**: Context: my colleague has a tendency to create big squashed commits that include some of their own unreviewed code and omit some code of other colleagues (code that we do want merged), thereby breaking builds, causing bugs, and removing functionality, all under the name of 'just' rebasing branches. Therefore we are looking for ways to keep track of what havoc they wreak. I assume the general advice would be smaller/atomic commits, shorter-lived branches, better automated testing/continuous integration, and getting rid of said colleague. However, none of these is easy to attain in the short term; we are striving to improve on the first three points.

Kasper van den Berg
    The solution to your problem is not technical, it is either: giving your colleague the feedback and support they need to stop causing these problems; or their no longer being your colleague. – jonrsharpe Oct 30 '19 at 22:23

2 Answers


Is it possible to both have a clean linear history, as results from git rebase followed by git merge, and a record (in a commit) of changes introduced by the rebase action?

Not really, no.

What Git has are commits. This translates to—mostly anyway—the fact that all Git has are commits. I'll address the word mostly in a moment.

Each commit has its own unique hash ID. That one hash ID acts as the true name of the commit; all other techniques for finding the commit tend to be means of finding the one true hash ID, then using that to find the commit.

A branch name like master or dev, or a remote-tracking name like origin/master or origin/dev, holds the hash ID of one commit.

Each commit holds the hash ID of some number of immediate predecessor commits: the parents of that commit. Most commits hold one parent hash ID. Merge commits hold two or more, which is what makes them merge commits. At least one commit—the very first one someone ever made in that repository—necessarily has no parents. This commit is a (usually "the") root commit. Git uses the parent hash of the tip commit of a branch to find the previous commit in the branch, then continues on back, one commit at a time, to reach the root: the commits traveled, in this process, are the commits that are on the branch.

Multiple branch names can point to multiple tip commits that all eventually reach the same root. The collection of starting points, plus all the internal connections via parent hashes, make up the commit graph.

Each commit stores a full and complete snapshot of all of the files that make up that commit. (The internals here are not really relevant to your issue; the fact that the commit stores, indirectly, just the one snapshot, is.) It also stores the metadata for that commit: who made it (author and committer, each made up of name + email + date-and-time), the parent hash ID(s), and the log message. And that's it! That's all there is. Git finds changes by comparing adjacent commits. Whatever is different in the two snapshots, that must be what changed.
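As a rough illustration of the points above, the following sketch builds a throwaway repository and inspects what a commit actually stores. The repository location, file name, identity, and commit messages are all made up for the example.

```shell
set -eu
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "root commit"
echo two >> file.txt
git commit -q -am "second commit"

# Raw commit object: a tree (the snapshot), the parent hash(es),
# author, committer, and the log message. Nothing else.
git cat-file -p HEAD

# The root commit has no "parent" line at all.
root=$(git rev-list --max-parents=0 HEAD)
root_parents=$(git cat-file -p "$root" | grep -c '^parent ' || true)

# "Changes" are derived on demand by comparing two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "root parents: $root_parents, changed between snapshots: $changed"
```

Note that no diff is stored anywhere in the output of git cat-file; the diff only appears when two commits are compared.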

When you use git rebase, you are really telling Git: copy some commits. The original chain of commits might go like this:

          I--J--K   <-- feature
         /
...--G--H--L   <-- master

Branch feature consists of commit K (the tip commit, found by the name feature, which stores the actual hash ID we're calling K here) preceded by J preceded by I preceded by H, and so on. Branch master consists of commit L, then H, then G and so on, in the same manner.

This kind of structure would require a true merge to re-combine feature into master, so your group is using rebase. Rebase means: Find the commits unique to feature: the ones that cannot be found by starting at the tip of master and working backwards. Then copy most or all of them, with the new copies being produced by turning each commit into changes, and applying those changes to the new starting point. In this case, Git must copy commits I, J, and K so that the new copies come after L.

Each copy step is essentially the same as git cherry-pick, and some git rebase commands really, literally use git cherry-pick to do the copying. You can think of each copy as a cherry-pick. We'll denote the copied commits with a prime-mark, I' for instance:

          I--J--K   <-- feature
         /
...--G--H--L   <-- master
            \
             I'-J'-K'  <-- [new feature, being built]

Now that the rebase is complete, Git simply yanks the name feature off the old tip and makes it point to the new tip:

          I--J--K   [abandoned]
         /
...--G--H--L   <-- master
            \
             I'-J'-K'  <-- feature

When you look at this repository later, not knowing what the original three hash IDs were, you can't find the original three. You might not even know that the original three commits ever existed. If whoever did the copying didn't hand over the originals, and only ever hands over the copies, they'll be all you ever see. And, with the originals no longer find-able, Git will eventually remove them entirely ("garbage-collect" them with git gc). They might as well never have existed. Only if you save the hash IDs somewhere, so that Git itself doesn't GC the originals, can you have and see the originals. And, since Git only stores snapshots, you need the originals to see what happened between the originals and their copies.
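One way to "save the hash IDs somewhere" is to leave an extra branch name on the old tip before rebasing. The sketch below rebuilds the I-J-K graph in a scratch repository, does exactly that, and then uses git range-diff to pair each original with its copy. The branch name feature-before-rebase, the file names, and the commit helper are assumptions for the example, not part of any established workflow.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt h "H"
git checkout -q -b feature
commit feature.txt i "I"
commit feature.txt j "J"
commit feature.txt k "K"
git checkout -q master
commit master.txt l "L"

# Keep a name on the original tip so I, J and K stay reachable
# (and are not garbage-collected) after the rebase.
git branch feature-before-rebase feature

git checkout -q feature
git rebase -q master

old_tip=$(git rev-parse feature-before-rebase)
new_tip=$(git rev-parse feature)
copies=$(git rev-list --count master..feature)

# Pair each original commit with its copy and show how the patches differ.
git range-diff master feature-before-rebase feature
```

In this conflict-free example range-diff should report each pair as equivalent; a mis-resolved conflict would show up as a difference between an original and its copy. This only works for as long as someone keeps (and shares) the extra name.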

Addressing the "mostly" above, and other waffle-y bits

Git does have:

  • Tags (annotated tags): these store data. The tag data you can store here are arbitrary and are up to you. If you could get your users to put the important data into an annotated tag, and attach and push that tag, that would save it. But that requires that they put the important data somewhere.

  • Notes: git notes let users write arbitrary extra data and tie that data loosely to a commit (by the commit's hash ID). If you could get your users to put the important data into notes, and attach those notes to the copied commits, that would save it.

  • Commit messages. We already mentioned these, but they allow users to store arbitrary data. If you could get your users to put the important data into the commit messages of the rebased commits, that would save it.

But all of these techniques require that your users generate the important data, e.g., by diffing the original commits against the copied ones made by rebasing. You could give them a tool to do that, but if they're sloppy enough to introduce bugs in rebasing, they might be sloppy enough not to use the tool—and writing a good tool that would do this in some usable way is pretty hard.
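For what it is worth, here is a minimal sketch of the tag and notes variants, assuming the person rebasing is willing to run two extra commands. The tag name pre-rebase/feature and the note text are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt h "H"
git checkout -q -b feature
commit feature.txt i "I"
git checkout -q master
commit master.txt l "L"

git checkout -q feature
old_tip=$(git rev-parse HEAD)

# An annotated tag keeps the original commit reachable and carries free-form text.
git tag -a -m "feature before rebasing onto master" pre-rebase/feature "$old_tip"

git rebase -q master

# A note attaches extra text to the copied commit without changing its hash.
git notes add -m "rebased from $old_tip" HEAD

tagged=$(git rev-parse 'pre-rebase/feature^{commit}')
note=$(git notes show HEAD)
echo "tag points at $tagged; note says: $note"
```

Neither the tag nor the notes ref is sent by a plain git push of the branch; both have to be pushed explicitly, which is one more step a sloppy user can skip.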

I mentioned above that the rebase copies "most or all" commits. Rebase omits:

  • all merge commits (by default), and
  • any commit that Git thinks is already present upstream. [1]

At least, it does this by default: -p and the new --rebase-merges options allow one to "copy" merges (by re-performing the merge, really).


[1] Git does this by comparing git patch-id results in the symmetric difference of the branch-being-rebased and the upstream-to-which-it-is-to-be-rebased. Technically, this means that Git uses the equivalent of git rev-list --left-right HEAD...upstream (note the three-dot syntax) to generate the lists of commits. Then it runs the equivalent of git patch-id on all of them. It then compares the left and right side commits' patch IDs. Commits in the HEAD side, which is the left side here, that match a commit in the right / upstream side, get knocked out. The idea here is to eliminate cherry-pick of a commit that was already cherry-picked into the upstream.
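The patch-ID comparison can be observed from the command line. In this sketch (scratch repository, made-up file names and helper), master cherry-picks commit I from feature, so a later rebase of feature should copy only J.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt h "H"
git checkout -q -b feature
commit feature.txt i "I"
commit other.txt j "J"
git checkout -q master
commit master.txt l "L"
git cherry-pick feature~1 >/dev/null   # master now has its own copy of I

# Same patch, different commit hashes: the patch IDs match.
id_feature=$(git show feature~1 | git patch-id | cut -d' ' -f1)
id_master=$(git show master | git patch-id | cut -d' ' -f1)

# '=' marks commits with an equivalent on the other side,
# '<' is unique to feature (left), '>' is unique to master (right).
git rev-list --left-right --cherry-mark feature...master
kept=$(git rev-list --left-right --cherry-mark feature...master | grep -c '^<')

git checkout -q feature
git rebase -q master
copied=$(git rev-list --count master..feature)
echo "commits unique to feature: $kept, commits copied by rebase: $copied"
```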


Conclusion

What this really all boils down to is:

  • Don't let users who regularly mis-resolve conflicts rebase branches.

Some accidents will occur no matter what you do: everyone makes "misteaks" after all, and those new to conflict resolution will no doubt have more trouble with it than old hands. So don't be too rigid with this rule ... but in the end, if you cannot trust someone not to wreck original commits when using rebase to make them into new-and-improved replacements that will force everyone to forget the originals, don't have them do that at all.

Git is quite capable of dealing with a tangled history:

          I--J--K
         /       \
...--G--H--L------M--Q--R   <-- master
            \       /
             N--O--P

may be more messy for humans to deal with, but Git will deal with it.
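A small sketch of the true-merge alternative, again in a scratch repository with invented names: the original commit keeps its hash, and the merge commit itself holds whatever was done to combine the two sides.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt h "H"
git checkout -q -b feature
commit feature.txt i "I"
git checkout -q master
commit master.txt l "L"

feature_tip=$(git rev-parse feature)

# A real merge: no commits are copied, so no originals are lost.
git merge -q --no-ff -m "Merge branch 'feature'" feature

# The merge commit lists two parents: the old master tip and the feature tip.
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
second_parent=$(git rev-parse HEAD^2)

# What the merge brought into master, relative to its first parent.
brought_in=$(git diff --name-only HEAD^1 HEAD)
echo "hashes on the merge line: $words, brought in: $brought_in"
```

With this shape, any conflict resolution lives in the merge commit and can be reviewed there, at the cost of a non-linear history.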

torek

Assume we have:

... A - B - C - E - G  (master)
          \
            D - F - H  (feature)

where F requires manual resolution of merge conflicts.

Normally git checkout feature && git rebase HEAD~3 --onto master would result in:

... A - B - C - E - G  (master)
                      \
                        D' - F" - H'  (feature)

where D' and H' are equivalent to D and H with updated parents, and F" is either a reasonable merge or a havoc-wreaking commit.

Suppose that we can split F" into F' (equivalent to F with an updated parent) and a merge commit M, which resolves the conflicts caused by applying F onto master; this would result in:

... A - B - C - E - G  (master)
                      \
                        D' - - - M - H'  (feature)
                           \   /
                             F'

Now everyone can see what happened in M. A big disadvantage is that F' doesn't compile.

Which git commands should be issued when git rebase pauses after D' so that the developer can resolve the merge conflict? (Currently under construction: I hope to provide the essential details once I can experiment with git.)

  • (possibly) git reset --hard
    to clear the working directory
    (test whether this wrecks the rebase in progress)
  • git checkout --theirs -- . && git add -u && git commit
    or git cherry-pick F
    to record F'
    (git checkout --theirs needs a pathspec, and the paths must be staged before committing)
  • git reset --hard HEAD~1
    to return to D'
  • git merge F' (by its hash), resolving the merge conflicts, to record M
  • somehow put the working directory in a state from which git rebase can continue
    I believe git rebase expects some staged files, from which it creates the rebased commit, not a clean staging area. Perhaps git rebase --skip works.
    (investigate)
Kasper van den Berg