Let's concentrate on the merge result, but start with a quick skim over this part (I've redrawn the graph a bit):
To get back to my previous (linked) question, we had a tree like this:
A--B--C--D--E--F <-- master
\
G--H <-- feature
And we wanted to move B and C to the new feature branch.
The result should have looked like this (the tick marks indicate that the commits you have now are copies, not the originals; their hash IDs have changed, so everyone who got the originals has to scramble to make sure they use the new copies too). I'll just assume that it did in fact look like this:
A--D'-E'-F' <-- master
 \
  B'-C'-G'-H' <-- feature
(Note that the only commit not copied-and-switched-to is A!)
When you now run:
git checkout master
git merge feature
Git will do these things in this order:
- Get the hash ID of the current commit (git rev-parse HEAD).
- Get the hash ID of the tip of feature (git rev-parse feature).
- Locate the (single, in this case) merge base of those two commits. The technical definition of the merge base is the Lowest Common Ancestor in the DAG, but loosely speaking, it's the last commit the two branches share before they diverge, which here is simply commit A.
- Run what amounts to git diff A F': diff the merge base with the tip of master. This is "what we changed on master since the merge base": a big list of files (and their hash ID versions), along with any computed rename information and the like.
- Run what amounts to git diff A H': diff the merge base with the tip of feature. This is "what they changed on feature", in the same way as in step 4. I use the word "we" for step 4, and "they" here in step 5, because we can use git checkout --ours and git checkout --theirs to extract particular files during a merge conflict: --ours refers to files in commit F', i.e., what "we" changed, and --theirs refers to files in commit H'.
- Attempt to combine the differences to get a single changeset.
If Git is able to do all this combining on its own, it declares victory, applies this single changeset to the base commit A, and makes a new commit (let's call this M, for merge) in the usual way (so that master moves to point to M), except that M has two parents:
A--D'-E'-F'-----M <-- master
 \             /
  B'-C'-G'-H' <-- feature
If the automatic merge fails, however, Git throws up its metaphorical hands and leaves you a mess that you must clean up yourself. We'll go into this in a moment.
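The steps above can be walked through by hand in a scratch repository. This is only a sketch: the file names, commit messages, and the save helper are made up for illustration, and the real merge machinery does more than these few commands (rename detection, for instance).

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the text
git config user.name "Example"
git config user.email "example@example.invalid"

# save FILE TEXT: append TEXT to FILE and commit it (hypothetical helper).
save() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

save shared.txt "common start"             # this commit becomes the merge base
git branch feature
save master.txt "master work 1"
save master.txt "master work 2"
git checkout -q feature
save feature.txt "feature work 1"
save feature.txt "feature work 2"
git checkout -q master

ours=$(git rev-parse HEAD)                 # step 1
theirs=$(git rev-parse feature)            # step 2
base=$(git merge-base "$ours" "$theirs")   # step 3
git diff --name-status "$base" "$ours"     # step 4: "what we changed"
git diff --name-status "$base" "$theirs"   # step 5: "what they changed"

git merge -q --no-edit feature             # step 6: combine, then commit M
git log -1 --format='M parents: %P'        # two hash IDs: old master tip, feature tip
```

The two diffs touch different files here, so the merge completes on its own and the new commit lists both tips as parents.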
Three inputs, one output
Note that there are three inputs to this three-way merge:
- the tree for the merge base
- the tree for the current (--ours, HEAD) tip commit
- the tree for the other (--theirs) tip commit
The merge base works here because it is a common starting point (in fact, the best one) from which the two commits have diverged. Git is able to go straight for the two branch tips because each commit is a complete snapshot:¹ it never has to look at any of the intermediate commits, except in terms of the graph so as to find the merge base.
We're also deliberately glossing over a bunch of subtle technical issues, such as pair-breaking and rename-finding (see footnote 1), and things like merge strategies (-s ours means we don't even look at theirs) and strategy options (-X ours or -X theirs). But as long as you are just running git merge feature and there are few or no renames to worry about, that's not a problem.
But, and this is one of the key items, in order to figure out what Git is going to do, you must draw the graph, or otherwise identify the merge base. Once you have the hash ID for the merge base commit, you can (if you want to) git diff the merge base against the two tip commits and see what Git will do. But if the merge base is not the commit you are expecting it to be, the merge will not do what you expect it to do.
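One way to do that check, sketched in a throwaway repository (the file names and commit messages are invented for the example): find the merge base, look at which commit it is, and diff it against a tip. The three-dot form of git diff is a shorthand that looks up the merge base for you.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > file.txt;     git add file.txt;   git commit -q -m "base"
git branch feature
echo ours > ours.txt;     git add ours.txt;   git commit -q -m "change on master"
git checkout -q feature
echo theirs > theirs.txt; git add theirs.txt; git commit -q -m "change on feature"
git checkout -q master

base=$(git merge-base master feature)
git log -1 --oneline "$base"               # is this the commit you expected?
explicit=$(git diff "$base" feature)       # "what they changed", spelled out
shorthand=$(git diff master...feature)     # same comparison via the three-dot form
```

If the commit printed by git log is not the one you had in mind, stop there and study the graph before merging.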
¹ Compare with Mercurial, where each commit is stored, more or less, as a delta or changeset from its parent commit. You might think, then, that Mercurial must start at the merge base and march forward through each commit along each branch chain. But there are two things to note here: first, Mercurial may well have to start before the merge base, because that too could be a changeset from an earlier commit. Second, suppose that along the chain to either tip, some change is made, then backed out. When Mercurial goes to combine the final changesets to implement the same merge as Git, the commit and its backing-out reversion have no effect on the final result. So in that sense, none of the intermediate commits matter after all! We need them only to reconstruct the two final changesets that are to be combined.
In fact, though, Mercurial doesn't do any of this, because each file in Mercurial is occasionally stored anew, fully intact, so that Mercurial won't have to follow extremely long changeset chains to reconstruct a file. Hence what Mercurial does is effectively the same as what Git does: it just extracts the base commit, and then extracts the two tip commits, and does the two diffs.
There's one big technical difference here, which is that Mercurial does not have to guess about renames: the intermediate commits, which (again just like Git) it must traverse to find the merge base, each record any renames with respect to their parent commit, so Mercurial can be certain what the original name of each file was, and what its new name in either tip may be. Git does not record renames: it simply guesses that if path dir/file.txt appears in the merge base, but not in one or both tip commits, perhaps dir/file.txt was renamed in one or both tip commits. If tip commit #1 has other/new.txt that is not in the merge base, that's a candidate file for a rename.
In some cases, Git can't find renames this way. There are additional control knobs. There is one to break pairings if files have changed "too much", i.e., to have Git say that just because dir/file.txt is in both base and tip does not mean it is actually the same file. There is another to set the similarity threshold at which Git declares a file to match, for rename-detection purposes. Last, there is a maximum pairing queue size, configurable as diff.renameLimit and merge.renameLimit. The default merge pairing queue size is larger than the default diff pairing queue size (1000 vs 400 as of Git version 1.7.5; check the documentation for your version, since these defaults have changed over time).
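To see the guessing in action, here is a small sketch in a scratch repository. The paths mirror the ones in the text, the file contents are invented, and the limit value at the end is an arbitrary example rather than a recommendation.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

mkdir dir other
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > dir/file.txt
git add dir/file.txt
git commit -q -m "add dir/file.txt"

git mv dir/file.txt other/new.txt
echo "one extra line" >> other/new.txt     # mostly the same content
git commit -q -am "rename and edit"

# The commits store only snapshots; any rename is inferred at diff time.
with_detection=$(git diff -M --name-status HEAD~1 HEAD)
without_detection=$(git diff --no-renames --name-status HEAD~1 HEAD)
echo "$with_detection"       # one R (rename) line pairing the two paths
echo "$without_detection"    # a D (delete) line and an A (add) line

# The pairing queue sizes mentioned above are ordinary configuration values.
git config diff.renameLimit 2000
git config merge.renameLimit 2000
```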
The mess you get if there are conflicts
When Git declares a "merge conflict" it stops in the middle of step 6. It does not make new merge commit M. Instead, it leaves you a mess, stored in two places:
The work-tree has its best guess at what it could do as an automated merge, plus all the conflicting merges written out with conflict markers. If file.txt has a conflict (a place where Git was unable to merge "what we did" with "what they did"), it might have a few lines that look like this:
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
If you set merge.conflictStyle to diff3 (I recommend this setting; see also "Should diff3 be default conflictstyle on git?"), the above is modified to include what's in the merge base (commit A in our case), i.e., what text was there before both "we" and "they" changed it:
<<<<<<< HEAD
stuff from the HEAD commit
||||||| merged common ancestors
this is what was there before the two
changes in our HEAD commit and our other commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
Meanwhile, the index (the place where you build the next commit to make) has up to three entries for each conflicted file, one per numbered "slot" or stage. In this case, for file.txt, there are three versions of file.txt, which are numbered:
- :1:file.txt: this is a copy of file.txt as it appears in the merge base.
- :2:file.txt: this is a copy of file.txt as it appears in our (HEAD) commit.
- :3:file.txt: this is a copy of file.txt as it appears in their (tip of feature) commit.
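Both places can be inspected directly. The following is a sketch in a scratch repository with a one-line file.txt (the contents and commit messages are invented); it forces a conflict with the diff3 style turned on for that one merge, then looks at the work-tree copy and the three index stages. The label printed after the ||||||| marker may differ between Git versions.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "what was there before" > file.txt
git add file.txt
git commit -q -m "merge base"
git branch feature
echo "stuff from the HEAD commit" > file.txt
git commit -q -am "ours"
git checkout -q feature
echo "stuff from the other commit" > file.txt
git commit -q -am "theirs"
git checkout -q master

# A conflicted merge exits non-zero, so guard it under `set -e`.
git -c merge.conflictStyle=diff3 merge --no-edit feature || true

cat file.txt                      # place 1: work-tree copy with conflict markers
git ls-files --stage file.txt     # place 2: index entries at stages 1, 2, and 3
git show :1:file.txt              # merge base version
git show :2:file.txt              # our (HEAD) version
git show :3:file.txt              # their (feature) version
```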
Now, just because there is a conflict in file.txt does not mean there were not some other changes that Git was able to resolve on its own. Suppose, for instance, that the merge base version reads:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
this is what was there before the two
changes in our HEAD commit and our other commit
and finally,
here is something to change without conflict:
one potato two potato
In HEAD, let's make the file read this way, using however many commits we like to get to this point:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
stuff from the HEAD commit
and finally,
here is something to change without conflict:
one potato two potato
(Note that we made two distinct regions of change. By default git diff will combine them into a single diff hunk, as there are only two unchanged lines between them, but git merge will treat them as separate changes.)
In the other (feature) branch let's make a different set of changes, so that file.txt reads:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
stuff from the other commit (H' in our case)
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips
Again, we have made two changes, but only one conflicts.
The work-tree version of the merged file will take each change that does not conflict, so that the file will read, in full:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips
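You can reproduce this file-level result without building any commits, using git merge-file, which performs the same kind of three-way merge on three plain files. This is a sketch: the file names base.txt, ours.txt, and theirs.txt are made up, and the -L labels are chosen only so that the markers resemble the ones shown above.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

cat > base.txt <<'EOF'
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
this is what was there before the two
changes in our HEAD commit and our other commit
and finally,
here is something to change without conflict:
one potato two potato
EOF

# Our side: change the Name Game line and the conflicting region.
sed -e 's/^la la la, banana fana fo fana$/a bit from the Name Game/' \
    -e 's/^this is what was there before the two$/stuff from the HEAD commit/' \
    -e '/^changes in our HEAD commit and our other commit$/d' \
    base.txt > ours.txt

# Their side: change the conflicting region and the potato line.
sed -e "s/^this is what was there before the two\$/stuff from the other commit (H' in our case)/" \
    -e '/^changes in our HEAD commit and our other commit$/d' \
    -e 's/^one potato two potato$/cut potato and deep fry to make delicious chips/' \
    base.txt > theirs.txt

# merge-file exits with the number of conflicts, so capture it under `set -e`.
status=0
git merge-file -p -L HEAD -L base -L feature \
    ours.txt base.txt theirs.txt > merged.txt || status=$?
cat merged.txt
echo "conflicts: $status"
```

The non-conflicting change from each side lands in merged.txt, and only the region both sides touched is left between markers.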
It's your job, as the one doing the merge, to resolve the conflict.
You may choose to do this:
git checkout --ours file.txt
or:
git checkout --theirs file.txt
but either of these simply copies the "ours" or "theirs" index version (from slot 2 or 3) to the work-tree. Whichever one you choose, you will lose the changes from the other branch.
You may hand-edit the file, removing the conflict markers and keeping or modifying some or all of the remaining lines to resolve the conflict.
Or, of course, you can use any of your favorite merge tools to handle the conflict.
In all cases, though, whatever is in your work-tree will be your final product. You should then run:
git add file.txt
to wipe out the stage 1, 2, and 3 entries and copy the work-tree version of the file to the normal stage-zero file.txt. This tells Git that the merge is now resolved for file.txt.
You must repeat this for all the remaining unmerged files. In some cases (rename/rename conflicts, rename/delete, delete/modify, and so on) there is a bit more work to do, but it all boils down to making sure that the index has only the final stage-zero entries that you want, and no higher-stage entries. (You can use git ls-files --stage to see all the entries in all their stages, although git status does a pretty good job of summarizing the interesting ones. In particular, all files that have stage-zero entries that exactly match the HEAD commit are extremely boring, and git status skips right over them. If there are hundreds or thousands of such files, that's very helpful.)
Once you have resolved all the files in the index, you run git commit. This makes merge commit M. What's in the commit is whatever is in your index, i.e., whatever you git add-ed to remove higher stage index entries and insert stage-zero entries.
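Here is the whole resolve-and-commit sequence as a sketch, again in a scratch repository with invented contents. Keeping both lines as the resolution is an arbitrary choice made for the example; the point is only that whatever you git add is what ends up in M.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "what was there before" > file.txt
git add file.txt
git commit -q -m "merge base"
git branch feature
echo "stuff from the HEAD commit" > file.txt
git commit -q -am "ours"
git checkout -q feature
echo "stuff from the other commit" > file.txt
git commit -q -am "theirs"
git checkout -q master

git merge --no-edit feature || true    # stops with a conflict in file.txt
git status --short file.txt            # "UU": unmerged, changed on both sides

# Resolve by hand: write the final contents we want, markers removed.
printf '%s\n' "stuff from the HEAD commit" "stuff from the other commit" > file.txt
git add file.txt                       # stages 1-3 collapse into stage zero
unmerged=$(git ls-files -u)            # empty once everything is resolved
git commit -q --no-edit                # makes merge commit M from the index
git log -1 --format='M parents: %P'
```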
Using git checkout to check out and resolve at the same time
As noted above, git checkout --ours or git checkout --theirs just gets the copy from index slot 2 or 3 and writes it to the work-tree. This does not resolve the index entries: all the slot 1, 2, and 3 unmerged entries are still there. You must git add the work-tree file back to mark it resolved. As we also noted, this loses any changes from the other tip commit.
If that's what you want, though, there is a short-cut. You can:
git checkout HEAD file.txt
or:
git checkout MERGE_HEAD file.txt
This extracts the version of file.txt from the HEAD (F') or MERGE_HEAD (H') commit. In so doing, it writes the contents to stage zero for file.txt, which wipes out stages 1, 2, and 3. In effect, it gets the --ours or --theirs version and git adds the result, all at once.
Again, this loses any changes from the other tip commit.
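The difference between the two forms shows up in the index. In this sketch (scratch repository, invented contents), the --theirs form leaves the three unmerged stages in place, while naming MERGE_HEAD clears them in the same step.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "what was there before" > file.txt
git add file.txt
git commit -q -m "merge base"
git branch feature
echo "stuff from the HEAD commit" > file.txt
git commit -q -am "ours"
git checkout -q feature
echo "stuff from the other commit" > file.txt
git commit -q -am "theirs"
git checkout -q master

git merge --no-edit feature || true         # stops with a conflict

git checkout --theirs -- file.txt           # work-tree only: copy of stage 3
after_theirs=$(git ls-files -u | awk '{printf "%s", $3}')   # stages still listed

git checkout MERGE_HEAD -- file.txt         # work-tree and stage zero together
after_merge_head=$(git ls-files -u)         # nothing left unmerged
cat file.txt                                # their version; ours is gone
```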
It's easy to get this wrong
It's very easy to get these resolving steps wrong. In particular, git checkout --ours and git checkout --theirs, and their short-cut versions using HEAD and MERGE_HEAD, drop the other side's changes to a file. The only indication that you will have of this is that the merge result is missing some changes. As far as Git is concerned, that's the correct result: you wanted those changes dropped; that's why you set the stage-zero index entry that way before you made the merge commit.
It's also easy to get a surprise merge base, particularly if you try to do a lot of git rebase or git cherry-pick work to copy commits around and move branch names to point to the new copies. It's always worth carefully studying the commit DAG. Get help from "A DOG": git log --all --decorate --oneline --graph (all, decorate, oneline, graph); or use gitk or some other graphical viewer, to visualize the commit graph. (Instead of --all you might also consider using the two branch names in question, i.e., DOG rather than just any old A DOG: git log --decorate --oneline --graph master feature. The resulting graph is likely to be simpler and easier to read. However, if you did a lot of rebasing and cherry-picking, --all may reveal more. You can even combine this with specific reflog names such as feature@{5}, though this gets a bit long-winded and makes for quite messy graphs.)