You're right that your diagram (from git-scm.com/docs/git-merge) shows a merge that has a common ancestor commit. Note that what is shared here is a commit, not a branch; the term branch is kind of tricky in Git. (See What exactly do we mean by "branch"?)
Anyway, I think it helps if you forget that git pull even exists. All git pull does is run two different Git commands for you. Until you are well-experienced with Git, you're better off using separate git fetch and git merge commands. (Note that git pull --rebase switches the second command to git rebase, but we won't get into details here.) There are several issues with using git pull to run the other two commands. One of them is that git pull uses a weird pull-only syntax, different from all other Git commands, including the git merge that git pull runs. That is, instead of git pull origin xyz, you'll run git merge origin/xyz (after git fetch). To see what that is, you'd run git log origin/xyz, or git show origin/xyz, etc. These are always spelled origin/xyz, with the slash, except when using git pull, so don't use git pull. :-) Let's break it into the two separate commands.
The first command git pull runs is git fetch, which you can run any time you like. git fetch calls up some other Git and asks it what commits it has for you, and under what names (typically branch and tag names). It collects those commits (and their files, of course) and, for each of their branch names, creates or updates your remote-tracking names. So that's where origin/master comes from, for instance: git fetch sees that they have a master, that their master is commit badf00d or whatever, and creates or updates your origin/master to remember: origin's master was badf00d the last time I checked.
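As a rough sketch of the two steps run separately, the script below builds a throwaway pair of repositories in a temporary directory. The directory names theirs and mine, the file name, and the demo identity are all made up for illustration; only the git fetch and git merge origin/master steps correspond to what is described above.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# "theirs" plays the role of the other Git that origin refers to.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
echo one > theirs/file.txt
git -C theirs add file.txt
git -C theirs commit -q -m "first commit"

# Cloning creates the remote named origin and our origin/master.
git clone -q theirs mine
cd mine
before=$(git rev-parse origin/master)

# They add a commit after we cloned.
echo two >> ../theirs/file.txt
git -C ../theirs commit -q -am "second commit"

# Step 1: fetch updates origin/master and leaves our own master alone.
git fetch -q origin
after=$(git rev-parse origin/master)
mine_after_fetch=$(git rev-parse master)

# Look at what arrived, using the origin/master spelling.
git log --oneline master..origin/master

# Step 2: merge into the current branch (master), naming origin/master.
git merge -q origin/master
merged=$(git rev-parse master)
```

Because our master had no commits of its own here, this particular merge is a fast-forward. The point is only that origin/master moves at fetch time, while master moves at merge time.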
The second command that git pull runs for you is where all the interesting action is. This second command should not be run at any old time, on any old branch, because whichever second command you have Git run, you have to be on the right branch when it runs: the one you want to merge into, or the one you want to rebase. I find using the separate commands helps here, because it's clearer that git merge is going to affect the current branch, even though you'll name something like origin/master.
Now that we know --allow-unrelated-histories is really an option to git merge, let's dive into git merge and see what it does. First we'll look at what it does with a common starting point, then again at what it does without one.
Merge is about combining changes since a common starting point
Consider the diagram you quoted above, which I'll redraw just a little bit:
         A--B--C   <-- topic
        /
    D--E--F--G   <-- master (HEAD)
This indicates that whoever has been working on topic started by checking out commit E. Probably, at that time, commit E was the last commit on master:
    D--E   <-- master, topic
Since then, someone added two commits on master, which are F and G, and someone (probably someone else) added three commits on topic, which are now the A-B-C chain (with A's parent commit being E).
Each commit represents a complete snapshot of all source files. So commit E has all the files in it (well, all the files it had when you, or whoever, made commit E), saved in that form, forever. Any changes you, or whoever, made to any files from that saved state and then saved in, say, commit A, cause those files to be in their new state in A. Any unchanged files in A simply match the files in E exactly.
For simplicity, we'll assume there are two people acting here, "you" and "they", and you made the changes on master, eventually resulting in commit G. They made A through C. So you and they both started with whatever is saved forever in commit E. You ended up with what you have saved forever in G. So Git can find out what you changed by a simple git diff, to compare commit E to commit G. Likewise, they ended up at C, so Git can find out what they changed by a similar simple git diff, comparing E vs C:
    git diff --find-renames hash-of-E hash-of-G    # what you changed
    git diff --find-renames hash-of-E hash-of-C    # what they changed
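To try the two diffs on a real history, here is a minimal sketch that rebuilds the D-through-G and A-through-C shape in a temporary repository, asks git merge-base for the common commit, and runs the same two diffs against it. The commit helper function, the file names, and the demo identity are assumptions made up for this example.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper for this demo: commit <label> <file> <content>
commit() {
    echo "$3" > "$2"
    git add "$2"
    git commit -q -m "$1"
}

commit D base.txt "base"
commit E shared.txt "shared"
E=$(git rev-parse HEAD)

# Their work: A, B, C on topic, starting from E.
git checkout -q -b topic
commit A theirs.txt "a"
commit B theirs.txt "b"
commit C theirs.txt "c"

# Your work: F and G on master, also starting from E.
git checkout -q master
commit F mine.txt "f"
commit G mine.txt "g"

# The common starting point, found by Git rather than by eye.
base=$(git merge-base master topic)

# The two diffs from the text, reduced to file names for brevity.
you_changed=$(git diff --find-renames --name-only "$base" master)
they_changed=$(git diff --find-renames --name-only "$base" topic)

echo "merge base:   $(git log -1 --format=%s "$base")"
echo "you changed:  $you_changed"
echo "they changed: $they_changed"
```

Dropping --name-only shows the full patches, which is closer to what the merge actually has to combine.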
Git then checks out the files from commit E, i.e., what you both started with, applies the combined changes (yours and theirs) to those files, and builds a new commit from the result. That determines what files and contents go into commit H:
         A--B--C   <-- topic
        /       \
    D--E--F--G---H   <-- master (HEAD)
New commit H's first parent is G, which was the tip of master, the branch you have checked out. Its second parent is C, the one you told git merge to merge.
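The parent ordering can be checked directly. The sketch below uses a shortened history (one commit standing in for the A-B-C chain and one for F and G), merges topic into master, and reads back the two parents with the HEAD^1 and HEAD^2 spellings. File names and the demo identity are invented for the example.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# E: the shared starting point.
echo shared > shared.txt
git add shared.txt
git commit -q -m E

# C: their work on topic (stands in for the A-B-C chain).
git checkout -q -b topic
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m C
C=$(git rev-parse HEAD)

# G: your work on master (stands in for F and G).
git checkout -q master
echo yours > yours.txt
git add yours.txt
git commit -q -m G
G=$(git rev-parse HEAD)

# Merge topic into the current branch; --no-edit accepts the default message.
git merge -q --no-edit topic

first_parent=$(git rev-parse HEAD^1)    # the commit master pointed to before
second_parent=$(git rev-parse HEAD^2)   # the commit named on the command line
merged_files=$(git ls-files)

git log --graph --oneline --all
```

Since each side only added its own file, the merged snapshot simply contains all three files and no conflict arises.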
Note that when Git does all this change-combining, it has an easy job on all the files that are exactly the same in both branch tips: no matter what was in the merge base, the two tips match, so either copy works fine. It also has an easy job where you changed file X and they didn't, or where they changed file Y and you didn't, because again, it can just take your or their version of those files. It's only where you both touched the same file, in different ways, that Git has to work hard.
Unrelated histories
Unrelated histories occur when there's no common connection between the two sets of commits:
    A--B--C   <-- master (HEAD)
    J--K--L   <-- theirs
Your commits start at C and work backwards, ending at A. No commits come before A: A has no parents.
Their commits start, in this case, at L (I skipped a lot of letters to leave room to insert our merge). L's parent is K, and K's parent is J, but J has no parents either. So there's no common starting point at all.
If you tell Git to merge these, Git just pretends there is one. The pretend starting point has no files. Git runs:
    git diff empty-tree hash-of-C    # what you changed
    git diff empty-tree hash-of-L    # what they changed
Of course, what you changed, from this diff, is that you added every file (that's in your commit C). What they changed is that they added every file (that's in their commit L).
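You can reproduce these diffs by hand. In the sketch below, the empty-tree placeholder is replaced by the hash that git hash-object computes for a tree with no entries, and git merge-base confirms that the two histories share nothing. This only imitates, from the command line, the comparison described above; the branch name theirs, the file names, and the demo identity are invented for the example, and a single root commit stands in for each three-commit chain.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Your history: a single root commit.
echo "our readme" > README
git add README
git commit -q -m "C: our root"

# Their history: --orphan starts a branch whose first commit has no parent.
git checkout -q --orphan theirs
git rm -rfq .
echo "their notes" > NOTES
git add NOTES
git commit -q -m "L: their root"
git checkout -q master

# No common ancestor: merge-base prints nothing and exits non-zero.
if git merge-base master theirs >/dev/null; then related=yes; else related=no; fi

# Hash of the tree that contains no files, computed rather than hard-coded.
empty_tree=$(git hash-object -t tree /dev/null)

# Against the empty tree, every file on each side shows up as added (A).
ours_status=$(git diff --name-status "$empty_tree" master)
theirs_status=$(git diff --name-status "$empty_tree" theirs)

echo "related: $related"
echo "ours:    $ours_status"
echo "theirs:  $theirs_status"
```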
If the files have different names, they are different files and there is no problem: Git takes yours, or theirs. If they have the same names and the exact same contents, there's no conflict here either: Git can just take yours (or theirs). The problems occur for all files where yours and theirs have the same name, but different contents. As far as Git is concerned, you whipped yours up from scratch, and so did they, so everything in those files conflicts. You must pick the winning contents or construct a new file from the "everything conflicts" inputs.
Once you have resolved all of these conflicts, staged the results with git add, and run git merge --continue to make Git finish, Git makes a merge commit as usual:
    A--B--C--D   <-- master (HEAD)
            /
    J--K---L   <-- theirs
The new commit has two parents, C and L, and saves, forever, the snapshot that you built by fixing the conflicts that Git reported, plus whatever files were exactly the same in C and L, or were only in C, or only in L.
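Here is a sketch of the whole sequence on two tiny unrelated histories: the refusal without the option, the conflict on the one file name both sides share, a hand-made resolution, and the resulting two-parent commit. The branch name theirs, the file names, the chosen resolution, and the demo identity are all assumptions for illustration, and a single root commit stands in for each chain. GIT_EDITOR=true is there only so that git merge --continue does not open an editor.

```shell
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Your history: README plus a file only you have.
echo "our readme" > README
echo ours > only-ours.txt
git add .
git commit -q -m "C: our root"
C=$(git rev-parse HEAD)

# Their history: same README name, different contents, plus their own file.
git checkout -q --orphan theirs
git rm -rfq .
echo "their readme" > README
echo theirs > only-theirs.txt
git add .
git commit -q -m "L: their root"
L=$(git rev-parse HEAD)
git checkout -q master

# Without the option, Git refuses to merge unrelated histories.
if git merge -q --no-edit theirs 2>/dev/null; then refused=no; else refused=yes; fi

# With the option, the merge starts but stops on the README conflict.
if git merge -q --no-edit --allow-unrelated-histories theirs >/dev/null 2>&1; then
    conflicted=no
else
    conflicted=yes
fi
unmerged=$(git diff --name-only --diff-filter=U)

# Resolve by constructing new contents, stage them, then let Git finish.
printf 'our readme\ntheir readme\n' > README
git add README
GIT_EDITOR=true git merge --continue >/dev/null

first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
final_files=$(git ls-files)
echo "refused=$refused conflicted=$conflicted unmerged=$unmerged"
```

The files that exist on only one side come through untouched; only README needed a decision.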
("Forever" is a bit too strong: the saved files last only as long as the commit itself. However, the default is for each commit to live forever. If you make the commit go away, the files do too.)