Your question reveals a lot of confusion about what a Git branch is and how merges work in Git. (This is not all that surprising, since a lot of introductions to Git are ... not good, and Git has a lot of underlying complexity that really cannot be ignored.)
I recommend that anyone new to Git avoid git pull. Instead, run git fetch, and then, if you want merges, run git merge. If you prefer a rebase-oriented workflow, run git fetch, then run git rebase. This will give you a better sense of what's going on underneath. It will also avoid a lot of mistakes that people almost always make as they first start using Git. The opposite of push is fetch, not pull.
You mention that:
"Sometimes ... I see master|MERGING in my bash prompt prefix"
You can also run git status to see whether you are in the middle of a conflicted merge.
In fact, what you have in your bash prompt is a set of clever interactions between your bash shell and your Git commands, where your bash checks with Git each time it's about to print a prompt. If Git reports that you are in a repository at all, the shell includes the repository's current branch name in the prompt. If you are in the middle of one of these conflicted merges, you get |MERGING added as well.
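To make the prompt mechanism concrete, here is a minimal Python sketch of the kind of check a prompt script might make. It assumes the repository's Git directory is a plain directory (usually .git) and relies on the fact that Git keeps a MERGE_HEAD file there while a merge is in progress. The names prompt_suffix and make_prompt are hypothetical, and real prompt scripts (such as Git's own git-prompt.sh) check many more states, such as rebases and cherry-picks.

```python
import os


def prompt_suffix(git_dir: str) -> str:
    """Return "|MERGING" if a merge is in progress in git_dir, else "".

    Assumption: git_dir is the repository's Git directory (usually .git).
    Git keeps a MERGE_HEAD file there until the merge is committed or aborted.
    """
    if os.path.isfile(os.path.join(git_dir, "MERGE_HEAD")):
        return "|MERGING"
    return ""


def make_prompt(branch: str, git_dir: str) -> str:
    # Hypothetical helper: produces "master" or "master|MERGING".
    return branch + prompt_suffix(git_dir)
```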
But this leads back to the question: What exactly do we mean by "branch"? (That is a separate question in its own right, with a more detailed answer.) The word branch can refer to either a branch name like master, or a subgraph within the commit DAG (what I like to call a DAGlet; see that other question).
The git merge command does not create a new name, so in this sense, it never creates a branch.
On the other hand, the git merge command can tie together two subgraphs within the DAG, creating a new DAGlet. In this sense, it does create a new branch. Well, it does so sometimes, and it would be more accurate to say that it ties together some existing DAGlets. This is where that unavoidable complexity comes in.
Let's take a moment to examine commits
Probably the most important thing in Git is the commit. In Git, the true name of any commit is its hash ID: some big ugly string of hexadecimal digits, apparently random, basically useless to humans, but unique to that particular commit, so that it identifies that commit. That's how Git finds it: you tell it to look for some big ugly hash, a1f9c32... or whatever, and your Git looks up the commit.
Each commit stores the ID of its previous or parent commit. What this means is that if you tell Git about the latest commit, Git can use that commit to look back to the second-latest commit. That second-latest commit has inside it the ID of the third-latest commit, and so on. If the commit IDs were just easy uppercase letters, we could draw them like this:
... <-F <-G <-H
where H is the latest, and it points to (contains the ID of) G, which points to F, and so on, backwards through history.
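The backward-pointing chain can be modeled in a few lines of Python. This is a toy, not Git's real object format: the Commit class, make_id, and history are hypothetical names, and the single message string stands in for the snapshot and metadata a real commit carries. The one property it does share with Git is that a commit's ID is computed from its contents, including its parent's ID.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # hash ID of the previous commit; None for the first one
    message: str           # stand-in for the snapshot and metadata


def make_id(commit: Commit) -> str:
    # Toy hash ID: real Git hashes a specific object format, but here too
    # the ID depends on the content, including the parent's ID.
    data = f"parent {commit.parent}\n\n{commit.message}".encode()
    return hashlib.sha1(data).hexdigest()


def history(store: Dict[str, Commit], tip: str) -> List[str]:
    """Start at the latest commit and follow parent IDs backwards."""
    ids: List[str] = []
    current: Optional[str] = tip
    while current is not None:
        ids.append(current)
        current = store[current].parent
    return ids


# Build ...--F--G--H and then walk it from H.
store: Dict[str, Commit] = {}
latest: Optional[str] = None
for msg in ["F", "G", "H"]:
    c = Commit(latest, msg)
    latest = make_id(c)
    store[latest] = c

print([store[i].message for i in history(store, latest)])  # ['H', 'G', 'F']
```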
These commits are the history; all Git needs to know is which one is the latest. That's where a branch name like master comes in: instead of making you memorize the crazy hash IDs, Git stores the latest one under a name like master.
This means that when you add a new commit to your master, what you are doing is having Git save, in a new commit, the ID of the old tip of the chain, and then having your Git rewrite your master to hold the ID of the new tip:
...--F--G--H--I <-- master
Now I points back to H, which (still) points back to G, and so on.
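Adding a commit is just those two steps: record the old tip's ID in the new commit, then move the name. Here is a minimal sketch, assuming a toy in-memory Repo class (hypothetical) whose commit IDs are simple counters rather than real hashes:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # ID of the old tip, or None for the very first commit
    message: str


class Repo:
    """Toy repository: commits by ID, plus branch names that hold tip IDs."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.branches: Dict[str, str] = {}
        self._counter = 0

    def commit(self, branch: str, message: str) -> str:
        old_tip = self.branches.get(branch)               # None if the branch is new
        new_id = f"c{self._counter}"                      # stand-in for a hash ID
        self._counter += 1
        self.commits[new_id] = Commit(old_tip, message)   # step 1: save old tip's ID
        self.branches[branch] = new_id                    # step 2: move the name
        return new_id


repo = Repo()
for msg in ["F", "G", "H", "I"]:
    repo.commit("master", msg)

tip = repo.branches["master"]
print(repo.commits[tip].message)                       # I
print(repo.commits[repo.commits[tip].parent].message)  # H
```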
There is more than one Git repository involved
You start with this:
"Let's suppose I have 2 users working on the same branch (master)."
Regardless of whether this is good practice or not, there's a sense here that the name master means only one thing. But that's just not true, because you have a Git repository, the second user has a Git repository, and the place you're pushing to (with git push) has a Git repository. Everyone gets a repository! And every one of you has your own master.
What you all share, however, is some set of commits. You all started out by cloning some repository, which had some set of commits, and you got them all. You may have added some more since then. So, for each commit hash ID, your Git repository either has that commit, indicated by its ID, or doesn't. If your Git repository doesn't have that commit, that's where git fetch and git push come in.
What git fetch and git push do is connect two Git repositories. At this point, whoever is doing the sending (your Git if you are the one running git push, or the other Git if you are running git fetch) packages up any commits it has that the receiver doesn't. The sender delivers that pack of commits (and the files that go with them) to the receiver.
The receiver now has a bit of a problem, because commits that are only identified by hash IDs are pretty useless to humans. The receiver needs to give the last commit a name.
When you run git fetch origin,¹ you obtain any new commits from their master, and the name your Git uses to remember their master is origin/master.
¹Here, origin is the name of a remote. Most repositories have exactly one remote, named origin: when you run git clone <url> to clone a repository, the clone process sets up this remote, whose name defaults to origin, to remember the URL.
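Here is a minimal sketch of what fetch does, under simplifying assumptions: each repository is modeled as a plain dict from commit ID to parent ID (single-parent commits only, with letters standing in for hash IDs), plus a dict of names. The function names are hypothetical, and the real transfer protocol is far more elaborate; the sketch only shows the two effects described above: copy the commits you lack, then remember their branch tips under names like origin/master. Note that your own master is left alone.

```python
from typing import Dict, Optional, Set

# Toy model: commit ID -> parent ID (None for a root commit).
Commits = Dict[str, Optional[str]]


def reachable(commits: Commits, tip: Optional[str]) -> Set[str]:
    """All commits found by walking backwards from tip."""
    seen: Set[str] = set()
    while tip is not None and tip not in seen:
        seen.add(tip)
        tip = commits[tip]
    return seen


def fetch(ours: Commits, our_names: Dict[str, str],
          theirs: Commits, their_names: Dict[str, str],
          remote: str = "origin") -> Set[str]:
    # 1. The sender packages up commits it has that we don't.
    missing: Set[str] = set()
    for tip in their_names.values():
        missing |= reachable(theirs, tip) - set(ours)
    for cid in missing:
        ours[cid] = theirs[cid]
    # 2. We remember their branch tips under remote-tracking names.
    for branch, tip in their_names.items():
        our_names[f"{remote}/{branch}"] = tip
    return missing


ours = {"F": None}
names = {"master": "F"}
theirs = {"F": None, "G": "F", "H": "G"}
print(sorted(fetch(ours, names, theirs, {"master": "H"})))  # ['G', 'H']
print(names)  # {'master': 'F', 'origin/master': 'H'}
```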
Fetching, then merging
Let's suppose that you both started with a chain of commits ending at H, and that you've added I--J and they (whoever "they" is here) added K--L:
...--F--G--H--I--J   <-- master
            \
             K--L   <-- origin/master
It's now your job to combine your work (whatever you did in commits I-J) with their work (whatever they did in K-L).
The simplest method of combining in Git is git merge. This particular kind of merge, which I like to call a true merge, works by quite literally combining your work and their work. To do this, it has to start from the point where the two of you branched apart. Note that this branching-apart has nothing to do with the name master itself. It happened because you made commits, and they made commits.
The merge operation has two parts. The first is what I like to call merge as a verb, or to merge: to combine work. It's clear from looking at the drawing above that the last commit you both had in common was commit H. This is what Git calls the merge base.
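Finding the merge base is a graph search. The following is a simplified sketch, assuming the graph is a dict from commit ID to a list of parent IDs (letters stand in for hash IDs; the function names are hypothetical). It does a breadth-first walk back from one tip and returns the first commit that is also an ancestor of the other tip. That is enough for a simple fork like the one drawn above, but in tangled histories it can pick a worse answer than Git would; git merge-base does this job properly.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Parents = Dict[str, List[str]]  # commit ID -> parent IDs


def ancestors(parents: Parents, tip: str) -> Set[str]:
    """Every commit reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_base(parents: Parents, ours: str, theirs: str) -> Optional[str]:
    """First commit reached from theirs that is also reachable from ours."""
    from_ours = ancestors(parents, ours)
    queue = deque([theirs])
    seen: Set[str] = set()
    while queue:
        c = queue.popleft()
        if c in from_ours:
            return c
        if c not in seen:
            seen.add(c)
            queue.extend(parents[c])
    return None


graph: Parents = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],   # your commits
    "K": ["H"], "L": ["K"],   # their commits
}
print(merge_base(graph, "J", "L"))  # H
```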
Git now runs git diff twice.² You can do it yourself:
git diff --find-renames <hash-of-H> <hash-of-J> > /tmp/what-we-did
git diff --find-renames <hash-of-H> <hash-of-L> > /tmp/what-they-did
You can now compare what you did to what they did. Git does this same thing in order to combine your changes with their changes.
If the things you changed are "far enough away from" the things they changed, or are in different files, Git will combine them successfully. More precisely, Git will think it combined them successfully. (It's up to you, the human who is smarter than Git, to decide on the real success here.) But if you changed the same source lines that they changed, Git won't be able to combine two different changes. Git will throw its metaphoric hands into the air, declare a merge conflict, and stop and make you clean up the mess.
That is when you see the MERGING status: Git has stopped in the middle of a conflicted merge. The commit graph still looks like the picture above, with two "branches" (in the DAGlet sense) forking off from a common commit; they have not yet come together. It's now your job to edit the mess into something sensible, run git add on the result, and use git commit (or, in new enough Git versions, git merge --continue, which just runs git commit) to finish the merge.
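The combining rule described above can be illustrated with a deliberately tiny three-way merge in Python. It assumes every change replaced a line in place (no lines added or removed), so line i of each version lines up with line i of the merge base; real Git works from diffs and handles insertions, deletions, and changes that abut each other. The function name and the conflict-marker labels are illustrative; Git's own markers look similar but are labeled with branch or commit names.

```python
from typing import List, Tuple


def merge_lines(base: List[str], ours: List[str],
                theirs: List[str]) -> Tuple[List[str], bool]:
    """Toy three-way merge of in-place line changes.

    Returns the merged lines and whether any conflict occurred.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line changes")
    merged: List[str] = []
    conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # same result on both sides (or nobody touched it)
            merged.append(o)
        elif o == b:      # only they changed this line: take theirs
            merged.append(t)
        elif t == b:      # only we changed this line: take ours
            merged.append(o)
        else:             # both changed the same line differently: conflict
            conflict = True
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflict


base = ["a", "b", "c"]
print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))
# (['A', 'b', 'C'], False)
print(merge_lines(base, ["x", "b", "c"], ["y", "b", "c"])[1])
# True
```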
If Git thinks it can do the merge all on its own, though, git merge will go ahead and run git commit on its own too. Git won't stop in the middle with a conflict; it will go straight on to the git commit part.
This commit-that-concludes-a-merge, whether Git does it all by itself or stops and makes you clean up and do it, will tie together the two graph DAGlets:
...--F--G--H--I--J--M   <-- master
            \      /
             K----L   <-- origin/master
The new merge commit M has two parents instead of the usual one. This is the second part of what git merge does: it creates a merge commit, which uses the word merge as an adjective. The merge commit ties together these two DAGlets. Merging did not create the graph fragments; merging simply tied them together.
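In graph terms, the merge commit is simply a commit with two parent IDs. Here is a minimal sketch, assuming a toy model where the graph is a dict from commit ID to a list of parent IDs (letters for hash IDs, history before H omitted, and make_merge_commit and reachable as hypothetical helpers). As in Git, the first parent is the commit the branch name pointed to before the merge.

```python
from typing import Dict, List, Set

Parents = Dict[str, List[str]]  # commit ID -> parent IDs


def make_merge_commit(parents: Parents, branches: Dict[str, str],
                      branch: str, other_tip: str, merge_id: str) -> None:
    """Record a two-parent commit and move the branch name to it."""
    parents[merge_id] = [branches[branch], other_tip]  # first parent: old tip
    branches[branch] = merge_id


def reachable(parents: Parents, tip: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


# H is treated as a root here; earlier history is left out.
graph: Parents = {"H": [], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
names = {"master": "J", "origin/master": "L"}
make_merge_commit(graph, names, "master", names["origin/master"], "M")
print(graph["M"])                                 # ['J', 'L']
print(sorted(reachable(graph, names["master"])))  # ['H', 'I', 'J', 'K', 'L', 'M']
```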
Making this merge commit concludes the process of merging, so you are now back to the normal, non-merging state. However, you now have a new commit that, obviously, no one else has: merge commit M, which you just made and which therefore has a new and unique hash ID that no one else could possibly have yet.
(You can now use git push to share any commits that you have that they don't.)
²Internally, Git uses a whole bunch of short-cuts to avoid work where possible, but in the end, the combining part does require computing the diffs.
I'm also leaving out a lot of detail here about how the "to merge" process works. This matters mainly when you have to clean up the mess of a failed merge: the merge takes place in your index (also called your staging area) as well as in your work-tree (where you do your work). The separations between commits, the index, and the work-tree matter more as you start to do more advanced things in Git.
When a merge isn't a merge
Not all git merge operations merge! Suppose you haven't done any work since you ran git clone, so that you have, say:
...--F <-- master, origin/master
Now you run git fetch and pick up new commits G and H:
...--F   <-- master
      \
       G--H   <-- origin/master
Note that G points back to F, which is where you are now. If you now run git merge origin/master (or just git merge) to bring yourself forward, Git notices that there is no actual divergence here. Instead of combining your lack-of-work with their work, Git can simply fast-forward the name master so that it points to commit H, and check out commit H, giving you:
...--F--G--H <-- master, origin/master
When git merge does this, it says "fast-forward": there's no diffing, no combining of work, and no new merge commit. This is very easy for Git compared to a true merge: in the end, it's essentially the same amount of work as git checkout.
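The decision git merge makes here can be sketched as an ancestry test. This is a simplified model, not Git's code: the graph is a dict from commit ID to parent IDs, and the function names and returned strings are hypothetical. (On the command line, git merge-base --is-ancestor answers the same ancestry question.)

```python
from typing import Dict, List

Parents = Dict[str, List[str]]  # commit ID -> parent IDs


def is_ancestor(parents: Parents, maybe_ancestor: str, tip: str) -> bool:
    """True if maybe_ancestor is tip or can be reached by walking back from it."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c == maybe_ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def merge_or_fast_forward(parents: Parents, branches: Dict[str, str],
                          branch: str, other: str) -> str:
    tip = branches[branch]
    if is_ancestor(parents, other, tip):
        return "already up to date"   # nothing new to bring in
    if is_ancestor(parents, tip, other):
        branches[branch] = other      # just slide the name forward
        return "fast-forward"
    return "true merge needed"        # histories diverged


graph: Parents = {"F": [], "G": ["F"], "H": ["G"]}
names = {"master": "F", "origin/master": "H"}
print(merge_or_fast_forward(graph, names, "master", names["origin/master"]))  # fast-forward
print(names["master"])                                                         # H
```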
You can rebase instead of merging
I won't go into great detail here, but instead of merging your work with someone else's work, you can rebase your work on someone else's work. Suppose we start with the same work-that-needs-combining diagram:
...--F--G--H--I--J   <-- master
            \
             K--L   <-- origin/master
Instead of merging, we can have Git copy commits I and J to new, somewhat different commits, which we can call I' and J' to remind us that they're a lot like I and J, but not the same. (They'll have new, unique hash IDs, different from those of your original I and J.) We simply arrange them like this:
...--F--G--H--I--J   <-- master
            \
             K--L   <-- origin/master
                 \
                  I'-J'   <-- ???
Now that we have these copies made, we have our Git "peel the label" master off J and make it point to J' instead:
...--F--G--H--I--J   <-- ???
            \
             K--L   <-- origin/master
                 \
                  I'-J'   <-- master
If you now choose to give up the originals, since you don't need them any more, we can remove the question-mark labels entirely and straighten out one of the kinks in the graph:
...--F--G--H--K--L   <-- origin/master
                  \
                   I'-J'   <-- master
Now it seems as though you wrote your two commits after picking up commit L, instead of writing them based on commit H. The source code carried with the two new commits I' and J' is based on what's in L, rather than what's in H.
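Here is a minimal sketch of the copying, under loudly simplified assumptions: commits are (parent ID, message) pairs in a dict, letters stand in for the original hash IDs, and new IDs come from a toy hash of parent plus message. A real rebase re-applies each commit's changes to the new base (which is where conflicts can appear); here the message simply stands in for those changes. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Toy model: commit ID -> (parent ID or None, message).
Store = Dict[str, Tuple[Optional[str], str]]


def toy_id(parent: Optional[str], message: str) -> str:
    # Stand-in for Git's content hash: a new parent means a new ID.
    return hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]


def chain(store: Store, tip: Optional[str]) -> List[str]:
    """Commit IDs from tip back to the root, newest first."""
    out: List[str] = []
    while tip is not None:
        out.append(tip)
        tip = store[tip][0]
    return out


def rebase(store: Store, branches: Dict[str, str], branch: str, onto: str) -> List[str]:
    upstream = set(chain(store, onto))
    # Commits on our branch that the upstream lacks (here: J, I).
    to_copy = [c for c in chain(store, branches[branch]) if c not in upstream]
    new_base = onto
    copies: List[str] = []
    for old in reversed(to_copy):           # oldest first: I, then J
        message = store[old][1]
        new = toy_id(new_base, message)
        store[new] = (new_base, message)    # the copy sits on top of the new base
        copies.append(new)
        new_base = new
    branches[branch] = new_base             # peel the label over to the last copy
    return copies


store: Store = {
    "F": (None, "F"), "G": ("F", "G"), "H": ("G", "H"),
    "I": ("H", "I"), "J": ("I", "J"),
    "K": ("H", "K"), "L": ("K", "L"),
}
names = {"master": "J", "origin/master": "L"}
i2, j2 = rebase(store, names, "master", names["origin/master"])
print(store[i2][0], store[j2][0] == i2)  # L True
print("I" in store and "J" in store)     # True: originals still exist, just unnamed
```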
If you would have gotten merge conflicts when merging, you will almost certainly get merge conflicts when rebasing—and you may get many repeated conflicts (depending on how many commits you made), and they may be harder to resolve than they would be with merging. But in the end, you avoid the merge commit itself, and when you go to look at the history of all development, it will seem simpler, even if it was actually more complicated.
Whether to do this is up to you. If you do choose to do it, though, you do it instead of merging. Remember that rebase works by copying some set of commits; be careful to copy only commits that you have not already published (with git push), or make sure that anyone else who has the originals switches away from them to the new, improved copies.
One final note on git push
We've seen above that fetch (not pull) is the opposite of push. But there is still one more bit of asymmetry. When you run git push, you have your Git hand over your commits to some other Git repository. This all works by hash IDs, just as fetch does, and at the end, just as fetch had to set a name in your repository to remember their latest master, push has to set a name in their repository to remember your last commit.
But (and here's the asymmetry) your Git asks their Git to set their master. It's not some remote-tracking name like bob/master in their repository; it's their master itself. As a general rule, their Git will refuse this operation unless it's a fast-forward.
This fast-forward is the same kind of thing we saw with the fast-forward, not-really-a-merge git merge. It means that we are only adding commits to some branch name: the new commit(s) eventually point back to the commits they already had. That's what you are doing when you run git merge or git rebase: you combine your work with their latest, so that your work adds on to theirs, rather than erasing some of theirs.
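The receiving side's rule can be sketched in a few lines. This models only the policy described above, not Git's actual receive code: it assumes the pushed commits have already been delivered into a dict from commit ID to parent IDs, and receive_push with its force flag is a hypothetical stand-in for what git push --force asks for.

```python
from typing import Dict, List

Parents = Dict[str, List[str]]  # commit ID -> parent IDs


def is_ancestor(parents: Parents, maybe_ancestor: str, tip: str) -> bool:
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c == maybe_ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def receive_push(parents: Parents, branches: Dict[str, str],
                 branch: str, new_tip: str, force: bool = False) -> bool:
    """Set their branch name to new_tip only if that just adds commits."""
    old_tip = branches.get(branch)
    if old_tip is None or force or is_ancestor(parents, old_tip, new_tip):
        branches[branch] = new_tip
        return True
    return False  # rejected: the update would not be a fast-forward


server: Parents = {"H": [], "K": ["H"], "L": ["K"],  # what they already had
                   "I": ["H"], "J": ["I"],            # your un-merged work
                   "M": ["J", "L"]}                   # your merge commit
their_names = {"master": "L"}
print(receive_push(server, their_names, "master", "J"))  # False: would drop K and L
print(receive_push(server, their_names, "master", "M"))  # True: M leads back to L
```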
In the end, Git is all about the commits, but the names (branch names like master, or remote-tracking names like origin/master) are all about finding the commits. The names are both for humans (who can't deal with raw hash IDs) and for locating the tip-most commit on a branch. That tip commit points backwards, in the same way that Git always works backwards, to earlier commits, and by doing so, it keeps those earlier commits alive. The name keeps the tip commit alive and the tip commit keeps earlier commits alive, and that's why git push demands that the push be a fast-forward.
(You can do non-fast-forward pushes using git push --force or equivalent. This has the effect of killing off some commits in the Git that receives the force-push. If you want to kill them off, that's OK, but other Git users may have fetched those commits, so beware: they could come back!)
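What "killing off" means can be sketched with the same reachability idea: after a forced update, any commit that no name can reach is no longer kept alive. This toy model (a dict of parent IDs and a hypothetical unreachable function) ignores the details of real Git, which usually removes such commits later, during garbage collection, rather than at the moment of the push.

```python
from typing import Dict, List, Set

Parents = Dict[str, List[str]]  # commit ID -> parent IDs


def unreachable(parents: Parents, names: Dict[str, str]) -> Set[str]:
    """Commits that no name keeps alive, directly or through parent links."""
    alive: Set[str] = set()
    stack = list(names.values())
    while stack:
        c = stack.pop()
        if c not in alive:
            alive.add(c)
            stack.extend(parents[c])
    return set(parents) - alive


graph: Parents = {"H": [], "K": ["H"], "L": ["K"], "I": ["H"], "J": ["I"]}
their_names = {"master": "L"}
their_names["master"] = "J"                     # a forced, non-fast-forward update
print(sorted(unreachable(graph, their_names)))  # ['K', 'L']
```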