TL;DR
The branchy structure you are seeing typically comes about without using a lot of separate names. To avoid it, you would rebase rather than merge. (It's not really a question of names, although there is a tie-in between what you want here and names: Git's branch names, its remote-tracking names, and for that matter tag names are all specific forms of what Git calls a reference.)
Git is all about commits
The first and most important thing to remember is that Git is really all about commits. The commit is Git's raison d'être. A commit does several things at once, by storing both data—specifically, a complete snapshot of a source tree—and metadata. Let's look at an actual commit:
$ git cat-file -p HEAD | sed 's/@/ /'
tree 982dc557269a91826c64dd7e3c7d63c4ccfefc90
parent 8c8ddbd0821d552ff3c7e1b67c669dd7f11d63d7
author Junio C Hamano <gitster pobox.com> 1515188717 -0800
committer Junio C Hamano <gitster pobox.com> 1515188717 -0800

Git 2.16-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>
The tree line here gives the hash ID of the snapshot (Git stores everything by hash ID, which we'll come back to in just a moment). The parent line gives the hash ID of the parent commit. The author and committer lines tell you who wrote the commit (author) and who put it into the repository (committer), along with a time stamp (Unix-format seconds-since-epoch plus a time zone offset). The rest of the metadata is the commit message.
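To make the layout concrete, here is a minimal Python sketch that splits commit text like the above into its header fields and its message. The function name parse_commit is made up for illustration, and the sketch assumes the simple case: one header per line, then a blank line, then the message. Real commits can also carry multi-line headers (signatures, for instance), which this ignores.

```python
def parse_commit(raw: str) -> dict:
    """Split raw commit text into header fields and the commit message."""
    # Headers and message are separated by the first blank line.
    header_text, _, message = raw.partition("\n\n")
    parents = []
    fields = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)   # a merge commit has more than one parent line
        else:
            fields[key] = value
    return {
        "tree": fields.get("tree"),
        "parents": parents,
        "author": fields.get("author"),
        "committer": fields.get("committer"),
        "message": message,
    }


raw = (
    "tree 982dc557269a91826c64dd7e3c7d63c4ccfefc90\n"
    "parent 8c8ddbd0821d552ff3c7e1b67c669dd7f11d63d7\n"
    "author Junio C Hamano <gitster pobox.com> 1515188717 -0800\n"
    "committer Junio C Hamano <gitster pobox.com> 1515188717 -0800\n"
    "\n"
    "Git 2.16-rc1\n"
    "\n"
    "Signed-off-by: Junio C Hamano <gitster pobox.com>\n"
)

commit = parse_commit(raw)
print(commit["tree"])
print(commit["parents"])
```

Keeping parents as a list, even when there is only one, pays off later when merge commits show up.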
Every Git object—there are four types: commit, tree, blob (basically the internal form of a file), and annotated tag—has a unique hash ID. This hash ID is the "true name" of the object, and is how Git stores it in the repository database, and looks it up. Hence, to find a commit, Git needs one of these hash IDs.
The problem with these hash IDs is that they're quite useless to humans. So Git adds names, like branch names, tag names, and remote-tracking names. All of them have one primary job: the name remembers a hash ID. That way, instead of trying to remember 36438dc19dd2a305dddebd44bf7a65f1a220075b, I can just remember master (a branch name) or v2.16.0-rc1 (an annotated tag name; I'm going to skip some details here).
There are some important facts about commits, or in fact any Git object:
- The hash ID is always unique. (See How does the newly found sha1 collision affect git?; follow some of the links for more discussion.)
- The hash ID is determined strictly from the contents of the object. If you change anything about the object and store it back into the repository database, you get a new, different, unique hash ID.
What this means is that you cannot change anything about any commit. But you can copy objects to new (and different) objects. We also know that each commit lists its parent commits, as we see in the commit listed above. We'll make use of this idea soon. We also need one more item: a merge commit is a commit with two parents (or more, but we won't bother looking at this case).
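As a rough illustration of "the hash ID is determined strictly from the contents", here is a small Python sketch of how a SHA-1 object ID is computed: Git hashes a short header (the object type, a space, the size in bytes, and a NUL byte) followed by the content. The function name git_object_id is my own, and this assumes a repository using SHA-1 rather than the newer SHA-256 object format.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute the SHA-1 object ID Git would assign to this content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty = git_object_id("blob", b"")
one = git_object_id("blob", b"hello\n")
two = git_object_id("blob", b"hello!\n")

print(empty)        # the ID of an empty file
print(one == two)   # changing one byte gives a different ID
```

Because the ID is a function of the bytes, "editing" an object really means storing a new object under a new ID, while the old one stays exactly as it was.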
Commits with their parent IDs form a Directed Acyclic Graph or DAG
In mathematical terms, a graph is a set of vertices V and edges E that connect those vertices. (See the diagrams on the Wikipedia page for examples.) In a directed graph, the edges have a direction: like one-way streets in a city, you can only travel one direction along the edge, from vertex (or in our case, Git commit) to vertex. These edges are called arcs to remind us that they're one way. (I'm not going to cover the acyclic part of DAG here even though it's important in a theory sense. It occurs naturally in Git, without any action on our part, and we don't have to care much.)
In our case, the direction is always backwards: from a later child commit, we can follow the arc leading to its parent. If we draw a chain of commits, we get something like this:
... <-o <-o <-o ...
where "newer" or "later" commits are towards the right. Each child commit points back to its parent, by virtue of storing the parent hash. Since the arrows always point backwards (left-ish), we can stop drawing them, which is good since there are not that many good arrow fonts to use on StackOverflow. :-)
Note that if we start a new chain with a child pointing back to an earlier parent, we get a divergence:
...--o--o--o--...
         \
          o--o--...
If a commit is a merge commit it has two arcs, leading back to both parents. It's these merge commits that rejoin these chains:
...--o--o--o--o---M
         \       /
          o--o--o
This last commit, marked M, is a merge commit with two parents.
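If it helps to see the graph as data, here is a minimal Python sketch. The single-letter commit labels are hypothetical stand-ins for real hash IDs; each commit simply maps to the list of its parents, which is the only link Git stores.

```python
# Hypothetical labels in place of hash IDs; each commit lists its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # continues the main chain
    "D": ["B"],        # a second child of B: the chains diverge here
    "M": ["C", "D"],   # a merge commit: two arcs, rejoining the chains
}


def ancestors(start, parents):
    """Return every commit reachable by following parent arcs backwards."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def is_merge(commit, parents):
    """A merge commit is one with two or more parents."""
    return len(parents[commit]) >= 2


print(sorted(ancestors("M", parents)))
print(sorted(ancestors("C", parents)))
print(is_merge("M", parents), is_merge("C", parents))
```

Note that nothing points forwards: from C we cannot discover that D or M exist, only what came before.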
Now, this—making new commits in general, and making merge commits in particular—is where branch and other names come in ... or, sometimes, don't!
Finding commits at all: we need names, aka references
We mentioned above that hash IDs (36438dc...) are useless to humans; we like names. So we have a name like master for finding this ID. Git needs them too, at least some of the time. In particular, while Git can rummage through the entire database and find every object (and it has maintenance commands like git fsck and git gc that literally do that), this is very slow in a big repository. The fast operation is to take a known hash ID and find the data that go with that ID.
So, in general, we have Git start with a name like master, which finds a commit like 36438dc..., and Git can then show us that commit, or check it out, or whatever, using its information, especially its tree line and sometimes its parent. Or we can have Git step back one commit in history, to 36438dc...'s parent, which is of course another commit. We can have Git extract that commit, or look at its parent, and so on.
Whatever we do, though, it's the names that get this process started. The names identify one specific commit, from which we (or Git) can work backwards.
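Here is a small Python sketch of that idea: a table of names, a table of commits keyed by ID, and a walk that starts at a name and steps backwards. All the IDs and names (c1, c2, c3, v1.0) are invented for the example, and the walk follows only a single parent per commit to keep things short.

```python
# Hypothetical IDs; a real repository uses 40-character hash IDs.
commits = {
    "c1": {"tree": "t1", "parent": None},
    "c2": {"tree": "t2", "parent": "c1"},
    "c3": {"tree": "t3", "parent": "c2"},
}

# Names (references) each remember exactly one hash ID.
refs = {"master": "c3", "v1.0": "c1"}


def history(name):
    """Start at the commit a name points to and step back parent by parent."""
    commit_id = refs[name]          # the name gets the process started
    found = []
    while commit_id is not None:
        found.append(commit_id)     # looking up a known ID is the fast operation
        commit_id = commits[commit_id]["parent"]
    return found


print(history("master"))
print(history("v1.0"))
```

Without an entry in refs, c3 would still be in the commits table, but nothing short of scanning the whole table would find it.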
How branch names behave when we make new commits
When humans go to add commits to a Git graph, we do it by doing:
$ git clone <url> # at least for the first time
$ cd <repository> # as necessary
... do some work ...
$ git add file1 file2 ... # or git add -u, etc
$ git commit
The git add step copies files from the work-tree, where we did our work on them, into the index / staging-area, replacing the previous version that was in the index / staging-area. The git commit step makes the new commit from whatever is now in the index: all the old files the way they were before, and the changed files the way they are now that we copied them into the index with git add.
When git commit makes a new commit, it goes through the following steps (not necessarily in this order):
1. Turn the index (all the files that go into the snapshot) into a tree object.
2. Find the current commit's hash ID. That will be the new commit's parent.
3. Collect up our name and email address and time-stamp for the author and committer lines.
4. Collect up a commit message.
5. Write all of these out as a new commit object. Git assigns the new object its new, unique hash ID.
6. Write the new hash ID into the current branch.
Let's look closely at steps 2 and 6. If we did a git checkout master, so that we are (as git status puts it) on branch master, the current commit's hash ID is the one stored under the name master. That's where we get the hash ID for step 2. In step 6, we replace the hash ID stored under that name with the hash ID of the new commit we just created.
In other words, when we make a branch "grow" by using git checkout and eventually git commit, we're telling Git to make new, permanent, read-only commit objects whose parent is what the branch tip was before the commit, and to update the name to point to the new commit:
...--o--o--* <-- master
becomes:
...--o--o--*--@ <-- master
The name master, in effect, moves to point to the latest commit.
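The six steps can be sketched as a toy model in Python. Everything here (the ToyRepo class, its fields, the author string) is made up for illustration, and the "tree" is just a text listing rather than Git's real tree format. The point is only the shape of the process: build the snapshot, read the branch to get the parent, write the commit, then write the new ID back into the branch.

```python
import hashlib


class ToyRepo:
    """A toy model of a repository; not Git's real storage format."""

    def __init__(self):
        self.objects = {}         # hash ID -> stored bytes
        self.refs = {}            # branch name -> commit hash ID
        self.current = "master"   # the branch we are "on"
        self.index = {}           # path -> file content (the staging area)

    def store(self, kind, body):
        data = f"{kind} {len(body)}".encode() + b"\0" + body
        object_id = hashlib.sha1(data).hexdigest()
        self.objects[object_id] = data
        return object_id

    def commit(self, message, who, when):
        # Step 1: turn the index into a tree (simplified to a text listing).
        listing = "\n".join(
            f"{path} {self.store('blob', text.encode())}"
            for path, text in sorted(self.index.items())
        )
        tree = self.store("tree", listing.encode())
        # Step 2: the current branch tip becomes the new commit's parent.
        parent = self.refs.get(self.current)
        # Steps 3 and 4: author/committer lines and the message.
        lines = [f"tree {tree}"]
        if parent is not None:
            lines.append(f"parent {parent}")
        lines += [f"author {who} {when} +0000",
                  f"committer {who} {when} +0000", "", message]
        # Step 5: write the commit object; its ID comes from its content.
        new_id = self.store("commit", "\n".join(lines).encode())
        # Step 6: write the new ID into the current branch.
        self.refs[self.current] = new_id
        return new_id


repo = ToyRepo()
repo.index["README"] = "hello\n"
first = repo.commit("initial", "A U Thor <thor example.com>", 1515188717)
repo.index["README"] = "hello, world\n"
second = repo.commit("update README", "A U Thor <thor example.com>", 1515188717)
print(repo.refs["master"] == second)
print(f"parent {first}".encode() in repo.objects[second])
```

The first commit has no parent line at all, since there was no branch tip to read in step 2; every later commit records the tip it was built on.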
But Git is distributed
While we're making our changes and committing them, other people have done git clone of the same origin repository, and have been adding their own commits. Your own repository (Tom's repo) might have this now:
             @ <-- master
            /
...--o--o--* <-- origin/master
Here, we're using the name origin/master (a remote-tracking name rather than a branch name) to remember where master was in the Git repository on origin. The branch name master is "Tom's master", not "origin's master". They were originally both pointing to the same commit, the one I marked *, but since then, you made a new commit, with a new, unique ID.
Meanwhile, though, Sharon also did a git clone and has been working:
...--o--o--* <-- master, origin/master
and now she makes a new commit in her repository, which gets a new, unique ID, different from your new and unique ID:
...--o--o--* <-- origin/master
            \
             ● <-- master
If we were to somehow combine Sharon's repository and your repository, let's see what we'd get. Remember, each commit is uniquely identified by its hash ID, so the three middle-row commits are the same in your repository and in Sharon's:
             @ <-- (Tom's master)
            /
...--o--o--* <-- origin/master
            \
             ● <-- (Sharon's master)
This forking / branching behavior has already occurred, even if we haven't combined your repository and Sharon's yet. It's occurred in a sort of virtual sense: it will be there once we do the combining.
So, let's say Sharon now runs git push origin master. Her Git will call up the third Git at origin and send it her commit ●. Her Git will then ask origin's Git to set origin's master to point to ●. If all goes well, which it probably does, origin's Git now has:
...--o--o--*--● <-- master
When your Git calls up origin and downloads new commits, your Git gets commit ●, which is new to your Git. Your Git remembers where origin's master is by updating your own origin/master, giving you:
             @ <-- master
            /
...--o--o--*--● <-- origin/master
This is the same diagram we drew before; the only difference is that we drew commit ● on the same row, instead of a lower-down row.
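The whole exchange can be mimicked with three small dictionaries in Python. This is only a sketch: the labels star, at, and dot stand in for the commits drawn as *, @, and ●, the push and fetch functions are my own simplifications, and they copy every commit across rather than working out which ones the other side is missing.

```python
# Hypothetical labels stand in for hash IDs; each commit maps to its parents.
shared = {"o1": [], "o2": ["o1"], "star": ["o2"]}

origin = {"commits": dict(shared), "refs": {"master": "star"}}
tom = {
    "commits": {**shared, "at": ["star"]},
    "refs": {"master": "at", "origin/master": "star"},
}
sharon = {
    "commits": {**shared, "dot": ["star"]},
    "refs": {"master": "dot", "origin/master": "star"},
}


def push(local, remote, branch):
    """Send our commits, then ask the remote to set its branch to our tip."""
    remote["commits"].update(local["commits"])
    remote["refs"][branch] = local["refs"][branch]


def fetch(local, remote, branch):
    """Download commits and record the remote's branch as origin/<branch>."""
    local["commits"].update(remote["commits"])
    local["refs"]["origin/" + branch] = remote["refs"][branch]


push(sharon, origin, "master")   # Sharon publishes her commit
fetch(tom, origin, "master")     # Tom's Git picks it up
print(tom["refs"])
print(sorted(tom["commits"]))
```

Notice that fetch never touches Tom's own master: only the remote-tracking name moves, which is why the fork is still sitting there waiting to be dealt with.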
It's now your job, since this has landed in your repository (not Sharon's), to do something about this. If you just naively run git merge you will merge your commit @ with Sharon's ●:
             @--M <-- master
            /  /
...--o--o--*--● <-- origin/master
This merge gets added just like any ordinary commit, except that instead of one parent ... line, it has two: one for your @ commit and one for Sharon's ●.
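In graph terms, and ignoring the actual work of combining file contents, the merge amounts to something like the following Python sketch. The labels and the merge function are invented for illustration; a real merge ID would of course be a hash, and Git may skip creating a merge commit entirely when it can fast-forward.

```python
# Hypothetical labels: "star" is *, "at" is your commit, "dot" is Sharon's.
commits = {"o": [], "star": ["o"], "at": ["star"], "dot": ["star"]}
refs = {"master": "at", "origin/master": "dot"}


def merge(commits, refs, branch, other):
    """Add a commit with two parents and move the branch name to it."""
    ours, theirs = refs[branch], refs[other]
    merge_id = f"M({ours},{theirs})"      # placeholder for a real hash ID
    commits[merge_id] = [ours, theirs]    # first parent is our old tip
    refs[branch] = merge_id               # only the current branch moves
    return merge_id


m = merge(commits, refs, "master", "origin/master")
print(commits[m])
print(refs)
```

Both original commits are still in the graph, each reachable from M, which is exactly what makes the history look branchy.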
If you instead use git rebase, you will copy your @ commit to a new-and-improved commit. The difference between your original @ and the new one is that your new one will build up from Sharon's, so its parent will be ●:
             @
            /
...--o--o--*--● <-- origin/master
               \
                ○ <-- master
Over time, using git rebase instead of git merge will give you a linear structure rather than a branchy one.
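For comparison with the merge case, here is a rough Python sketch of what rebase does to the graph. The labels and function names are hypothetical, the sketch assumes your own commits form a simple chain with no merges in it, and it skips the real work of re-applying each change. A copied commit is labeled with a trailing apostrophe to show it is a new object, not the original.

```python
# Hypothetical labels: "star" is *, "at" is your commit, "dot" is Sharon's.
commits = {"o": [], "star": ["o"], "at": ["star"], "dot": ["star"]}
refs = {"master": "at", "origin/master": "dot"}


def ancestors(commits, start):
    """Every commit reachable from start by following parent arcs."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen


def rebase(commits, refs, branch, upstream):
    """Copy the commits unique to branch so they build on upstream's tip."""
    already_upstream = ancestors(commits, refs[upstream])
    to_copy = []
    commit = refs[branch]
    while commit not in already_upstream:
        to_copy.append(commit)
        commit = commits[commit][0]       # assumes a single-parent chain
    new_parent = refs[upstream]
    for old in reversed(to_copy):         # oldest first
        copy = old + "'"                  # a new commit, hence a new ID
        commits[copy] = [new_parent]
        new_parent = copy
    refs[branch] = new_parent             # the name moves to the last copy


rebase(commits, refs, "master", "origin/master")
print(refs)
print(commits["at'"])
print(all(len(commits[c]) <= 1 for c in ancestors(commits, refs["master"])))
```

The original commit is still in the table afterwards, just no longer reachable from master, and everything that is reachable from master has at most one parent: a straight line.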