I think it might help to view some alternative presentations of a graph.
Consider this very simple diamond-shaped graph, with later commits drawn higher and earlier commits drawn lower:
      D
     / \
    B   C
     \ /
      A
Here, D could be the extremely-shortened hash ID of a merge commit, with B and C being its two inputs (two parents, in Git's terminology). A is the (one, single) parent of both B and C. As a result, B and C are siblings, or would be if Git used that concept directly: both have the same parent, so they must be brother and sister (or two brothers or two sisters or whatever). But Git doesn't normally talk about commits this way; it's normally only interested in immediate parent/child relationships.
We could also draw this the way git log --graph does:
* d...... fourth message
|\
| * c...... third message
* | b...... another message
|/
* a...... some commit message
Like human children, Git's "children" can have more than one parent. However, the most typical case is to have just one parent, in which case that parent is the first, last, and only parent. You can number it (C^1 is A, for instance), but there's no real need. There is no C^2 and asking for it will just get you an error.
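To make the parent numbering concrete, here is a small Python sketch over the diamond graph above. The parents dictionary and the nth_parent helper are names I made up for illustration; this is a toy model, not Git's implementation. I am also assuming that B is D's first parent, as the git log --graph drawing suggests.

```python
# Toy model of the diamond graph: each commit maps to its ordered parent list.
# (Hypothetical names for illustration; Git does not store commits this way.)
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],  # merge commit: assumed first parent B, second parent C
}

def nth_parent(commit, n=1):
    """Return the n-th parent of commit (1-based), mimicking commit^n."""
    plist = parents[commit]
    if not 1 <= n <= len(plist):
        raise LookupError(f"{commit}^{n}: no such parent")
    return plist[n - 1]
```

With this model, nth_parent("C", 1) gives "A" and nth_parent("D", 2) gives "C", while nth_parent("C", 2) raises an error, much as asking Git for C^2 does.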
In StackOverflow postings, I like to draw my graphs with earlier commits at the left and later ones at the right, like this:
      B
     / \
    A   D   <-- master (HEAD)
     \ /
      C   <-- develop
This gives me room to insert the branch names, and attach the word HEAD to one of them, as Git normally does. However, this makes it difficult to tell which parent is the first, and which is the second. Note that the first vertical diagram above has the same problem.
Any commit with at least two parents is called a merge commit. This uses the word merge as an adjective, modifying commit. We'll also see it often as just a merge, using the word merge as a noun. As ElpieKay notes, a merge commit can have more than two parents, but these octopus merges (as Git calls them) don't do anything you can't do with just pairwise merges, so they are mostly just for showing off. :-) When you do have a merge with three or more parents, you can number all of them. The only special distinction that Git itself makes is for the first parent, though, using the --first-parent flag to various Git commands such as git log.
Confusingly, the git merge command does not have to make a merge commit. It has two parts: first, what I like to refer to as merge as a verb, the act of combining work; and then, subsequently, making a commit. The commit it makes is a merge commit unless you tell it not to. And, as if this weren't confusing enough already, several additional Git commands do the merge as a verb part without producing a merge commit. So it's important to keep in mind the distinction between the verb form, to merge, and the noun or adjective form.
It's worth noting a few more items:
All commits—all Git objects, really—are read-only. Once a commit is made, it can never be changed, because its hash ID—its true name, as it were—is computed by running all of its underlying data through a hash function. If you were to somehow change even a single bit inside the commit, you'd get a new and different hash, and hence a new and different commit.
Since the parent or parents exist when the child is made, the child can record the parents.
But since its child or children do not exist yet when the parent is made, the parent cannot record its children.
It's these backwards parent <- child linkages that form the commit graph.[1]
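The points above can be sketched in a few lines of Python. The object_id function follows the scheme Git uses in SHA-1 repositories: hash a header of the form "type size" plus a NUL byte, followed by the content. The make_commit helper is a simplification of my own: real commits also carry tree, author, and committer lines, so these IDs will not match anything in a real repository.

```python
import hashlib

def object_id(kind, body):
    # Git (SHA-1 repositories) hashes "<kind> <size>", a NUL byte, then the body.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def make_commit(message, parent_ids):
    # Simplified, hypothetical commit body: parent lines, a blank line, the message.
    # The parents must already exist, because their IDs are part of the hashed data.
    lines = [f"parent {p}" for p in parent_ids]
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return object_id("commit", body)

a = make_commit("some commit message", [])
b = make_commit("another message", [a])

# Changing anything in the content yields a different ID, i.e. a different commit.
b_edited = make_commit("another message!", [a])
```

Here b can name a as its parent because a's ID is already known, but a's ID was fixed before b existed, so a cannot name b. And b_edited is not "b, modified"; it is simply a different object with a different hash.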
This means that Git's internal linkages are all, always, backwards. Git must start at the last, child-most, commit and work backwards. That's why branch names like master always point to the tip commit of the branch. As Git works backwards, one commit at a time, merges, which have two or more parents, present a problem: Git can only move back from the child to one of the parents. Git's usual solution to this problem is to place all of the parents into a queue, then work on the first commit in the queue.
The --first-parent flag tells Git to put only the first parent into the queue, ignoring the second parent (and any additional parents if this is an octopus merge). That allows Git to walk the commit graph without ever having more than one commit at a time to deal with.
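Here is a minimal sketch of that backwards walk, again over the diamond graph. The parents dictionary and the walk function are hypothetical names, and I am assuming B is the first parent of D. Note one simplification: this uses a plain first-in, first-out queue, whereas git log by default uses a priority queue ordered by commit date.

```python
from collections import deque

# Toy diamond graph; assumed first parent of D is B.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

def walk(tip, first_parent_only=False):
    """Visit commits reachable from tip by following parent links backwards."""
    seen = set()
    order = []
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue  # A is reachable via both B and C; visit it only once
        seen.add(commit)
        order.append(commit)
        plist = parents[commit]
        if first_parent_only:
            plist = plist[:1]  # drop second and later parents
        queue.extend(plist)
    return order
```

In this toy model, walk("D") visits D, B, C, A, while walk("D", first_parent_only=True) visits only D, B, A: with one parent queued at a time, the queue never holds more than one commit.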
[1] Mathematically, any graph G is defined by a collection of vertices V and edges E, which we write as G = (V, E). A graph can be directed, and Git's is: the links from vertex to vertex go only one way. Such edges are called arcs. In our case I prefer to call the vertices nodes; these are the actual commits themselves, and each node contains a list of all its outgoing arcs, i.e., the hash IDs of the parent commits.
Git's commit graph is not only directed but also acyclic, meaning that if we start at any one commit, and walk through the graph, we'll never return to that same commit. No parent can be its own child, in other words. This is a useful and important property for the various graph transformations that Git does, so we sometimes talk about the Git commit graph as a DAG, which is short for Directed Acyclic Graph. The commit graph, or commit DAG, represents all the snapshots you have ever made.
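One way to see why acyclicity matters: a DAG can always be put in an order where every parent comes before its children, and a graph with a cycle cannot. The sketch below uses Python's standard graphlib module (Python 3.9 and later) on the toy diamond graph. The parents dictionary and the parents_first_order helper are illustrative names of my own, not anything Git provides.

```python
from graphlib import TopologicalSorter, CycleError

# Each node lists its outgoing arcs, i.e. its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

def parents_first_order(graph):
    """Return the nodes with every parent before its children.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())

order = parents_first_order(parents)  # A comes first, D comes last

# A hypothetical "graph" in which each commit is the other's parent:
broken = {"X": ["Y"], "Y": ["X"]}
try:
    parents_first_order(broken)
    has_cycle = False
except CycleError:
    has_cycle = True
```

Since a commit's hash ID depends on its parents' hash IDs, a real repository cannot end up like the broken example; the ordering is always possible.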
Note that each source snapshot is simply attached to one commit. The graph operations don't have to care what is in the corresponding snapshots: they look only at the graph itself!