There is not a history of commits. There are only commits; the commits are the history.
Each commit is uniquely identified by a hash ID. That hash ID is the true name of the commit, as it were. If you have that commit, you have that hash ID. If you have that hash ID, you have that commit. Read out the big ugly hash ID and see if it's in your database of "all the commits that I have in this repository": i.e., see if Git knows it. If so, you have that commit. For instance, `b5101f929789889c2e536d915698f58d5c5c6b7a` is a valid hash ID: it's a commit in the Git repository for Git. If you have that hash ID in your Git repository, you have that commit.
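One way to ask "does Git know this hash ID?" is `git cat-file -e`, which exits successfully only when the object exists. The sketch below assumes a throwaway repository in a temporary directory with a made-up identity and a single empty commit, so the Git project's own commit is expected to be absent from it.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory, not a real project
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first commit"

mine=$(git rev-parse HEAD)                          # a commit this repository has
theirs=b5101f929789889c2e536d915698f58d5c5c6b7a     # the hash ID quoted in the text

# git cat-file -e exits with status 0 only if the object is in this repository.
if git cat-file -e "$mine^{commit}" 2>/dev/null; then have_mine=yes; else have_mine=no; fi
if git cat-file -e "$theirs^{commit}" 2>/dev/null; then have_theirs=yes; else have_theirs=no; fi

echo "our own commit present:        $have_mine"
echo "the Git project's commit here: $have_theirs"
```

Run the same check inside a clone of the Git repository for Git and the second answer should change, because there the hash ID is in the database.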
People don't normally type in, or use, these hash IDs at all. Git uses them, but Git is a computer program, not a human. Humans don't do well with these things (I have to cut and paste the above hash ID or I'll get it wrong), so humans use a different way to get started. Humans use branch names. But many different Git repositories all have `master`, and this `master` doesn't always (or ever!) mean that big ugly hash ID I typed in above. So a name like `master` is specific to one particular Git repository, while hash IDs are not.
Now, every commit stores some stuff. What a commit stores includes a snapshot of all the files that go with that commit, so that you can get it back out later. It also includes the name and email address of the person who made that commit, so that you can tell who to praise or blame. It includes a log message: why the person who made the commit says they made that commit. But—here's the first tricky part—almost every commit also includes at least one hash ID, which is the commit that comes before this particular commit.
So, if you have `b5101f929789889c2e536d915698f58d5c5c6b7a`, then what you have is this:

    $ git cat-file -p b5101f929789889c2e536d915698f58d5c5c6b7a | sed 's/@/ /'
    tree 3f109f9d1abd310a06dc7409176a4380f16aa5f2
    parent a562a119833b7202d5c9b9069d1abb40c1f9b59a
    author Junio C Hamano <gitster pobox.com> 1548795295 -0800
    committer Junio C Hamano <gitster pobox.com> 1548795295 -0800

    Fourth batch after 2.20

    Signed-off-by: Junio C Hamano <gitster pobox.com>

(The `tree` line represents the saved snapshot that goes with this commit. You can ignore this here.) The `parent` line gives the hash ID of the commit that comes before `b5101f929789889c2e536d915698f58d5c5c6b7a`.
If you have `b5101f929789889c2e536d915698f58d5c5c6b7a`, you almost certainly also have `a562a119833b7202d5c9b9069d1abb40c1f9b59a`. The history for the later commit is the earlier commit.
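To see the same parent link in a repository of your own, here is a minimal sketch. It assumes a throwaway repository; the commit messages F, G, H and the identity are hypothetical, and empty commits are used only to keep it short.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory, not a real project
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do
    git commit -q --allow-empty -m "$msg"  # three empty commits: F <- G <- H
done

h=$(git rev-parse HEAD)                    # hash ID of H
g=$(git rev-parse HEAD~1)                  # hash ID of G, found by following H's parent

git cat-file -p "$h"                       # prints tree, parent, author, committer, message

# Pull the hash ID out of the "parent" line stored inside H.
parent_line=$(git cat-file -p "$h" | sed -n 's/^parent //p')
echo "H                  = $h"
echo "parent stored in H = $parent_line"
```

The value on the `parent` line and the result of `git rev-parse HEAD~1` should be the same hash ID, since `HEAD~1` just asks Git to follow that stored link.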
If we replace each of these big ugly hash IDs with a single uppercase letter,[1] we can draw this sort of history a lot more easily:

    ... <-F <-G <-H

where `H` is the last commit in a long chain of commits. Since `H` holds `G`'s hash ID, we don't need to write down `G`'s big ugly hash ID; we can just write down `H`'s hash. We use that to have Git find `G`'s ID, inside `H` itself. If we want `F`, we use `H` to find `G` to find `F`'s ID, which lets Git retrieve `F`.
But we still have to write down that last hash ID. This is where branch names come in. Branch names like `master` act as our way of saving the hash ID of the last commit.
To make a new commit, we have Git save the hash ID of `H` in our new commit. We have Git save a snapshot and our name and email address and all the rest of that as well; "the rest" includes a time stamp, the precise second when we had Git do all this. Now Git computes the actual hash ID of all of this data, including the time stamp. The commit is now saved in our database of all commits, and Git has given us a new hash ID `I`:

    ...--F--G--H   <-- master
                \
                 I

We have Git automatically write `I`'s hash ID into our name `master`:

    ...--F--G--H--I   <-- master

and we've added new history, which retains all the existing history.
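The way a branch name follows new commits can be watched directly. This is a small sketch under the same assumptions as before: a throwaway repository, a made-up identity, and empty commits standing in for `H` and `I`.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git commit -q --allow-empty -m H

old_tip=$(git rev-parse master)            # the name master holds H's hash ID
git commit -q --allow-empty -m I           # make new commit I
new_tip=$(git rev-parse master)            # master now holds I's hash ID
parent_of_new=$(git rev-parse master^)     # the parent stored inside I

echo "master was:         $old_tip"
echo "master is now:      $new_tip"
echo "parent of new tip:  $parent_of_new"
```

The old tip is not lost: it is exactly what the new commit records as its parent.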
[1] Of course, if we only used one uppercase letter like this, we'd run out of the ability to create commits, anywhere in the world, after creating just 26 commits. That's why Git's hash IDs are so big. They hold 160 bits, so the number of possible commits or other objects is 2^160, or 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976. As it turns out, this isn't really enough, and Git will probably move to a larger hash that can hold 2^96, or 79,228,162,514,264,337,593,543,950,336, times as many objects. While the first number is already far larger than the number of objects any repository will ever hold, there are specific attacks that are troublesome, so a 256-bit hash is a good idea. See "How does the newly found SHA-1 collision affect Git?"
This tells you how to have the same history
History is the commits. To have the same history in two branches, you need both branch names to point to the same commit:

    ...--F--G--H--I   <-- master, dev

Now the history in `master` is: Starting at `I`, show `I`, then move back to `H` and show `H`, then move back to `G`... Likewise, the history in `dev` is: Starting at `I`, show `I`, then move back to `H`...
Of course, that's not quite what you want. What you want is to have history that diverges, then converges again. That's what branches are really about:

    ...--F--G--H   <-- master
                \
                 I   <-- dev

Here the history in `dev` starts (ends?) at `I`, then goes back to `H`, and then `G`, and so on. The history in `master` starts (ends?) at `H`, goes back to `G`, and so on. As we add more commits, we add more history, and if we do it like this:

                 K--L   <-- master
                /
    ...--F--G--H
                \
                 I--J   <-- dev

then the history of the two branches diverges. Now `master` starts at `L` and works backwards, while `dev` starts at `J` and works backwards. There are two commits on `dev` that are not on `master`, and two commits that are on `master` that are not on `dev`, and then everything from `H` on back is on both branches.
This divergence—the commits that are not on some branch—is where the lines of work diverge. The branch names still only remember one commit each, specifically the tip or last commit of each line of development. Git will start at this commit, by the saved hash ID, and use that commit's saved parent hash ID to walk backwards, one commit at a time. Where the lines rejoin, the history rejoins. That's all there is in a repository, except for the next section.
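The "two commits on each side" claim can be checked with `git rev-list`. The sketch below rebuilds the diverged graph in a throwaway repository; the single-letter commit messages mirror the drawing and the identity is hypothetical.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
git branch dev                             # dev and master both name H
for msg in K L; do git commit -q --allow-empty -m "$msg"; done   # master: H--K--L
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J

only_dev=$(git rev-list --count master..dev)      # reachable from dev, not from master
only_master=$(git rev-list --count dev..master)   # reachable from master, not from dev
echo "only on dev: $only_dev, only on master: $only_master"

git --no-pager log --oneline --graph master dev   # draw both lines of history
```

The `A..B` syntax means "commits reachable from B, excluding those reachable from A", which is the same backwards walk described above with one side subtracted.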
Merges combine history (and snapshots)
What you can do now is make a merge commit. The main way to make a merge commit is using the `git merge` command. This has two parts:
- combining work, where Git figures out what has changed in each line of development; and
- making a merge commit, which is a commit with exactly one special feature.
To make a merge, you start by picking one branch tip. You run `git checkout master` or `git checkout dev` here. Whichever one you pick, that's the commit you have out now, and Git attaches the special name `HEAD` to that branch name to remember which one you picked:

                 K--L   <-- master (HEAD)
                /
    ...--F--G--H
                \
                 I--J   <-- dev

Now you run `git merge` and give it an identifier to choose the commit to merge. If you're on `master` = `L`, you'll want to use `dev` = `J` as the commit to merge:

    git merge dev         # or git merge --no-ff dev

Git will now walk the graph as usual to find the best shared commit: the best commit that's on both branches, to use as a starting point for this merge. Here, that's commit `H`, where the two branches first diverge.
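You can ask Git for this best shared commit yourself with `git merge-base`. The following sketch assumes the same throwaway setup as the drawing (single-letter commit messages, made-up identity) and records `H`'s hash ID when it is created so the answer can be compared.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
h=$(git rev-parse HEAD)                    # remember H's hash ID
git branch dev
for msg in K L; do git commit -q --allow-empty -m "$msg"; done   # master: H--K--L
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J
git checkout -q master

base=$(git merge-base master dev)          # best commit that is on both branches
echo "H          = $h"
echo "merge base = $base"
```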
Now Git will compare the snapshot saved with commit `H` (the merge base) to the one in your current commit `L`. Whatever is different, you must have changed on `master`. Git puts those changes into one list:

    git diff --find-renames <hash-of-H> <hash-of-L>   # what we changed

Git repeats this but with their commit `J`:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what they changed

Now Git combines the two sets of changes. Whatever we changed, we want to keep changed. Whatever they changed, we want to use those changes too. If they changed `README.md` and we did not, we'll take their change. If we changed a file and they didn't, we'll take our change. If we both changed the same file, Git will try to combine those changes. If Git succeeds, we have a combined change for that file.
In any case, Git now takes all of the combined changes and applies them to the snapshot in `H`. If there were no conflicts, Git automatically makes a new commit from the result. If there were conflicts, Git still applies the combined changes to `H`, but leaves us with the messy result, and we have to fix it up and do the final commit ourselves; but let's assume there were no conflicts.
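Here is a small sketch of the "we changed one file, they changed another" case. The file `a.txt`, the file contents, and the identity are invented for illustration; only `README.md` comes from the text. The two `git diff` commands are run by hand here to show the inputs to the merge; Git does the equivalent comparisons internally.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

printf 'one\n' > a.txt                     # a.txt is a made-up file name
printf 'readme\n' > README.md
git add a.txt README.md
git commit -q -m H                         # merge base H
git branch dev

printf 'one changed on master\n' > a.txt
git commit -q -am L                        # our change: a.txt only
git checkout -q dev
printf 'readme changed on dev\n' > README.md
git commit -q -am J                        # their change: README.md only
git checkout -q master

base=$(git merge-base master dev)
git --no-pager diff --find-renames --stat "$base" master   # what we changed
git --no-pager diff --find-renames --stat "$base" dev      # what they changed

git merge -q --no-edit dev                 # combine both sets of changes
cat a.txt README.md                        # both changes are now present
```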
Git now makes a new commit with one special feature. Instead of just remembering our previous commit `L`, Git has this merge commit remember two previous commits, `L` and `J`:

                 K--L   <-- master (HEAD)
                /    \
    ...--F--G--H      M
                \    /
                 I--J   <-- dev

Then, as always, Git updates our current branch to remember the new commit's hash ID:

                 K--L
                /    \
    ...--F--G--H      M   <-- master (HEAD)
                \    /
                 I--J   <-- dev

Note that if we do the merge by running `git checkout dev; git merge master`, Git would do the same two diffs and make a merge commit `M` with the same snapshot. The two parents would be recorded in the opposite order, though (`J` first, then `L`), so the hash ID of that `M` would differ. Git would also write the hash ID of `M` into `dev` rather than into `master`.
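The two stored parents of a merge commit can be read back with the `^1` and `^2` suffixes. This sketch assumes a throwaway repository with empty commits named after the drawing; the merge message `M` is supplied with `-m` so that no editor opens.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
git branch dev
for msg in K L; do git commit -q --allow-empty -m "$msg"; done   # master: H--K--L
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J
git checkout -q master

l=$(git rev-parse master)                  # L, the commit we are on
j=$(git rev-parse dev)                     # J, the commit we merge
git merge -q -m M dev                      # make merge commit M

first=$(git rev-parse HEAD^1)              # first parent of M
second=$(git rev-parse HEAD^2)             # second parent of M
echo "first parent:  $first"
echo "second parent: $second"
```

The first parent is the commit that was checked out when `git merge` ran, and the second is the commit named on the command line.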
In any case, if we now ask about the history of `master`, Git will start at `M`. Then it will walk back to both `L` and `J` and show both of them. (It has to pick one to show first, and `git log` has a lot of flags to help you choose which one to show first.) Then it will walk back from whichever one it picked first, so that it now has to show both `K` and `J`, or both `L` and `I`. Then it will walk back from whichever one it picked to show.
In most cases Git shows all the children before any of the parents, i.e., eventually, it will have shown all four of `I`, `J`, `K`, and `L` and have only `H` to show. So from here, Git will show `H`, then `G`, and so on: there's now just one chain to walk back, one commit at a time. But be aware that when you traverse back from a merge, you run into the "which commit to show next" problem.
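Two of the `git log` options that affect this walk are `--topo-order` and `--first-parent`. The sketch below is only an illustration on a throwaway repository with made-up single-letter commit messages: `--topo-order` guarantees that no parent is shown before all of its children, and `--first-parent` follows only the first parent at each merge.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
git branch dev
for msg in K L; do git commit -q --allow-empty -m "$msg"; done   # master: H--K--L
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J
git checkout -q master
git merge -q -m M dev                      # merge commit M with parents L and J

# Subjects in the order Git shows them, joined onto one line.
topo=$(echo $(git log --topo-order --format=%s master))
first_parent=$(echo $(git log --first-parent --format=%s master))

echo "topo order:   $topo"
echo "first parent: $first_parent"
```

With `--first-parent` the commits `I` and `J` do not appear at all, because the walk never follows the second parent of `M`.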
`git merge` does not always make a merge commit
Suppose you have this history:

    ...--F--G--H   <-- master
                \
                 I--J   <-- dev

That is, there's no divergence; `dev` is merely strictly ahead of `master`. You do `git checkout master` to select commit `H`:

    ...--F--G--H   <-- master (HEAD)
                \
                 I--J   <-- dev

and then `git merge dev` to combine the work you've done since the merge base with the work they did since the merge base.
The merge base is the best shared commit. That is, we start at `H` (the tip of `master`) and keep going back as needed, and also start at `J` (the tip of `dev`) and keep going back as needed, until we reach a common starting point. So from `J` we go back to `I` and to `H`, and from `H` we just sit quietly at `H` until the walk from `J` arrives here.
The merge base, in other words, is the current commit. If Git ran:

    git diff --find-renames <hash-of-H> <hash-of-H>

there would be no changes. The act of combining no changes (from `H` to `H` via `master`) with some changes (from `H` to `J` via `dev`), then applying those changes to `H`, is just going to be whatever is in `J`. Git says: "well, that was too easy" and instead of making a new commit, it just moves the name `master` forwards, in the opposite of the usual backwards direction. (In fact, Git really did work backwards, from `J` to `I` to `H`, in order to figure this out. It just remembers that it started from `J`.) So what you get here, by default, is this:

    ...--F--G--H
                \
                 I--J   <-- dev, master (HEAD)

When Git is able to slide a label like `master` forward like this, it calls that operation a fast-forward. When you do this with `git merge` itself, Git calls it a fast-forward merge, but it's not really a merge at all. What Git really did was to check out commit `J`, and make `master` point to `J`.
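A quick way to confirm that a fast-forward creates no commit is to count commits before and after. This sketch assumes a throwaway repository with empty commits named after the drawing and Git's default merge settings (no `merge.ff` configuration).

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
git branch dev
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J
git checkout -q master                     # master still names H

before=$(git rev-list --all --count)       # number of commits before the merge
git merge -q dev                           # fast-forward: no new commit is made
after=$(git rev-list --all --count)        # number of commits after the merge

echo "commits before: $before, after: $after"
echo "master = $(git rev-parse master)"
echo "dev    = $(git rev-parse dev)"
```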
In many cases, this is OK! The history is now: For `master`, start at `J` and walk back. For `dev`, start at `J` and walk back. If that's all you need and care about, that's fine. But if you want a real merge commit (so that you can tell `master` and `dev` apart later, for instance), you can use `git merge --no-ff` to tell Git: Even if you can do a fast-forward instead of a merge, do a real merge anyway. Git will go ahead and compare `H` to `H`, and then compare `H` to `J`, and combine the changes and make a new commit:

    ...--F--G--H------K   <-- master (HEAD)
                \    /
                 I--J   <-- dev

Now you get a real merge commit `K`, with two parents as required to be a merge commit. The first parent is `H` as usual, and the second is `J`, as is usual for a merge commit. The history of `master` now includes the history of `dev`, but remains different from the history of `dev`, because the history of `dev` doesn't include commit `K`.
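The same starting graph with `--no-ff` gives a different result, which can be checked by looking at the parents of the new commit. As before, this is a sketch on a throwaway repository with hypothetical commit messages and identity; the merge message `K` is passed with `-m` to avoid opening an editor.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
h=$(git rev-parse HEAD)                    # H, the tip of master
git branch dev
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done   # dev: H--I--J
j=$(git rev-parse HEAD)                    # J, the tip of dev
git checkout -q master

git merge -q --no-ff -m K dev              # force a real merge commit K

k_first=$(git rev-parse HEAD^1)            # first parent of K
k_second=$(git rev-parse HEAD^2)           # second parent of K
echo "first parent of K:  $k_first"
echo "second parent of K: $k_second"
```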
Note that if you now switch back to `dev` and make more commits, the result looks like this:

    ...--F--G--H------K   <-- master
                \    /
                 I--J--L--M--N   <-- dev (HEAD)

You can now `git checkout master` and `git merge dev` again. This time you won't need `--no-ff` because there is a commit that's on `master` that's not on `dev`, namely `K`, and of course there are commits on `dev` that are not on `master`, namely `L-M-N`. The merge base this time is shared commit `J` (not `H`: `H` is also shared, but `J` is better). So Git will combine changes by doing:

    git diff --find-renames <hash-of-J> <hash-of-K>   # what did we change?
    git diff --find-renames <hash-of-J> <hash-of-N>   # what did they change?

What did we change from `J` to `K`? (That's an exercise for you, the reader.)
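That the merge base has moved up to `J` can be confirmed with `git merge-base`. The sketch below rebuilds the graph in a throwaway repository with hypothetical single-letter commit messages; it only locates the merge base and leaves the exercise above alone.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
h=$(git rev-parse HEAD)                    # H
git branch dev
git checkout -q dev
for msg in I J; do git commit -q --allow-empty -m "$msg"; done
j=$(git rev-parse HEAD)                    # J
git checkout -q master
git merge -q --no-ff -m K dev              # real merge commit K on master
git checkout -q dev
for msg in L M N; do git commit -q --allow-empty -m "$msg"; done   # dev: J--L--M--N

base=$(git merge-base master dev)          # best shared commit for the next merge
echo "H          = $h"
echo "J          = $j"
echo "merge base = $base"
```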
Assuming Git is able to combine the changes on its own, this merge operation will succeed, producing:

    ...--F--G--H------K--------O   <-- master (HEAD)
                \    /        /
                 I--J--L--M--N   <-- dev

where new merge commit `O` combines the `J`-vs-`K` changes with the `J`-vs-`N` changes. The history of `master` will start at `O` and will include `N` and `M` and `L` and `K` and `J` and `I` and `H` and so on. The history of `dev` will start at `N` and include `M` and `L` and `J` (not `K`!) and `I` and `H` and so on. Git always works backwards, from child to parent. Merges let (or make) Git work backwards along both lines at the same time (but shown to you one at a time, in some order depending on arguments you supply to `git log`).