Understanding the git graph

Question

I used the git log --graph --all command to visualise my commit/branch history. However, whilst I understand most of what the graph shows, there are some parts of the branch visualisation I am having difficulty interpreting.

The branches that I am having difficulty interpreting are:

a) Commit 309a287

b) Commit 3f7475

c) The interpretation of ) in e9415f

I am also slightly puzzled by why there seem to be three branches (three vertical parallel lines), as I only ever had a master branch and one additional branch at the same time.

|
| * commit 88531a7dc85d030016296a51c5c433b72c7186c5 (refs/stash)
|/| Merge: c501f9f 631678b
| | Author: 
| | Date:   Wed May 20 17:01:59 2020 +0100
| |
| |     On blackforest: ddd
| |
| * commit 631678b558576418c21496712db524eb86a755ec
|/  Author: 
|   Date:   Wed May 20 17:01:59 2020 +0100
|
|       index on blackforest: c501f9f induced typo
|
* commit c501f9fc6b817541d8cb6a466c77b8d84231afcb (origin/blackforest)
| Author: 
| Date:   Sun May 17 18:53:59 2020 +0100
|
|     induced typo
|
| *   commit e9415f49f953ca4fe53bd9631e4afea94c3ba4ba (HEAD -> master, origin/master
)
| |\  Merge: 9a1ff1f 3f74752
| | | Author: 
| | | Date:   Sun May 17 18:51:40 2020 +0100
| | |
| | |     Merge branch 'master' of
| | |
| | *   commit 3f7475269c1134aef39a06f13ae11d74c9496542
| | |\  Merge: 309a287 3d41b51
| |_|/  Author:
|/| |   Date:   Sun May 17 18:46:19 2020 +0100
| | |
| | |       Merge pull request #1 from 
| | |
| | |       enhanced emoticon
| | |
* | | commit 3d41b51c8157abe38d251e065ad33b3d265d332c
| | | Author: 
| | | Date:   Sun May 17 18:42:00 2020 +0100
| | |
| | |     enhanced emoticon
| | |
| * | commit 9a1ff1fa645e8390ab26f751dbf1b716e15b0df6
| |/  Author: 
| |   Date:   Sun May 17 18:50:09 2020 +0100
| |
| |       capitalised a
| |
| *   commit 309a287f9c39d219d719efbcc872a176f7644b19
| |\  Merge: dafe938 806e855
| |/  Author: 
|/|   Date:   Sun May 17 17:42:46 2020 +0100
| |
| |       Merge branch 'blackforest'
| |
* | commit 806e8558cd7b24658a998b2ee5d19500e608b77d
| | Author: 
| | Date:   Sun May 17 17:06:10 2020 +0100
| |
| |     new common
| |
* | commit a5e386a55389a6435a050be9981ea97011449783
| | Author: 
| | Date:   Sun May 17 17:32:26 2020 +0100
| |
| |     Edited firstfile to reflect new branch name
| |
| * commit dafe938bcfbdfe6eaf62d5f446637e6f5d594015
|/  Author: 
|   Date:   Sun May 17 17:06:10 2020 +0100
|

I never use graph with full commit messages. Try `git log --graph --oneline --all`, maybe that makes the output clearer. `)` is simply a continuation from the line above (`(HEAD -> master, origin/master`…`)`) — knittl, May 21 '20 at 14:04
Have you tried https://gitahead.github.io/gitahead.com/? Other GUIs are available: https://stackoverflow.com/questions/61905276/how-to-get-a-graphic-representation-of-git-branches-on-windows-that-really-shows — mike, May 21 '20 at 14:23

torek · Accepted Answer · 2022-05-27T20:35:58.337

(I find your question a little unfocused, in that I'm not sure what particular aspects of git log --graph output here are confusing, so this is going to be long, I'm afraid: I'll try to cover everything important.)

knittl noted in a comment that the stray closing parenthesis is just a wraparound from a previous line. Normally git log runs its output through a pager, and smart pagers can take care of this problem by giving you a left-and-right scrollable "window" over your text, so that the close parenthesis simply disappears off the right side of a Terminal window (or whatever terminal emulator you use).

With that out of the way, let's look specifically at 309a287 and 3f7475, but start with this:

I am also slightly puzzled by why there seem to be three branches (three vertical parallel lines), as I only ever had a master branch and one additional branch at the same time.

You had, apparently, two names: master and blackforest. But you also had two different repositories:

Merge branch 'master' of [snipped]

The snipped part would be a URL; this particular message, Merge branch '<name>' of <url> (which may end with into <name>) is the one that git pull cooks up when it passes a merge commit message on to git merge.¹ So this git log --graph output implies that you ran git pull which invoked git merge on a commit it got from some other Git repository.

We can in fact see that commit: it is the second parent of merge commit e9415f4.... So if we take just these four lines:

| *   commit e9415f49f953ca4fe53bd9631e4afea94c3ba4ba (HEAD -> master, ori...
| |\  Merge: 9a1ff1f 3f74752
| | *   commit 3f7475269c1134aef39a06f13ae11d74c9496542
| | |\  Merge: 309a287 3d41b51

we can see it. It's one of your own mystery commits, in this case, 3f74752.... Its commit message begins with Merge pull request #1 from. GitHub and other similar hosting sites generate commits with such messages.

So you, or someone else, must have made this commit on a web hosting site such as GitHub, in a repository hosted there. That's a second Git repository and it has its own branch names, so depending on how many branch names it has, there could be thousands, or millions, of branches.

Their branches only affect your repository when you let them. You have your Git call up their Git and get any new commits from them. Your Git adds these commits to your repository database, which tries fairly hard to keep every commit it's ever seen. If you then git merge one of their commits, you gain direct access, from your own branch name—whatever branch you're on right now—to this commit.

¹Remember, git pull means run git fetch, then run a second Git command. The second Git command defaults to git merge. That git merge command needs a merge message sometimes—whenever it makes a true merge—and git pull supplies one, when the second command is git merge. (If you tell git pull to run git rebase instead, no merge message is required so git pull doesn't supply one.)

Git is not really about branches

The way to understand what's going on is to realize that Git, in the end, really cares about commits. It's not so concerned with branches. It is not concerned at all with files, most of the time—files are just some nuisance thing that commits have. Of course, we wouldn't use Git at all if it weren't for the files inside the commits, but that's just us that care about files: that's not Git.

Once we see that it's the commits that matter to Git, we can see how branch names enter the picture. It is then possible—though still a bit of a leap—to go from branch names to the fundamental question of what we actually mean by branch (see What exactly do we mean by "branch"?):

A branch name in Git is a pointer to one specific commit. That is, it identifies one commit hash ID by name. Other Git names, such as tag names, can do this same job, but branch names have one other special property: they move automatically, so that they always name the last commit in the branch.

In other words, the hash ID stored under a branch name is the last commit in the branch, by definition. Then we just need to realize one more key thing: Each commit stores the hash ID of some set of previous commits.

The commits themselves are what form the graph. Given some commit, as identified by some hash ID, Git can reach into that commit and extract the hash ID of its immediate predecessor. For a merge commit, instead of one immediate predecessor, we have two.² Either way, though, we end up with something like this:

... <-F <-G <-H

where H is the hash ID of the last commit in some chain. Or maybe a chain ends at a merge commit:

...--I--J
         \
          M
         /
...--K--L

or has some additional commits afterwards:

...--I--J
         \
          M--N
         /
...--K--L

but in all cases, a chain ends at a branch tip commit, and it does so because a branch name points to that commit:

...--G--H   <-- master
         \
          I--J   <-- feature

All the commits up through the tip are on the branch, so in this case, commits up through H are on both master and feature, and I--J are on feature alone.
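To make "on the branch" concrete, here is a minimal sketch in Python. It is a toy model, not how Git stores anything: the `parents` and `branches` dictionaries and the `reachable` helper are names I made up for illustration, and the history before G is simply cut off. The commits on a branch are whatever you can reach by starting at the tip and following parent links backwards:

```python
# Toy model of the drawing above: each commit maps to its list of parents.
# History before G is omitted, so G is treated as a root commit here.
parents = {
    "G": [],
    "H": ["G"],
    "I": ["H"],
    "J": ["I"],
}

# A branch name is just a pointer to one tip commit.
branches = {"master": "H", "feature": "J"}


def reachable(tip):
    """Return every commit found by starting at tip and walking backwards."""
    seen = set()
    todo = [tip]
    while todo:
        commit = todo.pop()
        if commit in seen:
            continue
        seen.add(commit)
        todo.extend(parents[commit])
    return seen


on_master = reachable(branches["master"])
on_feature = reachable(branches["feature"])

print(sorted(on_master))               # ['G', 'H']
print(sorted(on_feature))              # ['G', 'H', 'I', 'J']
print(sorted(on_feature - on_master))  # ['I', 'J']
```

Commits G and H show up in both sets, which is the sense in which one commit can be on several branches at once.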

²A merge commit can actually have more than two parents, but we don't really need to worry about this here.

`git log` has to linearize things

Suppose we have a graph like this:

...--I--J
         \
          M--N   <-- master
         /
...--K--L

where N is the newest commit and its parent M is a merge that has two parents J and L. By drawing this graph sideways, with the newest commits towards the right, we can show that any work done on the upper or lower rows of commits (I-J and K-L respectively) isn't strictly ordered with respect to the other row, but only within its own row.

Note that Git finds commits by starting at the tip commit, as found by the branch name—here, that's master which leads to commit N—and then working backwards. As it works backwards, git log needs to move from each commit to its parent or parents. From N to M this is easy, as there is only one parent. From M back, though, git log really should visit both commits J and L simultaneously ... but it can't.

Specifically, git log has to print things out vertically. The tip commit N will come out first, and hence be at the top of our Terminal window. Below that will be commit M. If git log could do what I am about to draw, it might then show the left and right "sides" of the merge like this:

     N    <information>
     |
     M    <information>
    / \
   J   L   <information about J> <information about L>
   |   |
   I   K   <information about I> <information about K>
   :   :

But git log can't do that, so it approximates. It picks whichever of commits J and L has a later committer date and puts that out first:

     N    <information>
     |
     M    <information>
    / \
   |   L    <information about L>
   |   |
   J   |    <information about J>
   :   :

This is pretty close to the actual output, but it's slightly different:

N  <information about N>
|
M  <information about M>
|\
| L  <information about L>
| |
J |  <information about J>
: :

and that's what you mostly see in your own git log output.

"First-parent" is significant

The last thing that might seem especially peculiar is this:

| *   commit 309a287f9c39d219d719efbcc872a176f7644b19
| |\  Merge: dafe938 806e855
| |/  Author: 
|/|   Date:   Sun May 17 17:42:46 2020 +0100
| |
| |       Merge branch 'blackforest'
| |
* | commit 806e8558cd7b24658a998b2ee5d19500e608b77d

Why did git log do this funky "jut out to the right, then swing back to the left" thing? That is, why draw this:

| *
| |\
| |/
|/|
: :

when:

| *
|/|
: :

would likely do? [Edit, May 2022: Modern Git will do this now. I'm not sure which version of Git got smarter here.]

The answer here is that in Git, the first-parent-ness of a merge commit is significant. In my own horizontal drawings:

...--I--J
         \
          M--N   <-- master
         /
...--K--L

I try to make it look like there's nothing more important about the J <-M connection than there is about the L <-M connection. That's because in one sense, there isn't anything more important here: both J and L are parents of M, and when we made the merge, if we didn't use -X ours or -X theirs, there wasn't anything more important about either commit while resolving conflicts, either.

But the first parent of any merge commit is special, for the same reason that the first and only parent of any normal, non-merge commit is special: it's the direct lineage, backwards, of the commits we made, one commit at a time.

Consider how we make normal everyday non-merge commits. We start with:

git checkout master

which gets us something like this:

...--G--H   <-- master (HEAD)

That is, the special name HEAD is now attached to the branch name master. The current branch is now master. The current commit is the tip commit of master, i.e., commit H: it is commit H's content that we have in our work-tree, that we can work on.

If we do in fact do some work, and git add and git commit, we get a new commit, which we'll call I. The new commit has commit H as its parent:

...--G--H   [master used to point to H]
         \
          I

and the final act of git commit is to write I's actual hash ID, whatever that is, into the name master:

...--G--H   [master used to point to H]
         \
          I   <-- master (HEAD)

after which we can just draw these as a straight line again:

...--G--H--I   <-- master (HEAD)

As we repeat this process, the branch grows. Here's that ambiguous word, branch, again: this time it means a series of commits ending at a particular designated commit and the name master, both at the same time, depending on what we want it to mean at that moment.

...--G--H--I--J   <-- master (HEAD)
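The way a new commit drags the branch name forward can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `parents`, `refs`, `head`, and `commit` are hypothetical names, and real Git computes each hash ID from the commit's contents rather than letting the caller pick a letter.

```python
# Toy model: parents maps a commit to its parent list, refs maps a branch
# name to its tip commit. History before G is omitted.
parents = {"G": [], "H": ["G"]}
refs = {"master": "H"}
head = "master"  # HEAD is attached to a branch name, not directly to a commit


def commit(new_id):
    """Create new_id with the current tip as its parent, then move the branch."""
    tip = refs[head]
    parents[new_id] = [tip]   # the new commit points back at the old tip
    refs[head] = new_id       # final step: the branch name now names new_id


commit("I")
commit("J")

print(refs["master"])   # J
print(parents["J"])     # ['I']
print(parents["I"])     # ['H']
```

Nothing in the older commits changes; only the name `master` moves, which is why the branch appears to grow.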

If we now have some other branch:

...--G--H--I--J   <-- master (HEAD)
         \
          K----L   <-- feature

and run git merge feature, we get a new merge commit M. The merge commit extends master because HEAD is attached to master:

...--G--H--I--J--M   <-- master (HEAD)
         \      /
          K----L   <-- feature

We could do this even if the name feature did not exist, as long as we could find commit L somehow. That is, if we made commits K and L like this, then deleted the name feature entirely while making sure Git doesn't clean out the commits, we would have:

...--G--H--I--J   <-- master (HEAD)
         \
          K----L   [unnamed]

We could then run git merge hash-of-L and we would get the same result as before:

...--G--H--I--J--M   <-- master (HEAD)
         \      /
          K----L

Commit L has now become findable again: it is the second parent of M, so a name such as master^2 reaches it.

More commonly, we might merge feature into master, producing M, then delete the name feature, to leave us in this same state.
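Here is the same idea as a small Python sketch. Again this is a toy model with made-up names (`parents`, `refs`, `head`, `merge`), it performs no actual content merging, and H is treated as a root because the earlier history is left out. It shows that the merge commit records the current tip as its first parent and the other commit as its second, and that deleting the name feature afterwards loses nothing:

```python
parents = {
    "H": [],
    "I": ["H"], "J": ["I"],   # commits made on master
    "K": ["H"], "L": ["K"],   # commits made on feature
}
refs = {"master": "J", "feature": "L"}
head = "master"


def merge(other_commit, new_id):
    """Record a merge commit: first parent is the current tip, second is other."""
    tip = refs[head]
    parents[new_id] = [tip, other_commit]
    refs[head] = new_id


merge(refs["feature"], "M")
del refs["feature"]   # delete the name; the commits themselves are untouched

first_parent, second_parent = parents[refs["master"]]
print(first_parent)    # J
print(second_parent)   # L
```

Commit L is still reachable because M holds its ID, even though no branch name points at it any more.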

Putting this all together

Git does not care (much) about the names. Git only cares about the commits. Git uses the names to find tip commits, and then works backwards; if that backwards walk can reach a given commit, the commit stays.

But in all cases, the first-parent-ness of a commit matters. For ordinary (non-merge) commits, the first parent is the only parent. For merge commits, the first parent is the commit that was the tip, at the time we made the merge.

If we use git log --first-parent—with or without --graph—git log will, whenever it reaches a merge commit, ignore all but the first parent. That is, given the graph fragment ending at merge commit M, git log without --first-parent will show:

commit M; then
commit J or L, in some order; then
commit I or J or K or L, in some order, but skipping whichever commit it showed before;

and so on, until it has shown all commits that can be found by starting at M and working backwards. It shows each commit one at a time, and the order it uses is something you can control with the various commit-sorting options of git log:

git log --author-date-order

uses the author-date timestamp instead of the committer-date timestamp (each commit has two date-and-time stamps in it). Or:

git log --topo-order

uses an ordering that git log --graph requires, so git log --graph turns that ordering constraint on.

Adding --first-parent tells git log that when it steps back from commit M, it should look only at commit J, not commit L. Commit L is the second parent, so git log should just throw it out of the list of commits to visit. The result will be that git log --first-parent master will show M, then J, then I, then H, then G, and so on.
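A first-parent walk is simple enough to sketch directly. The following Python is an illustration only, using a hypothetical `parents` table for the graph fragment above (with G as a stand-in root) and a made-up helper `first_parent_walk`; it is not Git's implementation:

```python
parents = {
    "G": [],
    "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],   # merge: first parent J, second parent L
}


def first_parent_walk(tip):
    """Follow only the first parent from tip back to a root commit."""
    order = []
    commit = tip
    while commit is not None:
        order.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None   # ignore any second parent
    return order


print(first_parent_walk("M"))   # ['M', 'J', 'I', 'H', 'G']
```

Commits K and L never appear, because the only link to them is M's second parent.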

The reason for this particular well-controlled order is that git log walks the graph one commit at a time using a priority queue. You give git log some starting commit(s):

git log master

or:

git log --all

for instance, and git log figures out the hash ID(s) of these commits. If you gave one branch name like master, that names one commit: the tip commit of that branch. The queue now has one entry in it.

Then, as long as the queue is not empty, git log executes a loop:

Take the front (highest priority) entry off the queue.
Show that commit. (There are options to maybe not show it here, but we did not use any of these options.)
If we haven't shown them yet and they are not already in the queue, place this commit's parent(s) into the queue. With the --first-parent option, place only the first parent into the queue.
Repeat.

So our git log master with --first-parent never has more than one commit in the queue at any time: it starts with one, removes it to get to zero, shows the commit, and puts in the one parent. Without --first-parent, it starts with one commit—M—in the queue, removes it (queue empty), shows M, and inserts two commits into the queue: J and L. The priority now matters.

The default priority is that commits with later committer dates have higher priority. If we made commit L later than we made commit J, the next commit we'll show is L. That's true whether L is the first or the second parent.
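The loop and its priority rule can be simulated with Python's heapq module. Treat this as a rough sketch of the idea rather than Git's actual code: the `commits` table, its integer timestamps, and the `log_order` function are all invented for the example, with L given a later committer date than J.

```python
import heapq

# commit -> (committer timestamp, parents). Timestamps are made-up integers;
# a bigger number means a later commit. G stands in for the omitted history.
commits = {
    "G": (1, []),
    "H": (2, ["G"]),
    "I": (3, ["H"]),
    "K": (4, ["H"]),
    "J": (5, ["I"]),
    "L": (6, ["K"]),
    "M": (7, ["J", "L"]),
}


def log_order(starts, first_parent=False):
    """Return commits in the order a date-prioritized walk would show them."""
    queue = []    # min-heap of (-timestamp, commit), so the latest pops first
    seen = set()  # commits already queued or shown

    def push(commit):
        if commit not in seen:
            seen.add(commit)
            heapq.heappush(queue, (-commits[commit][0], commit))

    for start in starts:
        push(start)

    shown = []
    while queue:
        _, commit = heapq.heappop(queue)   # take the front entry
        shown.append(commit)               # "show" it
        ps = commits[commit][1]
        if first_parent:
            ps = ps[:1]                    # keep only the first parent
        for p in ps:
            push(p)                        # queue parents not seen yet
    return shown


print(log_order(["M"]))                     # ['M', 'L', 'J', 'K', 'I', 'H', 'G']
print(log_order(["M"], first_parent=True))  # ['M', 'J', 'I', 'H', 'G']
```

In the first call the two sides of the merge come out interleaved by date, which is the linearization that the graph drawing then has to represent with parallel columns.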

The git log --graph code will make sure that the line connecting from M to L—its second parent—starts by going down-and-right [edit: or down-and-left now]. That's what we see here:

| *   commit 309a287f9c39d219d719efbcc872a176f7644b19
| |\  Merge: dafe938 806e855
| |/  Author: 
|/|   Date:   Sun May 17 17:42:46 2020 +0100
| |
| |       Merge branch 'blackforest'
| |
* | commit 806e8558cd7b24658a998b2ee5d19500e608b77d

The connector from 309a287... to 806e855... starts by going down and right, then doubles back over to join the left-side line. The second parent of 309a287... is 806e855.... (The first parent is dafe938....)

The line reaching down to 806e855... comes from commit 3d41b51..., which is an ordinary one-parent commit. That commit was found from commit c501f9fc..., which is also an ordinary one-parent commit, and is—or at least was, the last time your Git checked—the tip commit of branch blackforest in the Git repository over at origin (on GitHub or wherever).

(Your graph is cluttered slightly with the two commits made by git stash, as well. These commits are not on any branch, but are reachable via the name refs/stash. Note that one of them is technically a merge commit, but git merge did not make it: git stash did. If you treat it as an ordinary merge, most Git commands won't make good sense of it, as they'll assume git merge made it. Only git stash itself knows how to take this one apart later.)

This is a fabulous write up, thank you! — jgreve, May 27 '22 at 16:21
