How to get more useful branch diagrams in git?

Question

I sometimes struggle to see what was going on in the history of git repositories; even with great tools like SourceTree, the branch diagrams can be confusing. The main problem for me is that I can't tell which branches some commits were made on, and the string of commits on a single branch is often displayed on different visual lines as the number of concurrent branches and people working on branches increases and decreases.

My initial thoughts were along the lines of "What if git stored the branch name that a commit was made on? Then diagram generators could group those commits on the same line". This question: Why does Git not store the branch name as part of the commit? asks the same thing (but for different reasons) and after reading it and some other links I realised that simply storing the branch name wouldn't solve my issue anyway. For example when multiple people are making alternating commits on their local branches with the same branch-name, trying to display these on the same line would be technically wrong.

Anyway, so on to my questions...

1. Is there currently a way to infer the correct historic branching structure and produce a nice looking diagram of it - perhaps looking for "merge master into branchX" style commit messages?
2. Is there a valid case for a feature within git that helps to preserve some of the workflow information (i.e. context under which that work was done), or is that already possible and I am just doing it all wrong?

Regarding question 2: I was thinking perhaps something like a named "work-stream". When creating a branch you could optionally provide a work-stream name (or 'inherit' from the work-stream name of the current branch), and when switching branches the current work-stream would also change. Each commit would therefore be made in the context of a work-stream and this information could either be stored in the commit or as separate meta-data in the git repo. I don't really know about the inner workings of git so there may be other/better ways to achieve this. The branch diagrams could then do something visually obvious (such as a different background colour) to help see how the commit-chains flowed between different work-streams.

Gary Fixler · Accepted Answer · 2013-08-14T22:51:22.353

Keep in mind that git is the self-proclaimed "stupid content tracker." It doesn't want to be any smarter than it needs to be, because this keeps it highly flexible, and therefore very powerful. Commits themselves are just small text files (stored compressed in git's object database), and nothing more than that. Printed with git cat-file -p <commit>, they literally look like this:

tree 68cd5b298858425fd8b5c2c0adfa62249b4eb650
parent c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e
author aperson <aperson@somecompany.com> 1368010332 -0700
committer aperson <aperson@somecompany.com> 1368010332 -0700

Add test module for widget.py

tree points to the hashed text file that lists the contents of the project folder, and this listing can contain other trees, recursively, and blobs, which are hashed contents of files. parent is just the hash of another commit - which again, is just another text file like this one. The author and committer lines also have Unix timestamps with GMT offsets. The rest of the file is the commit message. A commit - the text file - answers the 5 meta-questions about itself: who (author/committer), what (pointer to tree), when (timestamps), where (parent commit(s)), and why (commit message). tree and parent (parent can be more than one line of parents to represent merge commits) are pointers. They say "that tree describes this commit's snapshot," and "that commit/those commits is/are where it came from."
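To make the "a commit is just a text file" point concrete, here is a minimal Python sketch that splits the commit text shown above into its fields. The function name parse_commit is my own, and the sketch assumes only the four header types in the example; real commits can carry extra headers (signatures, for instance), which it simply ignores.

```python
def parse_commit(text):
    """Split raw commit text into its header fields and the message."""
    # Headers and message are separated by the first blank line.
    header, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message.strip()}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # A merge commit has several parent lines, in order.
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


raw = """tree 68cd5b298858425fd8b5c2c0adfa62249b4eb650
parent c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e
author aperson <aperson@somecompany.com> 1368010332 -0700
committer aperson <aperson@somecompany.com> 1368010332 -0700

Add test module for widget.py
"""

commit = parse_commit(raw)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

Note that nothing in the parsed result names a branch; the only links are the tree and parent hashes.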

You could in theory keep the branch name, but branches are also just pointers. If you're on the master branch, there's a text file called master in .git/refs/heads that contains the hash (i.e. the 40-digit hexadecimal number) that names the commit you're currently on. The fact that you're on that branch (also called a "head") is just a line in the .git/HEAD text file - also a pointer - that reads ref: refs/heads/master, which just points at the master file in the heads folder under refs. It's all just pointers, but more importantly, it's all very fluid.
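The pointer chain can be followed with a few lines of Python. This is only a sketch under stated assumptions: resolve_head is a made-up helper, the directory it reads is a throwaway imitation of the .git layout described above, and it does not handle refs that git has packed into .git/packed-refs.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Follow HEAD to a commit hash; returns (branch name or None, hash)."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                  # e.g. refs/heads/master
        commit = (git_dir / ref).read_text().strip()
        prefix = "refs/heads/"
        branch = ref[len(prefix):] if ref.startswith(prefix) else ref
        return branch, commit
    return None, head                              # detached HEAD holds a hash


# Build a throwaway directory that mimics the files described above.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e\n")
    print(resolve_head(git_dir))
```

Moving a branch amounts to rewriting one small file, which is why branch membership is so fluid.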

Git wants to be too dumb for branches to matter. What branch a commit was initially created on isn't really useful - or perhaps better put, is too unstable a thing to attempt to make it useful - because you can move commits around at your leisure. I've done it often to get things how I want them.

I realized I was making general and very project-specific commits back and forth on a project branch - let's call it mixedbranch - and that the project stuff deserved to be its own branch, so I interactively rebased to unzip them. I.e. I did git branch projectname to add a second branch head pointing to the same commit that mixedbranch was pointing to (but didn't switch to it), then did git rebase -i <commit before I started mixing things>. Then in the text editor that popped up, I deleted all of the project-specific commit lines, saved, and quit. This rebuilt the branch under the current name, but without the project-based commits (it's good to keep commits granular enough to allow this easily, for several reasons beyond just this).

Then I did git checkout projectname to switch to that new branch head, which was still holding position at the tip of all those mixed commits, and I did the same thing in reverse - git rebase -i <that same old commit before the mixing occurred>, but deleted the non-project specific lines this time. This rebuilt the new branch using the original branch commits, sans the non-project commits, and pointed projectname to the newly rebased head. Now that neither branch pointed to the original, interleaved set of commits, they all disappeared from view, and it looked like a-a-b-b-a-b-b-a-b-a<--mixed had simply become a-a-a-a<--nonmixed and b-b-b-b-b<--project. Storing which branch things had originally been created on would now be a mess. It's too "smart" a thing to be trusted, so git - "the stupid content tracker" - doesn't even try.
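Why the original branch of a commit stops meaning anything after this kind of "unzip" can be shown with a toy model. This is not git's real hashing (git hashes the full commit text: tree, parents, author, committer, message); the replay function and the a1/b1 labels are invented for illustration. The only property borrowed from git is that a commit's id depends on its parent's id.

```python
import hashlib


def replay(base_id, changes):
    """Rebuild a chain of commits on top of base_id as (id, change) pairs."""
    chain, parent = [], base_id
    for change in changes:
        # Toy id: hash of the parent id plus the change, so the same change
        # replayed onto a different parent becomes a different commit.
        commit_id = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]
        chain.append((commit_id, change))
        parent = commit_id
    return chain


# "a" = general commits, "b" = project-specific commits, interleaved.
mixed_changes = ["a1", "a2", "b1", "b2", "a3", "b3"]
mixed = replay("base", mixed_changes)

# Each rebase keeps one kind of commit and replays it from the same base.
nonmixed = replay("base", [c for c in mixed_changes if c.startswith("a")])
project = replay("base", [c for c in mixed_changes if c.startswith("b")])

for name, chain in (("mixed", mixed), ("nonmixed", nonmixed), ("project", project)):
    print(name, " -> ".join(f"{cid}({change})" for cid, change in chain))
```

In this model the commits that keep their parent keep their id, while everything replayed onto a new parent is a brand new commit that never existed on mixedbranch.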

In terms of following the trail of commits on branches, all of this is done backwards, because commits - those text files - simply store references to their parent(s). You can only figure out lineage into the past. When you merge, you create a new commit, which is just a snapshot of the entire, merged tree. The branch you're currently on (or otherwise merging into) is considered the first parent, and the branch merging in is the second. If there are other branches being merged in as well, they'd be the third, fourth, etc. In this way, git can easily follow the primary commits back to keep track of - and visually indicate - which commits were, say, the master branch. When you switch to a branch and make a commit, then do a git log --oneline --graph --decorate --all, you see something like this (from the Vim easymotion plugin):

* 4fb1af8 (origin/develop, develop) Update jumplist when moving cursor
*   fe9f404 Merge branch 'master' into develop
|\  
| *   667a668 (HEAD, tag: 1.3, origin/master, origin/HEAD, master) Merge pull request #34 from Layzie/master
| |\  
| | * 06826d7 fix EasyMotion#InitHL arguments
| |/  
| *   afd0e42 Merge branch 'release/1.3'
| |\  
* | \   44c6bfd Merge branch 'release/1.3' into develop
|\ \ \  
| | |/  
| |/|   
| * | c4863f8 Update vim docs
| * | 8dc93e6 Update README
|/ /  
* | c1f1926 Update default leader key to <Leader><Leader>
* | 6f0c9b9 Move most of the code to autoload/

Even without the benefit of color, it's easy to see what's going on. The develop branch was the most recently committed-to, so it's shown at the top. If you follow the line down the left side of the graph, those are the primary commits of the develop and origin/develop branches. All the 'merged-in' branches come in from the right. The develop branch has a merge commit where master was merged in. You can follow the right side of that merge to the master branch (667a668), which is also a merge commit, which pulled in Layzie/master (06826d7), which was clearly developed on top of afd0e42. That commit was clearly on the master branch: follow the line straight up to see that the first-parent chain of master includes it. All the context necessary is in the graph. The merge commit (667a668) tells you what branch that was - it was Layzie/master, so 06826d7 was the master branch from the Layzie remote repo.
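Following "the line down the left side" is just a first-parent walk, which git exposes as git log --first-parent. A minimal Python sketch of the idea, using a small hypothetical history (the ids d1, m1, merge and so on are invented, not taken from the graph above):

```python
def first_parent_chain(parents, tip):
    """Follow only the first parent from tip back to the root commit."""
    chain = [tip]
    while parents.get(chain[-1]):
        chain.append(parents[chain[-1]][0])
    return chain


# Hypothetical history: commit id -> parent ids, first parent first.
parents = {
    "root": [],
    "m1": ["root"],          # master
    "m2": ["m1"],            # master
    "d1": ["root"],          # develop
    "d2": ["d1"],            # develop
    "merge": ["d2", "m2"],   # "Merge branch 'master' into develop"
    "d3": ["merge"],         # develop
}

print(first_parent_chain(parents, "d3"))
print(first_parent_chain(parents, "m2"))
```

Starting from the develop tip, the walk passes through the merge but never wanders onto m1 or m2, because those are only reachable through the merge's second parent.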

This is where I would suggest a workflow habit: when merging, leave the default merge commit message, which names what was merged in, alone. That's how you can - and should, IMO - know what commits were on which branches (current heads being the mechanism for telling which commits currently are on which branches). The merge says "on my left is the destination branch of the branch I'm merging in. On the right is the branch I'm merging in. In my commit message is the name that the branch I'm merging in was called at the time I merged it in" (important point implied here: you can change branch names at will while working, and I often have).
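If those default messages are left alone, a tool can recover the historic branch names from them, which is roughly what the first question asks for. A minimal sketch, assuming the subject lines follow the two default shapes visible in the graph above ("Merge branch ..." and "Merge pull request ..."); the helper name merged_branch is my own, and hand-edited messages simply won't match.

```python
import re

# Default subject written by git for branch merges.
MERGE_BRANCH = re.compile(
    r"^Merge (?:remote-tracking )?branch '(?P<source>[^']+)'"
    r"(?: of \S+)?(?: into (?P<target>\S+))?$")
# Subject written for merged pull requests, as in commit 667a668 above.
MERGE_PR = re.compile(r"^Merge pull request #\d+ from (?P<source>\S+)$")


def merged_branch(subject):
    """Return (source, target) named in a merge subject, or None.

    target is None when the subject does not state it.
    """
    match = MERGE_BRANCH.match(subject)
    if match:
        return match.group("source"), match.group("target")
    match = MERGE_PR.match(subject)
    if match:
        return match.group("source"), None
    return None


for subject in ("Merge branch 'master' into develop",
                "Merge pull request #34 from Layzie/master",
                "Merge branch 'release/1.3'",
                "Update README"):
    print(subject, "=>", merged_branch(subject))
```

The source name labels the second-parent side of the merge and the target, when stated, labels the first-parent side.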

By not making branch names part of commits you can cherry-pick and rebase things around without anything trying to reason about the ever-changing placement of commits, which is just too hard, and not helpful. A commit is just a full-tree snapshot, with some metadata about who created it, when, and where it fits into the DAG of interconnected commits. This lets you pull commits from other branches, and even other people's repos. I've used this power to merge completely disparate repos together, interleaving the commits (the opposite of what I mentioned in my first example). I've used this power to pull in the history of a particular file from another repo onto a new branch, then merge it in, so I could track its development in a more sensible place. Then I threw out the original, junk-filled repo that I didn't want anymore. You can see some examples here of how liquid git's commits are, and how much you can do with them because of it.

If I cherry-pick in a commit from a coworker's 'dev' branch, I don't care that it was on coworker/dev. I just want that work in my current branch. The commit metadata will show that I committed it where it now is, and that my coworker authored it, and when we each did those things. Now it's just changes on my branch that aren't trying to figure out that they're on my branch, but used to be on their branch. It would be a mess trying to track that in such a flexible system, and you'd probably end up with incorrect info with a system trying to be that smart (maybe not, but it seems a hard problem). I say let the merges tell you what branches were called, and let git sort first-parent histories to the left for you. I can say that I've had no problem following history for even dozens of branches at a time (and we've had terrible repos at work with so many parallel branches they don't fit across a widescreen monitor).

There are so many reasons to +1 this answer. Multiple, understandable, concrete examples are just one of them. — Wayne Conrad, May 09 '13 at 19:31
Maybe the tools I've used draw the graphs wrong but it rarely looks so neat, especially when multiple people are all committing on the same feature branch locally and keep merging it in from origin to their local copy and pushing. A single work-stream in this case looks like many lines. It sounds like the information I'm wanting should be stored at a repo level not commit level. Perhaps a simple text file to indicate the context in which each commit was brought into the repo (whether it was from another repo, or cherry-picked, or just ongoing work). Pure metadata - not affecting the commits? — jhabbott, May 09 '13 at 21:19
That's what I would do, instead of trying to fight git's methods. You can write a [post-commit hook](http://git-scm.com/book/en/Customizing-Git-Git-Hooks) ("After the entire commit process is completed, the post-commit hook runs. It doesn’t take any parameters, but you can easily get the last commit by running git log -1 HEAD. Generally, this script is used for notification or something similar.") to write the branch and commit sha to a text or JSON file, or whatever else you need. — Gary Fixler, May 09 '13 at 22:49
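A post-commit hook along those lines might look like the following Python sketch. The file name branch-log.jsonl, its location inside .git, and the helper names are all assumptions for illustration; only the git rev-parse calls are real git commands. The final guard makes the script act only when it is installed (and made executable) as .git/hooks/post-commit.

```python
#!/usr/bin/env python3
import json
import subprocess
import sys
from pathlib import Path


def record_commit(log_path, branch, sha):
    """Append one JSON record per line: the branch name and commit sha."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"branch": branch, "sha": sha}) + "\n")


def git(*args):
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(["git", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def main():
    branch = git("rev-parse", "--abbrev-ref", "HEAD")  # "HEAD" if detached
    sha = git("rev-parse", "HEAD")
    # Hypothetical location: inside .git, so the log itself is never committed.
    log_path = Path(git("rev-parse", "--git-dir")) / "branch-log.jsonl"
    record_commit(log_path, branch, sha)


# Only act when git invokes this file as the post-commit hook.
if __name__ == "__main__" and Path(sys.argv[0]).name == "post-commit":
    main()
```

Such a log is local to each clone and goes stale as soon as commits are rebased or cherry-picked, so it is best treated as a hint rather than as history.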
