Keep in mind that git is the self-proclaimed "stupid content tracker." It doesn't want to be any smarter than it needs to be, because this keeps it highly flexible, and therefore very powerful. Commits themselves are just small text files (git compresses them on disk, but underneath they're plain text), and nothing more than that. They literally look like this:
tree 68cd5b298858425fd8b5c2c0adfa62249b4eb650
parent c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e
author aperson <aperson@somecompany.com> 1368010332 -0700
committer aperson <aperson@somecompany.com> 1368010332 -0700

Add test module for widget.py
tree points to the hashed text file that lists the contents of the project folder. That listing can contain other trees, recursively, and blobs, which are the hashed contents of files. parent is just the hash of another commit - which, again, is just another text file like this one. The author and committer lines also carry Unix timestamps with UTC offsets. The rest of the file is the commit message. A commit - the text file - answers the 5 meta-questions about itself: who (author/committer), what (pointer to tree), when (timestamps), where (parent commit(s)), and why (commit message). tree and parent (there can be more than one parent line, to represent merge commits) are pointers. They say "that tree describes this commit's snapshot," and "that commit/those commits is/are where it came from."
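To make the "it's just a text file" point concrete, here's a minimal sketch that pulls those fields out of the commit text shown above. The function name parse_commit is my own invention for illustration, and it assumes the usual layout of header lines, then a blank line, then the message. It ignores less common headers such as signatures.

```python
def parse_commit(text):
    """Split a raw commit body into its header fields and its message."""
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": ""}
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line == "":
            # A blank line ends the headers; everything after is the message.
            commit["message"] = "\n".join(lines[i + 1:]).strip()
            break
        key, _, value = line.partition(" ")
        if key == "parent":
            # Merge commits simply have more than one parent line.
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


RAW = """\
tree 68cd5b298858425fd8b5c2c0adfa62249b4eb650
parent c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e
author aperson <aperson@somecompany.com> 1368010332 -0700
committer aperson <aperson@somecompany.com> 1368010332 -0700

Add test module for widget.py
"""

commit = parse_commit(RAW)
print(commit["tree"])      # the "what"
print(commit["parents"])   # the "where"
print(commit["message"])   # the "why"
```

Notice what isn't in there: nothing in the parsed result names a branch.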
You could in theory keep the branch name, but branches are also just pointers. If you're on the master branch, there's a text file called master in .git/refs/heads that contains the hash (i.e. the 40-character hexadecimal number) that names the commit at the tip of that branch. The fact that you're on that branch (also called a "head") is just a line in the .git/HEAD text file - also a pointer - that reads ref: refs/heads/master, which just points at the master file in the heads folder under refs. It's all just pointers, but more importantly, it's all very fluid.
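Here's a rough sketch of following that chain of pointers by hand. It builds a throwaway folder laid out like the files described above rather than touching a real repo, and resolve_head is a hypothetical helper of mine. It only reads loose ref files. A real repository can also keep refs in a packed-refs file, which this ignores.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Follow HEAD to a commit hash, reading only loose ref files."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. refs/heads/master
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return ref, f.read().strip()
    # Otherwise HEAD holds a hash directly (a "detached" HEAD).
    return None, head


# Throwaway directory mimicking the two files discussed above.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("c9ac0bb0d74d59b1ccb5aa8c498c33314b16948e\n")

ref, commit_hash = resolve_head(git_dir)
print(ref, commit_hash)
```

Moving a branch amounts to writing a different hash into that one small file, which is why all of this is so fluid.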
Git wants to be too dumb for branches to matter. What branch a commit was initially created on isn't really useful - or perhaps better put, is too unstable a thing to attempt to make it useful - because you can move commits around at your leisure. I've done it often to get things how I want them.
I realized I was making general and very project-specific commits back and forth on a project branch - let's call it mixedbranch - and that the project stuff deserved to be its own branch, so I interactively rebased to unzip them. That is, I did git branch projectname to add a second branch head pointing to the same commit that mixedbranch was pointing to (without switching to it), then did git rebase -i <commit before I started mixing things>. Then in the text editor that popped up, I deleted all of the project-specific commit lines, saved, and quit. This rebuilt mixedbranch, the branch I was still on, without the project-specific commits (it's good to keep commits granular enough to allow this easily, for several reasons beyond just this).
Then I did git checkout projectname to switch to that new branch head, which was still holding position at the tip of all those mixed commits, and I did the same thing in reverse - git rebase -i <that same old commit before the mixing occurred>, but deleted the non-project-specific lines this time. This rebuilt the new branch from the original branch's commits, sans the non-project commits, and pointed projectname to the newly rebased head. Now that neither branch pointed to the original, interleaved set of commits, those commits disappeared from view, and it looked like a-a-b-b-a-b-b-a-b-a<--mixed had simply become a-a-a-a-a<--nonmixed and b-b-b-b-b<--project. Storing which branch things had originally been created on would now be a mess. It's too "smart" a thing to be trusted, so git - "the stupid content tracker" - doesn't even try.
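As a toy model of that unzipping (not how git itself implements rebase), the sketch below replays a filtered list of changes onto a base. The labels, the replay function, and the seven-character ids are all made up for illustration. Each toy id hashes only the parent id plus the change label, whereas a real commit hash covers the whole commit text.

```python
import hashlib


def replay(base, changes):
    """Rebuild a chain of commits on top of base, one per kept change.

    Each toy commit id hashes its parent id plus its change label, so a
    change replayed onto a different parent gets a different id.
    """
    chain, parent = [], base
    for change in changes:
        commit_id = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]
        chain.append((commit_id, change))
        parent = commit_id
    return chain


# Hypothetical labels: "a*" is general work, "b*" is project-specific work,
# interleaved in the same a-a-b-b-a-b-b-a-b-a order as in the text.
mixed_changes = ["a1", "a2", "b1", "b2", "a3", "b3", "b4", "a4", "b5", "a5"]

mixed = replay("base", mixed_changes)
general = replay("base", [c for c in mixed_changes if c.startswith("a")])
project = replay("base", [c for c in mixed_changes if c.startswith("b")])

print("-".join(change for _, change in general))  # a1-a2-a3-a4-a5
print("-".join(change for _, change in project))  # b1-b2-b3-b4-b5
```

The b1 commit in project has a different id than the b1 commit in mixed, because its parent changed. The rebuilt commits are new objects, so "the branch this was created on" would have nothing stable to refer to.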
In terms of following the trail of commits on branches, all of this is done backwards, because commits - those text files - simply store references to their parent(s). You can only figure out lineage into the past. When you merge, you create a new commit, which is just a snapshot of the entire, merged tree. The branch you're currently on (or otherwise merging into) is considered the first parent, and the branch merging in is the second. If there are other branches being merged in as well, they'd be the third, fourth, etc. In this way, git can easily follow the primary commits back to keep track of - and visually indicate - which commits were on, say, the master branch. When you switch to a branch and make a commit, then do a git log --oneline --graph --decorate --all, you see something like this (from the Vim easymotion plugin):
* 4fb1af8 (origin/develop, develop) Update jumplist when moving cursor
* fe9f404 Merge branch 'master' into develop
|\
| * 667a668 (HEAD, tag: 1.3, origin/master, origin/HEAD, master) Merge pull request #34 from Layzie/master
| |\
| | * 06826d7 fix EasyMotion#InitHL arguments
| |/
| * afd0e42 Merge branch 'release/1.3'
| |\
* | \ 44c6bfd Merge branch 'release/1.3' into develop
|\ \ \
| | |/
| |/|
| * | c4863f8 Update vim docs
| * | 8dc93e6 Update README
|/ /
* | c1f1926 Update default leader key to <Leader><Leader>
* | 6f0c9b9 Move most of the code to autoload/
Even without the benefit of color, it's easy to see what's going on. The develop branch was the most recently committed-to, so it's shown at the top. If you follow the line down the left side of the graph, those are the primary commits of the develop and origin/develop branches. All the 'merged-in' branches come in from the right. The develop branch has a merge commit where master was merged in. You can follow the right side of that merge to the tip of the master branch (667a668), which is also a merge commit, and which pulled in Layzie/master (06826d7). That commit was clearly developed on top of afd0e42, which was clearly on the master branch: follow the line straight up and you'll see that the first-parent history of master includes it. All the context necessary is in the graph. The message of that merge commit (667a668) tells you what branch was merged in - it was Layzie/master, so 06826d7 was on the master branch of the Layzie remote repo.
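The "follow the left side" rule is easy to sketch. The history below is a made-up toy (the labels M*, D*, F1 are hypothetical, not the easymotion commits), stored the only way git stores it: each commit lists its parents, first parent first. As far as I know, this is the same walk that git log --first-parent does.

```python
def first_parent_chain(parents, tip):
    """Follow only the first parent from tip back to a root commit."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        commit_parents = parents.get(current, [])
        current = commit_parents[0] if commit_parents else None
    return chain


# Hypothetical toy history. Parent order matters: index 0 is the branch
# that was checked out when the commit was made.
parents = {
    "M1": [],
    "M2": ["M1"],
    "F1": ["M2"],
    "M3": ["M2", "F1"],  # merge of a feature branch into master
    "D1": ["M1"],
    "D2": ["D1", "M3"],  # merge of master into develop
    "D3": ["D2"],
}

print(first_parent_chain(parents, "D3"))  # ['D3', 'D2', 'D1', 'M1']
print(first_parent_chain(parents, "M3"))  # ['M3', 'M2', 'M1']
```

Nothing in the data says "develop" or "master". The primary line of each branch falls out of parent order alone, starting from wherever the branch head currently points.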
This is where I would suggest a workflow habit - when merging, leave the generated commit message about what was merged in alone. That's how you can - and should, IMO - know what commits were on which branches (current heads being the mechanism for telling which commits currently are on which branches). The merge says "on my left is the destination branch of the branch I'm merging in. On the right is the branch I'm merging in. In my commit message is the name that the branch I'm merging in was called at the time I merged it in" (important point implied here: you can rename branches at will while working, and I often have).
By not making branch names part of commits you can cherry-pick and rebase things around without anything trying to reason about the ever-changing placement of commits, which is just too hard, and not helpful. A commit is just a full-tree snapshot, with some metadata about who created it, when, and where it fits into the DAG of interconnected commits. This lets you pull commits from other branches, and even other people's repos. I've used this power to merge completely disparate repos together, interleaving the commits together (the opposite of what I mentioned in my first example). I've used this power to pull in the history of a particular file from another repo onto a new branch, then merge it in, so I could track its development in a more sensible place. Then I threw out the original, junk-filled repo that I didn't want anymore. You can see some examples here of how liquid git's commits are, and how much you can do with them because of it.
If I cherry-pick in a commit from a coworker's 'dev' branch, I don't care that it was on coworker/dev. I just want that work in my current branch. The commit metadata will show that I committed it where it now is, and that my coworker authored it, and when we each did those things. Now it's just changes on my branch that aren't trying to figure out that they're on my branch, but used to be on their branch. It would be a mess trying to track that in such a flexible system, and you'd probably end up with incorrect info with a system trying to be that smart (maybe not, but it seems a hard problem). I say let the merges tell you what branches were called, and let git sort first-parent histories to the left for you. I can say that I've had no problem following history for even dozens of branches at a time (and we've had terrible repos at work with so many parallel branches they don't fit across a widescreen monitor).
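A small model of what that cherry-pick leaves behind, assuming a commit is just the fields shown at the top of this answer. The Commit class, the cherry_pick function, and every value below are hypothetical stand-ins, not git's own API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Commit:
    tree: str
    parents: tuple
    author: str
    committer: str
    message: str


def cherry_pick(commit, onto, new_tree, committer):
    """Toy cherry-pick: keep author and message, replace the rest."""
    return replace(commit, tree=new_tree, parents=(onto,), committer=committer)


# Hypothetical commit as it sits on the coworker's branch.
theirs = Commit(
    tree="tree-on-their-branch",
    parents=("their-parent",),
    author="coworker <coworker@example.com> 1368010332 -0700",
    committer="coworker <coworker@example.com> 1368010332 -0700",
    message="Add test module for widget.py",
)

mine = cherry_pick(
    theirs,
    onto="my-branch-tip",
    new_tree="tree-on-my-branch",
    committer="me <me@example.com> 1368096732 -0700",
)

print(mine.author)     # still the coworker, with their timestamp
print(mine.committer)  # me, with my timestamp
print(mine.parents)    # where it now sits in my history
```

There's no branch field to update or get wrong. Who and when survive the move, and where is simply the new parent.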