Git history as regular files

Question

I wonder if there is a way in Git itself, or in a text editor, to have Git history displayed as regular files directly in the file tree.

For example, if I have a file landing.html with a couple of previous commits, I want the full history as landing.html.old.v0, landing.html.old.v1, landing.html.old.v2, etc., directly next to the current file. Maybe with the commit message in the file name as well, like landing.html.old.v0.initial-commit.

Can this be done in Git itself by adding a hook that runs after each commit? Or maybe as a text editor extension? I mainly use VSCode and Sublime. I want to avoid using an additional tool like gitk.

Do you want *your* history (since ... well, that's a problem: since what point?); or do you want a *total* history? If the latter, how do you intend to deal with branch-and-merge in the graph? — torek, Mar 07 '18 at 00:56
total history will be better, but I can settle with only mine and one branch. I want to be able to lookup total history without having to interact with Git. — Hartator, Mar 07 '18 at 05:24

torek · Answer 1 · 2018-03-07T19:28:49.107

There is nothing built in to Git for this, so you will have to write code.

There's an enormous problem with attempting to do this for any particular file right after running git clone, but you added this remark:

total history will be better, but I can settle with only mine and one branch. I want to be able to lookup total history without having to interact with Git.

in which case there's an obvious path forward. I will outline one idea for you, but you will have to write the code. If you know a lot about Git, jump down to the bottom section about using the post-commit hook. If not, read through the rest first. You'll learn a lot about Git by writing the post-commit hook, but you will probably need the other sections too.

First, keep in mind what untracked files are

If you are going to use Git at all, Git forces you to learn about its three parts:

The work-tree. This is pretty simple: it's where you do your work. Files in the work-tree are stored in the usual form, where you can see them and work with them.
The index, which has two other names because it's so important in Git: it's also called the staging area and sometimes the cache. Files in the index are in the special Git-only format. The key here is that you can replace files that are in the index, so they're writable.
Commits. Commits are permanent, read-only, and incorruptible.¹ Commits in Git are the history: there's no such thing as "file history"; each commit is a complete snapshot, with its contents independent of every other snapshot. Git makes new snapshots by saving (committing) the contents of the index.

An untracked file is one that is not in the index. This is a rare case of Git being simple and clear. :-) If you have a file in the work-tree that's not in the index, it's untracked. All your landing.html.suffix files will be untracked.
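If you want to confirm that the history copies really are untracked, you can ask Git whether a path is in the index. The sketch below uses a hypothetical helper, is_tracked, and relies on git ls-files --error-unmatch, which exits non-zero when a listed path is not in the index. It assumes git is on your PATH; note that it also returns False outside a repository.

```python
import subprocess


def is_tracked(path: str, repo_dir: str = ".") -> bool:
    """Return True if `path` is in the index (tracked), False otherwise.

    `git ls-files --error-unmatch` exits with status 0 only when every
    listed path is in the index; an untracked path makes it exit non-zero.
    """
    result = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", path],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Once a hook has created landing.html.old.v0.initial-commit, is_tracked("landing.html") should be True and is_tracked("landing.html.old.v0.initial-commit") should be False.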

¹The permanence of commits depends on their reachability. As noted in the section below on commits, Git finds commits by starting from a branch name (or any other name that identifies a commit). Those commits identify their parents, by their hash IDs, so the parents are reachable from the branch tips. The parents identify yet more parents, so those are also reachable. Git will, rarely (because it takes a long time), compute the transitive closure over the set of reachable commits—really, reachable objects—and compare this to the entire contents of the object database. Unreachable objects may, depending on additional criteria, be garbage-collected (discarded) at this point.

The incorruptibility depends on the fact that they are read-only and hashed. If something somehow changes inside an object, it will cease to match its (cryptographic) hash ID, and Git will know it is damaged.
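To make the "hashed" part concrete: in a repository using Git's default SHA-1 object format, an object's ID is the SHA-1 of a short header ("<type> <size>" followed by a NUL byte) plus the raw content. This minimal sketch (helper names are illustrative; SHA-256 repositories use a different hash) recomputes a blob's ID and shows that a one-byte change no longer matches.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID as Git computes it: sha1(b"<type> <size>\\0" + content)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(obj_type: str, content: bytes, expected_id: str) -> bool:
    """True if the stored bytes still hash to the ID they are filed under."""
    return git_object_id(obj_type, content) == expected_id


stored = b"hello\n"
oid = git_object_id("blob", stored)        # same ID as: echo hello | git hash-object --stdin
print(oid)                                 # ce013625030ba8dba906f756967f9e9ca394464a
print(is_intact("blob", stored, oid))      # True
print(is_intact("blob", b"hellO\n", oid))  # False: damaged content no longer matches
```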

Some notes about commits

(None of this is directly relevant but it's useful to keep it all in mind.)

Commits, like all of Git's internal objects, are identified (named) by their hash ID. The hash ID of an object, including each commit, is a cryptographic checksum of its contents. The actual contents of each commit are pretty small, because the snapshot is stored through a separate Git object called a tree: Git turns the index into a tree, then saves the tree's hash ID, plus your commit metadata (your name and email address, some timestamps, your log message, and the commit's parent hash ID) as the commit object.

Branches, and thus the history in a repository, exist because commits store parent IDs. A branch name like master simply holds one (1) commit hash ID. Git calls this the tip commit, and it is by definition the last commit on the branch, i.e., the newest. To find a history, Git looks at the tip commit's parent commit, which is the second-to-last. Then Git looks at the parent's parent, which is the third-to-last; and so on. The resulting chain-of-commits is thus the branch, as found by the branch name, which identifies only the tip-most commit:

        D--E   <-- master
       /
A--B--C
       \
        F--G   <-- develop

Commits A through E are all on branch master, and commits A through C plus F and G are all on branch develop. Note that some commits are on more than one branch. The history stored in the repository is simply the sum of all the commits stored in the repository. Note that the names, master and develop here, identify only one commit each.
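Since a branch name holds only its tip commit, "the commits on a branch" are exactly the commits reachable by following parent links back from that tip. Here is a toy model of the drawing above; the graph is a hand-written dictionary, not something read from a real repository.

```python
from __future__ import annotations


def reachable(parents: dict[str, list[str]], tip: str) -> set[str]:
    """All commits reachable from `tip` by following parent links."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# The drawing above: each commit maps to its list of parents.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
           "F": ["C"], "G": ["F"]}
branches = {"master": "E", "develop": "G"}   # a branch name holds one tip commit

on_master = reachable(parents, branches["master"])
on_develop = reachable(parents, branches["develop"])
print(sorted(on_master))               # ['A', 'B', 'C', 'D', 'E']
print(sorted(on_develop))              # ['A', 'B', 'C', 'F', 'G']
print(sorted(on_master & on_develop))  # ['A', 'B', 'C'] (on both branches)
```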

You could, if you wanted, make a repository with a single linear branch in which every commit is completely unrelated to the previous commit. More usefully (but still deliberately perverted), you could make a repository where every other commit has a different project in it, so that if you check out the first commit, you get Project A's initial attempt. If you check out the second commit, you get Project B's initial attempt. The third commit is the second commit of A; the fourth commit is the second commit of B; and so on. In other words, an even-numbered commit N is Project B, commit N/2; an odd-numbered commit N is Project A, commit floor((N+1)/2).
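The commit-number arithmetic works out as follows, assuming commits are numbered from 1 along the branch (project_for_commit is a made-up name for illustration):

```python
from __future__ import annotations


def project_for_commit(n: int) -> tuple[str, int]:
    """Map commit number n (counting from 1) to (project, that project's commit number)."""
    if n < 1:
        raise ValueError("commits are numbered from 1")
    if n % 2 == 0:
        return ("B", n // 2)        # even n: Project B, commit n/2
    return ("A", (n + 1) // 2)      # odd n: Project A, commit floor((n+1)/2)


for n in range(1, 7):
    print(n, project_for_commit(n))
# 1 ('A', 1)
# 2 ('B', 1)
# 3 ('A', 2)
# 4 ('B', 2)
# 5 ('A', 3)
# 6 ('B', 3)
```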

The key point here is that commits are not change-sets. If the same file appears many times in a row in many commits in a row, each commit has its own independent copy of that file. It's true that somewhere, deep down in Git's underbelly, they all share a single "true copy" of the file (and for identical objects this turns out to be really easy for Git to do; for slight variations, Git has to put the objects into what it calls a pack file to delta-compress them).
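A small simulation of "independent snapshots, shared storage", assuming loose SHA-1 blobs keyed by their ID and ignoring tree objects and pack-file delta compression entirely: each snapshot lists every file, yet identical contents collapse to a single stored object.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 blob ID, as git hash-object would compute it in a SHA-1 repository."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


# Three hypothetical snapshots (path -> content); every commit lists every file.
snapshots = [
    {"landing.html": b"<h1>v1</h1>\n", "style.css": b"body {}\n"},
    {"landing.html": b"<h1>v2</h1>\n", "style.css": b"body {}\n"},
    {"landing.html": b"<h1>v2</h1>\n", "style.css": b"body { margin: 0 }\n"},
]

object_store: dict = {}     # blob ID -> content; identical content lands on one key
trees = []                  # per-commit {path: blob ID}, a full snapshot each time
for snapshot in snapshots:
    tree = {}
    for path, content in snapshot.items():
        oid = blob_id(content)
        object_store[oid] = content
        tree[path] = oid
    trees.append(tree)

print(sum(len(t) for t in trees))   # 6 file entries across three snapshots
print(len(object_store))            # 4 distinct blobs actually stored
```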

What this really means is that in order to talk about things that have happened to a file, or to some set of files, you must pick some commits to compare, one pair of commits at a time. The obvious thing to do is to compare each parent/child pair. This works as long as the commits are linear:

... G--H--I--J   <-- develop

Here, the G-H pair, the H-I pair, and the I-J pair make for useful comparisons. But suppose this is part of:

        D--E
       /    \
A--B--C      M   <-- master
       \    /
        F--G--H--I--J   <-- develop

where commit M is a merge commit on master, where someone merged develop into master at that point. Commit M has two parents, not just one: will you compare M to E, or to G? Meanwhile, the branches forked apart at C, so C has—at the moment; we could add more any time!—two children. Will you compare C to D, or C to F? These are the really sticky parts, which you can avoid by "settl[ing] with only mine and one branch".
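The ambiguity can be made concrete by encoding the merge drawing as a hypothetical parents dictionary (first parent listed first, as Git records it for a merge made while on master). Following only first parents, which is what git log --first-parent does, yields a single line and sidesteps the question of which side of a merge to compare.

```python
from __future__ import annotations

# The merge drawing: each commit maps to its parents, first parent first.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "E": ["D"],
    "F": ["C"], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"],
    "M": ["E", "G"],          # merge commit on master: first parent E, second parent G
}


def comparison_pairs(parents: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Every (parent, child) pair you could diff; a merge yields one pair per parent."""
    return [(p, c) for c, ps in parents.items() for p in ps]


def children(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the parent links; a fork point has several children."""
    kids: dict[str, list[str]] = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            kids[p].append(c)
    return kids


def first_parent_chain(parents: dict[str, list[str]], tip: str) -> list[str]:
    """Follow only first parents back from `tip`."""
    chain = [tip]
    while parents[chain[-1]]:
        chain.append(parents[chain[-1]][0])
    return chain


pairs = comparison_pairs(parents)
print([p for p in pairs if p[1] == "M"])   # [('E', 'M'), ('G', 'M')]: compare M to E, or to G?
print(children(parents)["C"])              # ['D', 'F']: compare C to D, or to F?
print(first_parent_chain(parents, "M"))    # ['M', 'E', 'D', 'C', 'B', 'A']
```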

Making commits

As you no doubt already know, the process of making a commit consists of doing the following steps:

Check out some branch name: this makes its tip commit be the current commit. There are some important facts about this: in particular, how this affects the index and work-tree. We'll get back to this in a moment.
Make changes in the work-tree. The files in the work-tree have their ordinary read/write form, so this is pretty easy.
Run git add. What this really does is to copy the updated files from the work-tree into the index, replacing the un-edited index files.
Run git commit. This collects your commit log message, then makes the actual commit object.

The tricky part of making the commit is turning the index into a tree object (for which there's a separate command, git write-tree, that you can run if you want to do it all manually). Once Git has the tree object, it can write out the text of the commit:

tree <hash>
parent <hash>
author <name> <email> <timestamp> <timezone>
committer <name> <email> <timestamp> <timezone>

<log message>

and then turn this into a commit object (you can do this part manually too, if you like, using git hash-object -w -t commit). Creating the object creates the hash ID for the object, by computing the cryptographic checksum of the text. As long as this commit is different from every other commit (and the timestamps plus the rest of the contents ensure that it is, since the time is always increasing²), it gets a new, different-from-every-other-commit hash ID. Note that the parent <hash> line uses the hash ID of the current commit, the one you checked out in step 1.
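The commit text can be assembled and hashed by hand. This sketch assumes the SHA-1 object format and frames the text as "commit <size>" plus a NUL byte, the same framing Git uses for blobs. The tree ID is the well-known empty-tree ID; the parent ID and the identity strings are made-up values. For the same bytes, git hash-object -t commit --stdin should report the same ID, but treat that as something to check rather than a guarantee of this sketch.

```python
from __future__ import annotations

import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"   # SHA-1 ID of the empty tree


def commit_body(tree: str, parents: list[str], author: str, committer: str,
                message: str) -> bytes:
    """Assemble the commit text: header lines, one blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]          # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(body: bytes) -> str:
    """Hash ID of a commit object (SHA-1 format): sha1(b"commit <size>\\0" + body)."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


# Hypothetical parent ID and identity; the timestamp is epoch seconds plus a zone offset.
PARENT = "1" * 40
who = "A U Thor <author@example.com> 1520450929 -0800"
later = who.replace("1520450929", "1520450930")            # one second later
body = commit_body(EMPTY_TREE, [PARENT], who, who, "Add landing page\n")
body2 = commit_body(EMPTY_TREE, [PARENT], later, later, "Add landing page\n")

print(commit_id(body) == commit_id(body))    # True: same bytes, same ID
print(commit_id(body) == commit_id(body2))   # False: a new timestamp gives a new ID
```

Changing nothing but the timestamp changes the ID, which is why the timestamps make ordinary commits unique, and why the same-second case in footnote 2 can produce a shared commit.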

Git then simply writes the new commit's hash ID into the branch name, so that the current branch—the one you checked out in step 1—now identifies the new commit as its tip. Last, and this is where you will be able to do what you want, git commit runs a post-commit hook.

The above can be confusing, so let's draw an example, with a simple three-commit repository becoming a four-commit repository:

A--B--C   <-- master (HEAD)

The name master points to commit C. You git checkout master, make some change, git add and git commit and create new commit D. The new commit points back to C as its parent:

A--B--C   <-- master (HEAD)
       \
        D

and then Git quickly slides the name master down-and-right, as it were, so that it points to the new commit D:

A--B--C
       \
        D   <-- master (HEAD)

after which we generally straighten out the drawing so that it looks like a simple line again.

Note that you can run git commit --amend, which makes the new commit have the current commit's parent as its parent. That is, instead of having D point back to C, we can have D point back to B:

A--B--C
    \
     D   <-- master (HEAD)

This makes the history go D -> B -> A, skipping C (which has become unreachable and will eventually be garbage-collected). In other words, we haven't actually changed history—C is still in there, it's just no longer in our history linkage—but it looks like we have. If you will ever use git commit --amend, keep this in mind in your Git hooks later.

(Git's git rebase has a similar effect, but considerably more drastic: it copies multiple commits to new commits, abandoning the originals.)
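A toy model of the branch-name update and of --amend, using plain Python dictionaries. ToyRepo and its methods are invented for illustration; real Git stores hash IDs, not letters, and does not model rebase here.

```python
from __future__ import annotations


class ToyRepo:
    """Toy model: commits are letters with parent lists; a branch name maps to one tip."""

    def __init__(self) -> None:
        self.parents: dict[str, list[str]] = {}
        self.branches: dict[str, str] = {}
        self.head = "master"          # HEAD attached to this branch name

    def commit(self, name: str) -> None:
        """git commit: the new commit's parent is the current tip; then move the branch."""
        tip = self.branches.get(self.head)
        self.parents[name] = [tip] if tip is not None else []
        self.branches[self.head] = name

    def amend(self, name: str) -> None:
        """git commit --amend: reuse the current commit's parent(s); the old tip is left behind."""
        tip = self.branches[self.head]
        self.parents[name] = list(self.parents[tip])
        self.branches[self.head] = name

    def history(self) -> list[str]:
        """Walk first parents back from the branch tip."""
        out: list[str] = []
        commit = self.branches.get(self.head)
        while commit is not None:
            out.append(commit)
            ps = self.parents[commit]
            commit = ps[0] if ps else None
        return out


repo = ToyRepo()
for c in "ABC":
    repo.commit(c)
repo.commit("D")
print(repo.history())     # ['D', 'C', 'B', 'A']

amended = ToyRepo()
for c in "ABC":
    amended.commit(c)
amended.amend("D")
print(amended.history())  # ['D', 'B', 'A']; C still exists but is unreachable
```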

²If, by trickery and subterfuge (or by just running git filter-branch which uses trickery and subterfuge), you manage to make a new commit that is bit-for-bit identical to an existing commit—it has the same author and committer, the same timestamps, the same parent, the same source snapshot, and the same log message—then you will re-use the old commit's hash ID. But so what? You just made a new commit that's exactly the same as the old commit. It has the same author, was made at the same time, has the same history, and has the same log message. It is the old commit.

There's an oddball case here with making two identical commits very fast (within one second) on two different branch-name checkouts when both branch names point to the same tip commit. This causes the branch names to wind up pointing to a single, shared new commit, even though you expected them to point to two different commits, and they would have if the process had spanned a clock-tick. The result is correct, in a graph-theoretical sense, and works; but it is surprising.

Filling in blanks, or rather, filling in the index and work-tree

I mentioned that step 1 above—the git checkout branch-name step—has an important effect on the index and work-tree. Note that when Git made the new commit above, it started by writing out the index to make a tree object, using git write-tree. This means that the index must start out matching the current commit.³

The git checkout command achieves this by comparing the current (pre-checkout) commit to the target (post-checkout) commit. The current commit has some set of files, and the target commit has another set of files, presumably at least a little different. Checkout will remove, from the current index and work-tree, those files that must be removed. It will add into the current index and work-tree any files that must be added. It will replace, in the index and work-tree, any files that must be swapped out, to go from the old commit to the new one.

As a result, after git checkout, the index and work-tree will—except for untracked files, that aren't in the index at all—match the target commit, which has just become the current commit.
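Checkout's bookkeeping amounts to a comparison of two snapshots. The sketch below models each commit as a hypothetical {path: blob_id} mapping and ignores the carried-over changes described in footnote 3.

```python
from __future__ import annotations


def checkout_plan(current: dict[str, str], target: dict[str, str]):
    """Compare two snapshots ({path: blob_id}) the way checkout does.

    Returns (remove, add, replace): paths to delete from the index and work-tree,
    paths to create, and paths whose content must be swapped. Untracked files
    never appear in either snapshot, so this plan never touches them.
    """
    remove = sorted(current.keys() - target.keys())
    add = sorted(target.keys() - current.keys())
    replace = sorted(p for p in current.keys() & target.keys()
                     if current[p] != target[p])
    return remove, add, replace


# Hypothetical snapshots with shortened, made-up blob IDs.
current = {"landing.html": "a1f3", "old.css": "b2c4", "README": "c3d5"}
target = {"landing.html": "9e8d", "README": "c3d5", "new.js": "d4e6"}
print(checkout_plan(current, target))
# (['old.css'], ['new.js'], ['landing.html'])
```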

Note, too, that when you run git commit, this makes the new commit using the current index. The result is that once the new commit is done, the current commit and the index match again. So we get a basic (although slightly flexible, see footnote 3) truth about Git: The index normally matches the current commit, up until you start git adding to copy files from the work-tree.

³Actually, some difference is allowed to carry over across checkouts. See "Checkout another branch when there are uncommitted changes on the current branch" for details.

Using a post-commit hook to get what you want

Git runs your post-commit hook right after git commit finishes successfully. This git commit has made a new commit, such as commit D in our example of turning a three-commit repository into a four-commit repository.

The new commit has a parent, such as C. Now you have a chance to compare parent to child:

git diff --name-status HEAD^ HEAD

for instance. (HEAD is the current, i.e., child, commit, and HEAD^ means look at the first parent of HEAD. Keep merge commits, which have multiple parents, in mind here: you can use HEAD^2 to look at the second parent of a merge, for instance. I'm not sure, off-hand, whether git merge runs the post-commit hook when it makes a merge commit, although I suspect that it does. Git also has a separate post-merge hook, which git merge runs after a successful merge.) The output from git diff --name-status tells you what happened to each file that it prints; see the git diff documentation for details.⁴

At this time, if some file such as landing.html has changed (status M), or a new file has been created (status A), you can make a copy of the file under the next version number, named with the commit log message subject (git log -1 --pretty=format:%s HEAD). If the file hasn't changed, you get no output for it (git diff says nothing because there's nothing to say), so you make no copy.
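Putting the pieces together, here is one possible post-commit hook, written in Python rather than shell (Git runs whatever executable is at .git/hooks/post-commit). It is a sketch under several assumptions: the .old.vN.<subject> naming scheme comes from the question, version numbers are counted from the copies already sitting next to each file, only HEAD's first parent is compared, only added (A) or modified (M) regular files are copied, and the helper names are made up. Following footnote 4, it uses git diff-tree -z instead of git diff, and it reads the committed bytes with git cat-file blob HEAD:<path>, so a partially staged file is copied as committed rather than as it sits in the work-tree.

```python
#!/usr/bin/env python3
"""Sketch of a post-commit hook: after each commit, copy every added or
modified regular file to <path>.old.v<N>.<subject-slug> next to the original.

Assumptions (not Git features): the naming scheme comes from the question,
copies stay untracked, N counts the copies already on disk (starting at v0),
and only HEAD's first parent is compared, so merges are diffed against HEAD^.
"""
from __future__ import annotations

import os
import re
import subprocess
import sys


def git(*args: str) -> bytes:
    """Run a git command and return its standard output as bytes."""
    return subprocess.run(["git", *args], check=True,
                          stdout=subprocess.PIPE).stdout


def parse_name_status_z(raw: bytes) -> list[tuple[str, str]]:
    """Parse `git diff-tree -z --name-status` output into (status, path) pairs.
    Renames and copies (R###/C###) list two paths; keep the new one."""
    fields = raw.decode("utf-8", "surrogateescape").split("\0")
    pairs, i = [], 0
    while i < len(fields) and fields[i]:
        status = fields[i]
        if status[0] in "RC":
            pairs.append((status[0], fields[i + 2]))
            i += 3
        else:
            pairs.append((status[0], fields[i + 1]))
            i += 2
    return pairs


def slugify(subject: str) -> str:
    """Turn a commit subject into a short, file-name-safe suffix."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-").lower()[:50].strip("-")
    return slug or "no-subject"


def next_version(path: str, existing: list[str]) -> int:
    """One more than the highest N among existing <name>.old.vN* copies, else 0."""
    pattern = re.compile(re.escape(os.path.basename(path)) + r"\.old\.v(\d+)(\.|$)")
    versions = []
    for name in existing:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions) + 1 if versions else 0


def history_name(path: str, version: int, subject: str) -> str:
    return f"{path}.old.v{version}.{slugify(subject)}"


def main() -> None:
    os.chdir(git("rev-parse", "--show-toplevel").decode().strip())
    has_parent = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", "HEAD^"],
        stdout=subprocess.DEVNULL).returncode == 0
    # A root commit has no HEAD^, so --root diffs it against an empty tree instead.
    span = ["HEAD^", "HEAD"] if has_parent else ["--root", "HEAD"]
    raw = git("diff-tree", "-r", "-z", "--no-commit-id", "--name-status", *span)
    subject = git("log", "-1", "--pretty=format:%s", "HEAD").decode("utf-8", "replace")
    for status, path in parse_name_status_z(raw):
        if status not in ("A", "M"):
            continue                          # deletions etc. get no copy
        folder = os.path.dirname(path) or "."
        target = history_name(path, next_version(path, os.listdir(folder)), subject)
        with open(target, "wb") as out:       # committed bytes, not the work-tree file
            out.write(git("cat-file", "blob", f"HEAD:{path}"))


# Act only when installed as .git/hooks/post-commit, so importing this file is harmless.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "post-commit":
    main()
```

To try it, save it as .git/hooks/post-commit and make it executable (chmod +x .git/hooks/post-commit). The copies will show up as untracked in git status; adding a pattern such as *.old.v* to .git/info/exclude would hide them, if that is what you want.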

The result, over time, is that you will build up, in your work-tree, the untracked files that you want as your history, numbered by the order in which you make these commits. To make the numbering mean something, you can even check which branch you're on (if any—in "detached HEAD" mode, such as when you are looking at historic commits, HEAD is not attached to a branch name at all). Note that you can use git rev-parse --abbrev-ref HEAD or git symbolic-ref --short HEAD to get a branch name.⁵
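To fold the branch into the naming, or to skip detached-HEAD commits entirely, the hook could ask git symbolic-ref, whose --quiet flag makes it exit non-zero silently when HEAD is detached (the behavior described in footnote 5). The run parameter is an assumption added only so the function can be exercised without a real repository; note that "not a repository" also comes back as None.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def current_branch(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str | None:
    """Short name of the branch HEAD is attached to, or None if HEAD is detached.

    `git symbolic-ref --quiet` exits non-zero without printing an error when
    HEAD is detached; `run` is injectable only so this can be tested without Git.
    """
    result = run(["git", "symbolic-ref", "--quiet", "--short", "HEAD"],
                 stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

In a hook, returning early unless current_branch() == "master" would limit the copies to one branch, in the spirit of "only mine and one branch"; putting the branch name into the copy's file name is another option. Both are design choices, not requirements.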

⁴For scripting, you should really use git diff-tree, which is more predictable. It doesn't obey per-user configuration controls, for instance, so it behaves the same for everyone. git diff will look at your diff.renames setting, your diff.renameLimit, and so on, as well as diff-output coloring options, all of which can mess with scripting.

⁵The difference between the two is that git symbolic-ref will fail (exit nonzero), and produce no standard output (but will write to stderr by default), if HEAD is detached. git rev-parse will just print HEAD for this case.

Wow, thanks for the very complete answer. I always found Git so complicated; it never really clicked with me. I'll implement a complete solution this weekend. — Hartator, Mar 07 '18 at 22:16

Mark · Answer 2 · 2020-04-06T06:20:52.940

The Timeline view (built in, not an extension) seems to do what you want. It's pretty sweet: focus an open editor and that file's commit history is listed. Click an entry and a diff opens. The diff is against the current version, so you would have to copy all of the diff and save it to a new file if you want separate files to persist.

vscode v1.44 update

The Timeline view is now out of preview and enabled by default (emphasis added). This is a unified view for visualizing time-series events (for example, Git commits, file saves, test runs, etc.) for a file. The Timeline view automatically updates showing the timeline for the currently active editor, by default. You can control this default behavior by toggling the eye icon in the view toolbar. Also, similar to other views, the Timeline view supports find or filter as you type.

In this release, the built-in Git extension contributes a timeline source which provides the Git commit history of the specified file. Selecting a commit will open a diff view of the changes introduced by that commit. A context menu also provides Copy Commit ID and Copy Commit Message commands. There is also a new Open Timeline command on an Explorer file's context menu, to quickly show the timeline for the selected file.

from https://github.com/microsoft/vscode-docs/blob/vnext/release-notes/v1_44.md#timeline-view

Preview of related functionality in v1.42 and v1.43: https://github.com/microsoft/vscode-docs/blob/vnext/release-notes/v1_42.md#timeline-view and https://github.com/microsoft/vscode-docs/blob/vnext/release-notes/v1_43.md#timeline-view

Timeline view

In this milestone, we've made progress on the new Timeline view, and have an early preview to share. This is a unified view for visualizing time-series events (for example, Git commits, file saves, test runs, etc.) for a resource (file, folder). To enable the Timeline view, you must be using the Insiders edition and then add the following setting:

"timeline.showView": true [enabled by default in v1.44]

In this early preview, the Timeline view shows the Git commit history of the active document, which is currently limited to 32 items. Clicking on one of those commits will open a diff of the changes introduced by that commit. Extensions will also be able to contribute their own timeline sources, which will be shown in this unified timeline view. Eventually, you will also be able to select (filter) which sources you'd like to see in the view.

Stay tuned, we have much more in store for this new feature. You can follow along by subscribing to #84297 and by watching for issues tagged with the timeline label. And if you have ideas on other types of information you'd like to see in this view, let us know!