
If I run (for example)

git checkout stash@{0} -- .

...any stashed files that are modified relative to the index show up as staged. Here's a quick example:

% git init demo
Initialized empty Git repository in /tmp/demo/.git/
% cd demo
% date >> file.txt
% git add file.txt
% git commit --allow-empty-message -m ''
[master (root-commit) e46cee5] 
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
% date >> file.txt
% git stash
Saved working directory and index state WIP on master: e46cee5 
HEAD is now at e46cee5 
% git checkout stash@{0} -- .
% git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    modified:   file.txt

This behavior surprises me.

I can think of two possible directions which an explanation could take (though I can't fill in the details):

  1. The behavior is simply a particular example of a standard, general git behavior;
  2. The behavior is desirable, and therefore it was deliberately added to git, as a feature.

My hope is that the answer is (1). In this case, I would like to know how someone with more knowledge of git's design would have been able to deduce (before the fact) the behavior described above "from first principles." Are there other similar examples of git subcommands that result in files being checked out as "staged"?

If the answer is (2), I would like to know why such behavior (checking out files as "staged") is deemed desirable.

If neither (1) nor (2) does justice to what actually happens, I would like to know what does.

kjo

1 Answer


Case 1 applies, but to understand why, you need to know how stashes are actually stored internally. To use git stash as it was intended to be used, you don't need to know this: that is, I doubt anyone ever envisioned users doing git checkout stash -- .. (Note that stash@{0} is largely just a fancy way to write stash.)

What to know about commits

First, remember that a commit is a snapshot plus some metadata. We won't go into the metadata here, but the snapshot holds a copy of every file.

These snapshots are normally made from the index. The index is an internal Git thing, mostly stored as a file named .git/index, that has several functions, but the main one is that it is where you build the next commit you will make. It starts out holding a copy of each file taken from the current commit. That is, you run:

git checkout master

or similar, and that fills in the index with a copy of each file from the frozen, Git-ified copy of each file in the commit identified by the name master. It also fills in your work-tree with a usable (defrosted and rehydrated, ordinary everyday format) copy of each file. So after git checkout master, you have three active copies of each file.

Suppose, for instance, that one of your files is named README.md and you have just done this git checkout. There are three active copies of README.md now. Two of them are in special Git-only formats and need a Git command to view them:

  • git show HEAD:README.md will show you the frozen HEAD copy of README.md;
  • git show :README.md will show you the index copy of README.md;
  • and README.md is an ordinary file in your work-tree.

You can replace the copy that is in your index at any time. Just edit the copy that is in your work-tree (which is there so that you can see it and work on it) and then run git add README.md. This overwrites the old index copy,[1] and now the index and work-tree copies match, except that the index copy is in the special ready-to-freeze, dehydrated form that Git uses in commits. Since you changed it, the index copy no longer matches the frozen HEAD copy (which you can't change).
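
As a minimal sketch of the three copies just described, the following builds a throwaway repository and prints each copy after a git add. The temporary directory, the identity settings, and the "version 1" / "version 2" contents are assumptions made only for illustration.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > README.md
git add README.md
git commit -q -m "initial"

echo "version 2" > README.md   # edit the work-tree copy
git add README.md              # overwrite the index copy

head_copy=$(git show HEAD:README.md)   # frozen copy in the current commit
index_copy=$(git show :README.md)      # copy staged in the index
tree_copy=$(cat README.md)             # ordinary file in the work-tree

echo "HEAD:      $head_copy"
echo "index:     $index_copy"
echo "work-tree: $tree_copy"
```

Only the HEAD copy should still hold the old contents; the index and work-tree copies agree because git add copied one over the other.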

If you also have seven other files, your index now has eight files in it. Seven of these eight are the same as the copies in HEAD; README.md is different. If you now run git commit, Git will package up all eight files that are in the index into a new commit. This new commit becomes the current (HEAD) commit, and now all 24 copies of the files—3 copies of README.md and 21 copies of the other 7 files, in HEAD, the index, and the work-tree—match up again in the usual way.

We say the index because .git/index is a special, distinguished index. Git does have the ability to use other temporary index files, as we'll see. The index is this main one.[2]


[1] Technically, a frozen, Git-ified copy of the file goes directly into the repository as a blob object, and then the index just refers to it. Except for speed and Git's own internal convenience, the main effect is the same as if the entire contents of the file were stuffed into the index, though.

[2] If you use git worktree add, you add a work-tree + index pair, and "the" index for the added work-tree is in a different location. In fact, you also gain an additional private HEAD for the added work-tree. This private HEAD is not in .git/HEAD, just as the added work-tree's principal index is not in .git/index.


The secret of stashes

When you run git stash save (the old verb for creating a stash) or git stash push, Git actually makes two, or sometimes three, commits. I like to call the result a stash bag because of the way Git makes use of the metadata for each of these commits, but mainly we need to talk about the i and w commits here. The third commit in this stash bag is the u commit, which exists only if you use the --all or --include-untracked flag.

Git makes the i commit from whatever is in your index at the time you run git stash. A later git stash apply, or any of the verbs that use apply, will use this separate i commit if and only if you tell it to (with the --index option); otherwise it ignores the i commit entirely. I call it the i commit because it saves the state of your (main) index.

Git makes the w commit using a temporary index. Git needs this temporary index because the only way to write a commit is to use some index. It uses the temporary one to avoid disturbing the main one, at least at this point. Essentially, Git copies your main index to this new temporary index, then runs git add on all the files that are in the temporary index, so that they get updated from the work-tree.[3] Then Git simply makes a commit using this temporary index, rather than the regular one. The new commit looks almost like any other commit.
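
The mental model above can be imitated by hand with the GIT_INDEX_FILE environment variable. This is a sketch of the idea, not a copy of what git stash does internally; the temporary paths and file contents are assumptions. It copies the main index, updates the copy from the work-tree, writes a tree from it, and then checks that the tree matches the one in the w commit of a real stash.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "initial"
echo two >> file.txt                   # modified in the work-tree, not staged

tmp_index=$(mktemp)                    # stand-in for Git's temporary index
cp .git/index "$tmp_index"             # start from a copy of the main index
GIT_INDEX_FILE="$tmp_index" git add -u # update the copy from the work-tree
sketch_tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)

main_staged=$(git diff --cached --name-only)   # main index vs HEAD

git stash push -q                      # now let Git build a real stash
stash_tree=$(git rev-parse 'stash^{tree}')

echo "tree from temporary index: $sketch_tree"
echo "tree of the w commit:      $stash_tree"
```

Because the main index was never touched, main_staged stays empty even though the temporary index holds the updated file.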

There is one other thing that is weird about this w commit: it has two or three parent commits, instead of the usual one. One of the two parents is the current (HEAD) commit. One is the i commit. The third parent is commit u, if it exists—if not, w is a two-parent commit.
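
The parent structure can be inspected directly. In this sketch (a throwaway repository with made-up contents and no untracked files stashed), stash^1 is the first parent of w and stash^2 is the second, which is the i commit.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "initial"
echo two >> file.txt
git stash push -q

head=$(git rev-parse HEAD)
w=$(git rev-parse stash)              # the w commit
first_parent=$(git rev-parse 'stash^1')
i=$(git rev-parse 'stash^2')          # the i commit

# One line: the w commit followed by each of its parents.
words=$(git rev-list --parents -n 1 stash | wc -w)

echo "w=$w first_parent=$first_parent i=$i"
```

With no u commit present, the rev-list line holds three hash IDs: w itself plus two parents.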

A two- or three-parent commit is, by definition, a merge commit. But commit w is not the result of running git merge. This means that a git show of commit w is rarely useful: git show has a special mode for merge commits, which does nothing useful in this case.[4] This is why git stash has a show subcommand: git stash show knows how to display the w commit in a more useful fashion, by diffing it directly against the commit that was HEAD when you made it.
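
A rough way to see what git stash show compares is to diff the first parent of w against w yourself. The sketch below assumes a throwaway repository with a single tracked file; it is meant to show the equivalence, not the exact code path Git uses.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "initial"
echo two >> file.txt
git stash push -q

git stash show -p            # patch view of the stash
git diff 'stash^1' stash     # the same comparison, spelled out by hand

shown=$(git stash show --name-only)
diffed=$(git diff --name-only 'stash^1' stash)
```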

This, plus knowing rather more about git checkout, will get us to your last question.


[3] For efficiency reasons, and because git stash push lets you provide pathspecs, that's not really how this works. But it's useful as a starting mental model, before getting into all the crazy corner cases.

[4] How useful git show's action is on normal merge commits is debatable as well, in my opinion, but that is another topic entirely.


git checkout is complicated

Let's look at this for a moment, then unpack both the index and git checkout some more. This is especially useful since Git 2.23 introduces two new commands, git switch and git restore.

Are there other similar examples of git subcommands that result in files being checked out as "staged"?

The word staged here is something that git status says. We already noted above that the index—which Git also calls the staging area—contains a copy of every file, and that there are three active copies of each file. Let's go back to the README.md case, and add another file named main.py to our list.

Suppose that all three copies of README.md match each other, and that all three copies of main.py match each other (and that there are no other files, or they all match too). Running git status will say nothing at all about these files. That's because git status runs two separate comparisons:

  • First, git status compares HEAD vs the index. For each file that is different, it says staged for commit. For each file that is the same, it says nothing.
  • Then, git status compares the index vs the work-tree. For each file that is different, it says not staged for commit. For each file that is the same, it says nothing.

Since all three copies of README.md match, and all three copies of main.py match, git status says nothing about them. But if we change the work-tree copy of both files, and then run git add README.md, we now have:

HEAD            index           work-tree
-------------   -------------   -------------
README.md (1)   README.md (2)   README.md (2)
main.py (1)     main.py (1)     main.py (2)

The numbers in parentheses here indicate which version of the file is where: version 1 is the one that was in the commit, and version 2 is what we updated.

Since HEAD:README.md doesn't match :README.md, git status will call it staged for commit. But the index and work-tree copies do match. Meanwhile, HEAD:main.py and :main.py match, so git status doesn't call it staged for commit—but the index and work-tree version don't match, so it does call it not staged for commit.
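
The two comparisons can be run separately: git diff --cached compares HEAD with the index, and plain git diff compares the index with the work-tree. This sketch recreates the table's state in a throwaway repository; the file contents are placeholders.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > README.md
echo "version 1" > main.py
git add README.md main.py
git commit -q -m "initial"

echo "version 2" > README.md
echo "version 2" > main.py
git add README.md                          # stage only README.md

staged=$(git diff --cached --name-only)    # HEAD vs index
unstaged=$(git diff --name-only)           # index vs work-tree

echo "staged:   $staged"
echo "unstaged: $unstaged"
git status --short                         # both comparisons, side by side
```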

What happens if we touch the work-tree copy of README.md again now, so that it's a version 3 that doesn't match versions 1 or 2? Predict what git status will say, then try it out.

This also brings us back to git checkout. The git checkout command is very complicated. It can do about 4 or 5 different things. This is almost certainly too many, and in Git 2.23, the Git folks have introduced git switch, which only does one thing (or maybe two), and git restore, which also only does one thing (or maybe 2 or 3). Good (?) old git checkout is still there and still does everything, of course.

I mentioned this above, but let's emphasize it now: when git checkout switches from one branch to another, it actually copies files from the new commit into the index. It also copies them into the work-tree. The precise way that it does this (and when and how it doesn't do this, in some cases) gets pretty crazy,[5] but if you use the syntax:

git checkout <tree-ish> -- <pathspec>

you tell Git that it should unconditionally wipe out uncommitted data that may appear only in the index and/or only in the work-tree: that it should find the file(s) you listed in your pathspec argument, as present in the tree-ish argument, and copy them out over whatever is in your index now, and over whatever is in your work-tree now.

The result is that any uncommitted work you had in those files gets thrown out. Stuff that was in your index and/or in your work-tree is now overwritten, and if that file data wasn't saved anywhere else, it's really and truly gone now.[6] But in any case, whether or not you've lost something, now the index and work-tree copies match the copies from the tree-ish you selected. Wherever those copies differ from the HEAD copies, git status reports the files as staged for commit, which is exactly what you saw. If tree-ish doesn't make any sense to you, read on to the next section.
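
Here is a sketch of that effect, using the stash as the tree-ish. The second half is an addition of mine rather than something the text above covers: it assumes Git 2.23 or later and uses git restore --source, which by default writes only to the work-tree, to show the contrast. The repository and file contents are made up.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "initial"
echo two >> file.txt
git stash push -q

# git checkout <tree-ish> -- <pathspec> writes index AND work-tree.
git checkout stash -- .
staged_after_checkout=$(git diff --cached --name-only)
unstaged_after_checkout=$(git diff --name-only)

# Start over, then write the work-tree only (assumes Git 2.23+).
git reset -q --hard
git restore --source=stash -- .
staged_after_restore=$(git diff --cached --name-only)
unstaged_after_restore=$(git diff --name-only)

echo "checkout: staged='$staged_after_checkout' unstaged='$unstaged_after_checkout'"
echo "restore:  staged='$staged_after_restore' unstaged='$unstaged_after_restore'"
```

After the checkout, the index differs from HEAD, so the file counts as staged; after the restore, only the work-tree differs, so it counts as not staged.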


[5] See "Checkout another branch when there are uncommitted changes on the current branch" for the most complicated case, but note that git checkout does more than just this one complicated case.

[6] Your OS may have some way to get it back from some OS-provided snapshot. For instance, on a Mac, you might have Time Machine making backups regularly. The point here is that Git can't help you any more.


A bit about tree-ish, commit objects, and branch names

The main storage unit of Git is the commit. Git is all about commits: when you make a commit, you freeze a snapshot of your files for all time, or at least, for as long as that commit continues to exist. Every commit has its own unique hash ID, which is a big ugly string of letters and digits that git log will print, for instance.

Inside the commit, though, the files are actually saved in what Git calls a tree object. The commit itself—and its hash ID—represent a commit object, which is actually pretty small, as it only contains the metadata. The snapshot itself is stored under one or more of these tree objects, which also have hash IDs; the commit metadata provides the hash ID of the top level tree object. When you want to get files out of a commit, Git doesn't need the commit metadata. It only needs the tree. So you can give it a commit hash ID, and it will find the tree from the commit; or you can give it a tree hash ID.
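
The commit-to-tree relationship can be looked at with a few plumbing commands. This is a small sketch in a throwaway repository (names and contents are placeholders): the same file listing comes back whether you start from the commit or from its top-level tree.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > README.md
git add README.md
git commit -q -m "initial"

commit_type=$(git cat-file -t HEAD)        # object type of the commit
tree_id=$(git rev-parse 'HEAD^{tree}')     # top-level tree of that commit
tree_type=$(git cat-file -t "$tree_id")

git cat-file -p HEAD                       # metadata, including a "tree" line

from_commit=$(git ls-tree --name-only HEAD)
from_tree=$(git ls-tree --name-only "$tree_id")
```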

There's very little reason ever to bother going to the tree object, but the term tree-ish (anything Git can resolve to a tree, such as a commit or a tree hash ID) is still useful, because the index works a lot like a tree to many internal parts of Git. Hence, many internal places where a tree-ish is needed can also work on the (or an) index. There's no guarantee here, but in general, if a Git command works on a tree, there is probably some variant that works on an index. For git checkout, that's git checkout-index.[7] Similarly, git diff mostly compares two commits (or really, two trees), so there is a git diff-index that can use the index.[8]
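
To make the index variants a little more concrete, here is a sketch that uses git diff-index to compare HEAD with the index and git checkout-index to copy a file from the index back into the work-tree. The repository and the "version N" contents are assumptions for illustration.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > file.txt
git add file.txt
git commit -q -m "initial"

echo "version 2" > file.txt
git add file.txt                           # index now holds version 2
echo "version 3" > file.txt                # work-tree moves on again

porcelain=$(git diff --cached --name-only)
plumbing=$(git diff-index --cached --name-only HEAD)

git checkout-index -f -- file.txt          # overwrite work-tree from index
restored=$(cat file.txt)
```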

Meanwhile, a branch name like master or develop actually has multiple functions. One is specific to git checkout: you can git checkout master to get on branch master, as git status will say. After git checkout develop, you will be on branch develop. But another one is that each branch name identifies one specific commit. The name master therefore stands in for some big ugly hash ID.

You can find the hash ID for any branch name using git rev-parse:

$ git rev-parse master
7c20df84bd21ec0215358381844274fa10515017

In this case, 7c20df84bd21ec0215358381844274fa10515017 is the commit you get when you run git checkout master.

Any given commit can have zero, one, two, or more branch names. It can also have zero or more tag names. Other names, such as remote-tracking names, can and do refer to specific commits. But a special feature of a branch name like master is that it changes over time, and in fact, it automatically changes whenever you make a new commit.

This is what it means to be "on a branch". If you are on branch master, and make a new commit, the new commit gets some new, unique, big ugly hash ID—and now the name master means that commit, the new one you just made. As you make more commits, each new one becomes the commit that master means. This is how branches grow: you make new commits. This is also where the parent metadata in each commit comes in, but we won't go into more detail here.
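
A short sketch of the branch name moving: it records what the current branch name resolves to, makes a commit, and resolves the name again. The branch name is read with git symbolic-ref rather than assumed to be master, since the default name depends on configuration; everything else is a placeholder.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"

branch=$(git symbolic-ref --short HEAD)    # the branch we are "on"
before=$(git rev-parse "$branch")

echo two >> file.txt
git commit -q -a -m "second"

after=$(git rev-parse "$branch")           # same name, new commit
parent=$(git rev-parse "$branch^")         # parent metadata of the new commit
current=$(git rev-parse HEAD)

echo "$branch: $before -> $after"
```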

In any case, a name like master means one specific commit. If you give that name to git checkout, Git will try to check out that one specific commit, and put you on that branch as well, so that new commits will update the name master. But you can use the name elsewhere to mean "the one commit".

The name stash (its actual full name is refs/stash, to distinguish it from any branch name[9]) similarly just points to one specific commit. In this case it points to the w commit in the current stash.
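
As a quick check of that claim, this sketch resolves the short name, the full name, and the stash@{0} spelling in a throwaway repository (contents are placeholders) and confirms that they all name the same commit object.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "initial"
echo two >> file.txt
git stash push -q

short=$(git rev-parse stash)
full=$(git rev-parse refs/stash)
reflog_entry=$(git rev-parse 'stash@{0}')
kind=$(git cat-file -t refs/stash)

echo "stash -> $short ($kind)"
```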

Names of this general form, which all start with refs/, are references. Branch names are refs/heads/*, tag names are refs/tags/*, and so on. The thing that's special about branch names is that they move, automatically, and git checkout can get you "on" them. You can git checkout other names; the result is what Git calls a detached HEAD, at the commit identified by the name.

Names of the form ref@{number}, such as stash@{1} or master@{3}, make use of what Git calls reflogs. Reflogs mainly store the previous values of the reference. The git stash code uses (some might say abuses) the reflog for refs/stash as a sort of stack: popping (or dropping) the current stash renumbers stash@{2} to stash@{1} and stash@{1} to stash@{0}. Creating a new stash "pushes" it into stash@{0}, bumping all the other numbers up one step.
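
The stack-like renumbering can be watched with two stashes. In this sketch (throwaway repository; the stash messages and contents are made up), the entry that was stash@{1} becomes stash@{0} once the top entry is dropped.

```shell
# Throwaway repository; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "initial"

echo first >> file.txt
git stash push -q -m "first stash"
echo second >> file.txt
git stash push -q -m "second stash"    # pushes the first one up to stash@{1}

older=$(git rev-parse 'stash@{1}')
newer=$(git rev-parse 'stash@{0}')
git stash list

git stash drop -q                      # drop the current stash, stash@{0}
now_top=$(git rev-parse 'stash@{0}')
```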

You could do the same with other reflogs, such as those for master, but that's not how they're intended to be used. Instead, every update just increments all the existing numbers: make two new commits, and what was master@{0} (or just master) is now in master@{2}. Use git reset to remove the last of those and now it's moved to master@{3}; master@{1} holds the commit you just abandoned via git reset.

Most Git commands:

  • manipulate the index and/or work-tree, and/or
  • use or extend the commit graph (see Think Like (a) Git), and/or
  • manipulate references and their reflogs.

The git reset command does all three; git commit uses the index to do #2 (add a commit) and #3 (update the current branch name). The git merge-base command uses the graph to find a particularly interesting commit, without changing anything in the index or work-tree and without modifying any references. A few Git commands—git fetch and git push—have your Git call up some other Git, and give or receive commits and other Git objects to/from that other Git, and then optionally modify your own references (git fetch) or ask them to modify theirs (git push).


[7] Actually, git checkout-index functionality is mostly included in git checkout at this point. It really is a command with too many operating modes.

[8] As with checkout, git diff can do this directly. But in this case git diff is a user-oriented porcelain command, with three underlying plumbing commands: git diff-tree, git diff-index, and git diff-files. The plumbing commands are the ones to use when writing scripts, as the porcelain commands have user-configuration settings that make them work differently for different users. Scripts mostly need predictable behavior: it won't do for your script to be tripped up by someone's diff.renames setting, or color options.

[9] Branch names are names that start with refs/heads/, so if you had a branch named stash it would be refs/heads/stash, which is clearly different from refs/stash. While Git itself can keep this straight, it's a bad idea: don't do that. Humans will get confused, and not know whether stash means refs/stash or refs/heads/stash.

torek