This whole thing is a little tricky (and is one reason I never use Git for my home directory—I store my dot-files in Git, for instance, but do this via symlink or copy, so that the repository is not in my home directory).
It's important to remember, when dealing with Git, that there are three places for files. One of the three is a permanent store, a sort of database in which your files are kept frozen for all time and you can always get them back.[1] These are the commits. No part of any existing commit can ever be changed, and that includes the files that are stored as full snapshots within each commit. (See footnote 1.)
But there are two other places that Git stores files, and that matters. It's particularly important when you realize that the frozen files inside Git are in a special, compressed, Git-only format. So these files are completely useless to anything that's not Git, and moreover, you—and even Git—can't change them. That means Git needs to have a way to take the files out of the commits, and put them back in their normal, everyday, usable form.
That way—and that place—is your work-tree. Your work-tree holds the everyday usable form of your files. Git will copy files out of commits, overwriting whatever is in your work-tree, as needed. It will also remove files from your work-tree when that seems appropriate.
What this means is that files in your work-tree are not permanent. But they are useful—unlike Git's special frozen copies—and it's very tempting to make your work-tree not just your "play" area but your practical, every-day, get-things-done area. There's nothing inherently wrong with this, but you have to accept the idea that Git will then be able to reach in and fuss with this area, and understand when, how, and why Git does this.
In between the commit database and your work-tree, Git stores a third copy of each file. This third copy is in the frozen format, but isn't frozen. This third copy is what Git calls, variously, the index or the staging area. Git makes all new commits from its index.
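To see all three copies side by side, here is a minimal sketch in a throwaway repository. The file name notes.txt, the "Demo User" identity, and the temporary directory are made up for illustration; the only Git features relied on are `git show HEAD:<path>` (the committed copy) and `git show :<path>` (the index copy).

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first snapshot"      # version 1 is now frozen in a commit

echo "version 2" > notes.txt
git add notes.txt                      # version 2 replaces the index copy

echo "version 3" > notes.txt           # version 3 exists only in the work-tree

committed=$(git show HEAD:notes.txt)   # copy stored in the current commit
staged=$(git show :notes.txt)          # copy stored in the index
working=$(cat notes.txt)               # ordinary file in the work-tree
echo "commit:    $committed"
echo "index:     $staged"
echo "work-tree: $working"
```

At this point the same path has three different contents, one per place.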
As with the work-tree, things in the index are not permanent. In fact, they must not be permanent: if they were, you couldn't change them! But, being impermanent, if you remove them, they really are gone. That's true for both the index copies of files, and for the work-tree copies.[2]
When you run git commit, Git simply packages up all the files that are in the index right then and makes a commit out of them. (The commit also adds your name and email address, the date-and-time stamp, and so on, but the files that are now stored permanently, frozen into the commit, are those from the index.)
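A small sketch of that behavior, again in a scratch repository with made-up names (notes.txt, "Demo User"): the file is staged, then edited again without staging, and the commit ends up with the staged text rather than what is in the work-tree.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "staged text" > notes.txt
git add notes.txt                      # index copy: "staged text"
echo "unstaged, longer text" > notes.txt   # only the work-tree copy changes

git commit -q -m "snapshot of the index"

in_commit=$(git show HEAD:notes.txt)   # what the new commit froze
in_worktree=$(cat notes.txt)           # what is still sitting on disk
status=$(git status --short)           # work-tree differs from the index
author=$(git log -1 --format='%an <%ae>')  # metadata Git added to the commit
echo "commit has:    $in_commit"
echo "work-tree has: $in_worktree"
echo "author:        $author"
```

The short status line shows notes.txt as modified but not staged, because the work-tree copy no longer matches the index copy that went into the commit.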
You can, at any time, run:
git rm --cached <file>
to tell Git: remove the given <file> from the index, while not removing it from the work-tree. The file is now, at this point, not in the index. But it may be in some existing commits. You can now make new commits and the new commits won't contain the file—because it's not in your index.
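Here is a sketch of what that looks like in practice, using hypothetical files secret.txt and readme.txt in a scratch repository. It checks where the file still exists after the removal from the index.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "token" > secret.txt
echo "hello" > readme.txt
git add secret.txt readme.txt
git commit -q -m "commit that includes secret.txt"
old=$(git rev-parse HEAD)

git rm -q --cached secret.txt          # drop it from the index only
git commit -q -m "stop tracking secret.txt"

tracked=$(git ls-files)                # files the index holds now
untracked=$(git ls-files --others)     # work-tree files the index lacks
in_old=$(git ls-tree -r --name-only "$old")   # files frozen in the old commit
in_new=$(git ls-tree -r --name-only HEAD)     # files frozen in the new commit
echo "tracked:   $tracked"
echo "untracked: $untracked"
```

The old commit still lists secret.txt, the new one does not, and the work-tree file is untouched but now untracked.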
If you check out some old commit, though ... well, git checkout works by:[3]
- copying the commit to the index, then
- copying the index to the work-tree
so that you have all your files from that commit available, both in the index (for a new commit that you might make in a moment, based on the old commit) and in the work-tree (so that you can see and work with your files).
Suppose you have a file named F in that commit, that's not in your index before the git checkout, but is in your work-tree. You then run git checkout <that-commit>, and now F is in your index, and also is in your work-tree. The work-tree copy now (probably) matches the commit's copy (see footnote 3 about "probably"). If you now decide that you're done looking at the old commit, and use git checkout master to get back to modern times, Git will remove file F from your index ... and therefore also remove file F from your work-tree.
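The sequence can be reproduced in a scratch repository. This is a sketch under one loud assumption: F is listed in .git/info/exclude, so Git treats it as ignored and will silently overwrite it. For an untracked file that is not ignored, Git normally refuses the checkout instead of clobbering the file. The file names and the "Demo User" identity are made up.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # make sure the branch is "master"
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "old contents" > F
echo "hello" > readme.txt
git add F readme.txt
git commit -q -m "old commit that contains F"
old=$(git rev-parse HEAD)

git rm -q --cached F                   # F leaves the index, stays on disk
mkdir -p .git/info
echo "F" >> .git/info/exclude          # ASSUMPTION: F is ignored from now on
git commit -q -m "stop tracking F"
echo "local edits" > F                 # untracked, ignored work-tree copy

git checkout -q "$old"                 # commit -> index -> work-tree
after_old=$(cat F)                     # the local edits were overwritten

git checkout -q master                 # master has no F, so Git removes it
if [ -e F ]; then gone=no; else gone=yes; fi
echo "F after checking out the old commit: $after_old"
echo "F gone after returning to master:    $gone"
```

The local edits are lost at the first checkout, and the file itself disappears at the second.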
File F, not being in your index, goes back to being untracked, except for the huge problem that it's gone from your work-tree, so it's not even there: its untracked-ness is irrelevant. There are copies of F in old commits (it's in the commit you had out a moment ago, after all) and maybe those are good enough:
git show <hash>:F
will let you view the copy that's in the given commit, and you can redirect its output to F to re-create it in the work-tree.
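A minimal sketch of that recovery, with a made-up file and identity in a scratch repository: the file is committed, removed, and then re-created from the frozen copy.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "precious" > F
git add F
git commit -q -m "commit that contains F"
hash=$(git rev-parse HEAD)             # remember which commit holds F

git rm -q F                            # removes F from index and work-tree
git commit -q -m "remove F"

git show "$hash:F" > F                 # re-create F from the frozen copy
restored=$(cat F)
state=$(git status --short)            # F is back, but only as an untracked file
echo "restored contents: $restored"
echo "status:            $state"
```

The restored file is an ordinary work-tree file again; Git does not start tracking it unless you add it.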
There is no perfect cure for this problem, with one exception: move the Git repository elsewhere. Make this work area not be a Git work-tree. Git won't control it in any way anymore, and therefore won't remove or clobber files that are in it. The main real problem is historic commits that have files that you want them not to have. You cannot change those commits. You can copy them to new-and-improved commits that don't have the files. See, for instance, Remove sensitive files and their commits from Git history and everything in the tag bfg-repo-cleaner. If you do use these, make sure you don't mix the new repository, with its new and improved commits, together with the old-and-lousy repository with the bad commits: Git is geared to glomming all commits together and will "want" to leave you with a repository that has both sets of commits.
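The copy-instead-of-change idea can be seen on a single commit. This sketch is not one of the history-rewriting tools mentioned above; it only uses git commit --amend on the most recent commit, with made-up file names, to show that the "fixed" commit is a new object with a new hash while the original still exists unchanged.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "hello" > readme.txt
echo "hunter2" > password.txt
git add readme.txt password.txt
git commit -q -m "initial commit"
old=$(git rev-parse HEAD)

git rm -q --cached password.txt        # take the file out of the index
git commit -q --amend -m "initial commit"   # build a replacement commit
new=$(git rev-parse HEAD)

old_files=$(git ls-tree -r --name-only "$old")   # the old commit, untouched
new_files=$(git ls-tree -r --name-only "$new")   # the replacement commit
echo "old commit $old still lists: $(echo $old_files)"
echo "new commit $new lists:       $new_files"
```

The branch now points at the replacement, but anything that still refers to the old hash (another clone, for example) continues to see password.txt.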
[1] Technically, you can get rid of a commit. You do this by making it so that you cannot find the commit by any branch or tag name or any other name. Git will notice that the commit cannot be reached (see Think Like (a) Git for a good discussion of reachability) and will eventually throw it out of the database, and then those copies of the files will be gone. But all the reachable commits will remain, and they will hold copies of the files as well.
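As a sketch of that process in a scratch repository: the branch name scratch and the file names are made up, and the reflog expiry plus immediate prune are there only to force right away what Git would otherwise do on its own schedule.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # make sure the branch is "master"
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

echo "keep" > keep.txt
git add keep.txt
git commit -q -m "commit on master"

git checkout -q -b scratch
echo "throwaway" > junk.txt
git add junk.txt
git commit -q -m "commit only on scratch"
doomed=$(git rev-parse HEAD)

git checkout -q master
git branch -q -D scratch               # no branch or tag names $doomed now
git cat-file -e "$doomed"              # ...but the object is still stored

git reflog expire --expire=now --all   # drop the reflog entries that remember it
git gc -q --prune=now                  # discard unreachable objects immediately

if git cat-file -e "$doomed" 2>/dev/null; then survived=yes; else survived=no; fi
echo "doomed commit survived: $survived"
```

The commit on master, being reachable, is still there afterwards.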
Because the files in each commit are frozen, commits get to share them when they are unchanged. So if you make a million commits, all of which store a 4000-line file, but the 4000-line file is the same in all the million commits, there's really just one copy of the file, even though there are a million commits that all have that copy. So the fact that every commit stores every file doesn't make the commit-database explode in size: they store via sharing.
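The sharing is easy to observe with two commits instead of a million. In this sketch (file names made up), big.txt is identical in both commits, so both commits refer to the same stored object, while small.txt changes and gets a new one.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Demo User"       # made-up identity for the demo
git config user.email "demo@example.com"

# Build a 4000-line file that never changes between the two commits.
awk 'BEGIN { for (i = 1; i <= 4000; i++) print "line " i }' > big.txt
echo "first draft" > small.txt
git add big.txt small.txt
git commit -q -m "first"

echo "second, longer draft" > small.txt
git commit -q -a -m "second"

# Each name resolves to the hash ID of the stored file contents.
big_first=$(git rev-parse HEAD~1:big.txt)
big_second=$(git rev-parse HEAD:big.txt)
small_first=$(git rev-parse HEAD~1:small.txt)
small_second=$(git rev-parse HEAD:small.txt)
echo "big.txt:   $big_first / $big_second"
echo "small.txt: $small_first / $small_second"
```

Identical hash IDs for big.txt mean there is one stored copy that both commits use.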
[2] There may, or may not, be some way outside Git to get work-tree files back. For instance, macOS offers its Time Machine. But Git doesn't know how to work that, and this is about what Git does.
[3] This is very much simplified. See Checkout another branch when there are uncommitted changes on the current branch for most of the gory details. There's even more to know if you have filter drivers defined.