To understand how this all really works, and when and why Git removes files, you must hold three ideas in your head simultaneously:
Git stores, in a repository, commits.
Each commit is a snapshot of a work-tree—sort of; see below. It's complete in and of itself. Once you make a commit, it never changes. It's designed (mathematically) to be impossible to change: the identity—the "true name"—of any commit is a cryptographic checksum of the contents of that commit. This means that if you change even a single bit of any file within a commit, or add any new file or remove any old file, what you get is a new, different commit, with a different name.
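Here is why a single-bit change produces a new name. This minimal Python sketch shows how Git derives an object's ID in the default SHA-1 object format. It hashes a short header ("<type> <size>" plus a NUL byte) followed by the raw contents. The sketch hashes only a blob (file contents). A commit's contents include its tree's ID, and a tree's contents include each file's blob ID, so changing any file changes every ID above it. The function name git_object_id is ours, not part of any Git API.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>\\0' + content, the way Git names objects (SHA-1 format)."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


original = git_object_id("blob", b"hello\n")
# Flip one bit in the final byte: 0x0a ("\n") becomes 0x0b.
tweaked = git_object_id("blob", b"hello\x0b")

print(original)             # ce013625030ba8dba906f756967f9e9ca394464a
print(original == tweaked)  # False: one bit changed, so the name changed
```

The first value should match what echo hello | git hash-object --stdin reports for the same bytes.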
A Git repository has a work-tree.
The format of files inside commits is something only Git itself can use. If Git never let you edit, view, or otherwise use your files, it would be useless. So each repository has a work-tree, which is basically a place where Git has expanded those files into their normal form, so that all the rest of the programs on the computer, and you yourself, can use them.
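One reason the committed format is useless to other programs is that Git stores each object compressed and filed under its hash. This Python sketch shows the loose-object form: a zlib-compressed header plus contents, stored in a directory named for the first two hex digits of the ID. Real repositories also bundle many objects into packfiles, which this sketch ignores. Git's compression level may differ from Python's default, so the stored bytes need not match byte-for-byte. The helper names are hypothetical.

```python
import hashlib
import zlib


def loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (path under .git, stored bytes) for a loose object."""
    raw = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(raw).hexdigest()
    # Two hex digits name the directory; the other 38 name the file.
    return f"objects/{oid[:2]}/{oid[2:]}", zlib.compress(raw)


def read_loose_object(stored: bytes) -> tuple[str, bytes]:
    """Reverse the process: decompress, then split the header off the contents."""
    raw = zlib.decompress(stored)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, content


path, stored = loose_object("blob", b"hello\n")
print(path)                       # objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(read_loose_object(stored))  # ('blob', b'hello\n')
```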
A Git repository has an index.
Work-trees and commits are quite different, but you can convert a work-tree into a commit, or a commit into a work-tree. Git's index is the intermediate place "between" work-tree and commit. Git is unusual in exposing this thing: other version control systems sometimes have something that is like Git's index, but most keep it hidden. Git does not.
In any case, the index is the key to all of this.
If you are writing new commits, the best way to describe, and think about, the index is that it is where you build the next commit. There are a bunch of reasons for this design, some better than others, and many having to do with speed (of extracting old commits or making new ones). Git also gives you several key features via the index, however, and those features force you to know what the index is and how to use it.
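A toy model can make "the index is where you build the next commit" concrete. This Python sketch does not use Git's real on-disk format. The work-tree maps paths to editable contents, the index maps paths to blob IDs, and a "commit" simply freezes a copy of the index. The functions add and commit are hypothetical stand-ins for git add and git commit. Real commits also record a parent, author, and message.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


work_tree: dict[str, bytes] = {"main.py": b"print('v1')\n"}
index: dict[str, str] = {}      # path -> blob ID proposed for the *next* commit
objects: dict[str, bytes] = {}  # blob ID -> contents (the object store)


def add(path: str) -> None:
    """Stand-in for `git add`: copy the work-tree version into the index."""
    content = work_tree[path]
    oid = blob_id(content)
    objects[oid] = content
    index[path] = oid


def commit() -> dict[str, str]:
    """Stand-in for `git commit`: snapshot whatever the index holds right now."""
    return dict(index)  # a copy, so later `add` calls cannot alter it


add("main.py")
first = commit()
work_tree["main.py"] = b"print('v2')\n"  # edited, but not added
second = commit()
print(first == second)  # True: the edit never reached the index
```

Real Git would refuse that second commit as empty unless you pass --allow-empty. The point is the same: only what is in the index gets committed.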
In particular, in a lot of computing systems, we want or need to keep, in the work-tree, files that will never be committed. For instance, with compiled languages, we have the source code, and then the compiler output files. Projects may have site-specific configurations. There are a lot of good reasons to want to keep un-versioned files mixed in with the versioned files. (In some cases, they may even be versioned, but separately from the source; Git is not very helpful with that arrangement, though.)
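The index lists exactly the files that will go into the next commit. Any work-tree file absent from the index is therefore simply untracked, and ignore rules only affect untracked files. The Python sketch below classifies paths that way. It uses fnmatch patterns as a simplified, hypothetical stand-in for real .gitignore matching, which has its own rules for slashes, negation, and directories.

```python
from fnmatch import fnmatch


def classify(work_tree: set[str], index: set[str], ignore: list[str]):
    """Split work-tree paths into tracked, untracked, and ignored lists."""
    tracked, untracked, ignored = [], [], []
    for path in sorted(work_tree):
        if path in index:
            # Listed in the index: tracked, even if an ignore pattern matches.
            tracked.append(path)
        elif any(fnmatch(path, pattern) for pattern in ignore):
            ignored.append(path)
        else:
            untracked.append(path)
    return tracked, untracked, ignored


files = {"main.c", "main.o", "site.cfg", "notes.txt"}
print(classify(files, {"main.c"}, ["*.o", "site.cfg"]))
# (['main.c'], ['notes.txt'], ['main.o', 'site.cfg'])
```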
Hence, the index is kind of a go-between, sitting between the permanent commits and the temporary but useful-to-things-other-than-Git work-tree. Besides just letting you stage files for the next commit, though, the index keeps track of which files you have extracted from the current commit. (More precisely, it keeps track of which version of each file you have, which is one of the ways Git manages to be as fast as it is.)
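The index records which version of each file you extracted, so Git can find what you have changed since then by comparison. This Python sketch does that comparison by rehashing every tracked file. Real Git first checks file metadata cached in the index (size, modification time, and so on) and rehashes only files whose metadata changed, which is part of why it is fast. The function name unstaged_changes is hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def unstaged_changes(index: dict[str, str], work_tree: dict[str, bytes]) -> dict[str, str]:
    """Report tracked files whose work-tree copy no longer matches the index."""
    changes = {}
    for path, oid in index.items():
        if path not in work_tree:
            changes[path] = "deleted"
        elif blob_id(work_tree[path]) != oid:
            changes[path] = "modified"
    # Paths only in the work-tree are untracked, so they are not reported here.
    return changes


index = {"a.txt": blob_id(b"one\n"), "b.txt": blob_id(b"two\n"), "c.txt": blob_id(b"three\n")}
work_tree = {"a.txt": b"one\n", "b.txt": b"TWO\n", "scratch.log": b"temp\n"}
print(unstaged_changes(index, work_tree))  # {'b.txt': 'modified', 'c.txt': 'deleted'}
```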
Here's the answer about what gets removed
When you move from one commit to another, as you do with git checkout of some commit other than the current one and with the so-called fast-forward variety of git merge, Git will remove from your work-tree any file that:
- is in the index
- but is at the wrong version for the new commit, including the case where the new commit does not contain that file at all
It will then add to your work-tree any file that:
- is in the new commit
- but is not already at that version in your index
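The two rules above can be sketched in Python as follows. This illustrates the rules and is not Git's actual checkout code. Commits and the index are modeled as hypothetical path-to-object-ID maps. The sketch assumes the index and work-tree match the current commit. Files not listed in the index (untracked files) are never touched. The names badc0ffee, faceacafe, deadc0de3 and zorg.py are borrowed from the example in the next paragraph; the other IDs are placeholders.

```python
def switch_commit(index: dict[str, str], work_tree: dict[str, bytes],
                  new_commit: dict[str, str], objects: dict[str, bytes]) -> None:
    """Move a clean index and work-tree from the current commit to new_commit."""
    # Rule 1: remove every indexed file whose version is wrong for the new
    # commit, including files the new commit does not contain at all.
    for path in list(index):
        if new_commit.get(path) != index[path]:
            del index[path]
            work_tree.pop(path, None)
    # Rule 2: add every file of the new commit not already at that version.
    for path, oid in new_commit.items():
        if index.get(path) != oid:
            index[path] = oid
            work_tree[path] = objects[oid]


objects = {"deadc0de3": b"# zorg\n", "aaa111": b"readme v1\n", "bbb222": b"readme v2\n"}
badc0ffee = {"zorg.py": "deadc0de3", "README": "aaa111"}
faceacafe = {"README": "bbb222"}

index = dict(badc0ffee)
work_tree = {"zorg.py": b"# zorg\n", "README": b"readme v1\n", "build.log": b"untracked\n"}
switch_commit(index, work_tree, faceacafe, objects)
print(sorted(work_tree))  # ['README', 'build.log']: zorg.py removed, build.log kept
```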
In other words, the index not only lets you build the next commit, it also remembers what you have in your work-tree. If you move from (previously current) commit badc0ffee to (newly current) commit faceacafe, and your index says that you have version deadc0de3 of file zorg.py that went with badc0ffee, but new commit faceacafe has no zorg.py, Git will remove zorg.py.
For all this to work, your index must match your current-before-you-change-it commit.
Fancy GUI front ends may hide, or try to hide, the index from you. This is usually a mistake since it's so central to proper Git operation.
Some extra side notes
The above glosses over the protections Git gives you when you check out commits while you have modified files (called a "dirty work tree" or "dirty index"). Suppose you never modify some work-tree files and then, deliberately or accidentally, fail to stage (git add) and/or commit them. Then your index will always match your current commit. To change commits, Git changes the files in the index; in the process, it changes those files, and only those files, in the work-tree.
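Two things can be checked separately: whether the index still matches the current commit, and whether the work-tree still matches the index. The sketch below does both. It models HEAD, the index, and the work-tree as hypothetical Python maps and reports two kinds of paths: staged-but-uncommitted and changed-but-unstaged. Two empty sets mean the clean situation described above. This is an illustration, not how git status is implemented.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def dirty_paths(head: dict[str, str], index: dict[str, str],
                work_tree: dict[str, bytes]) -> dict[str, set[str]]:
    """Find paths where the index differs from HEAD, or the work-tree from the index."""
    staged = {p for p in head.keys() | index.keys() if head.get(p) != index.get(p)}
    unstaged = {p for p, oid in index.items()
                if p not in work_tree or blob_id(work_tree[p]) != oid}
    return {"staged": staged, "unstaged": unstaged}


head = {"app.py": blob_id(b"v1\n"), "lib.py": blob_id(b"lib\n")}
index = dict(head)
work_tree = {"app.py": b"v1\n", "lib.py": b"lib\n"}
print(dirty_paths(head, index, work_tree))  # {'staged': set(), 'unstaged': set()}

work_tree["app.py"] = b"v2\n"  # edit without `git add`
print(dirty_paths(head, index, work_tree))  # {'staged': set(), 'unstaged': {'app.py'}}
```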
If you do deliberately set up a dirty index and/or work-tree, Git will try to let you change commits anyway. Roughly speaking, this succeeds only if, for each file you have "dirtied", the new commit stores the same version of that file as the current commit does. This works precisely because Git updates only those files that are, as it were, "wrong in the index" for the new commit. For (much) more on this, see Git - checkout another branch when there are uncommitted changes on the current branch.
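The carry-over rule can be sketched as a check on the dirty paths. A dirty path blocks the switch when the two commits store different versions of it, because Git would then have to overwrite your uncommitted change. This Python sketch is a deliberate simplification with placeholder IDs. Real Git handles additional cases, which the linked question goes through, and it has separate checks for untracked files.

```python
def blocking_paths(head: dict[str, str], target: dict[str, str],
                   dirty: set[str]) -> set[str]:
    """Dirty paths this simplified check says would stop the switch."""
    # A dirty file can be carried across only if both commits store the same
    # version of it (or both lack it), so Git never has to rewrite it.
    return {path for path in dirty if head.get(path) != target.get(path)}


head = {"a.py": "id-a1", "b.py": "id-b1"}
target = {"a.py": "id-a1", "b.py": "id-b2"}
print(blocking_paths(head, target, {"a.py"}))  # set(): a.py's change is carried over
print(blocking_paths(head, target, {"b.py"}))  # {'b.py'}: the switch would be refused
```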