You have an incorrect mental model of how Git works. (Don't worry that you do—I did when I started with Git, more than a decade ago.) To correct your mental model, you need to know these things:
Git stores commits. It does not store files—not at the level you will use it, anyway—but rather whole commits.
Commits themselves do store files, so that's how you get files, but it's at the level of a commit: you either have a commit (and all of its files), or you don't (you have none of its files). Every commit stores a full and complete snapshot of all files (well, all of its files; see below).
Commits also store some metadata: information about the commit, such as who made it, when, and why (a log message). A crucial piece of metadata in each commit is the commit-"number" of the commit that comes before this commit.
Commit "numbers" are big and ugly and random-looking hash IDs. Every commit gets a unique hash ID. This is how you (or your Git) knows whether you have the commit. Every Git everywhere agrees that that particular commit gets that particular hash ID, and no other commit, past or future, can ever have that ID. To make this work, the hash ID is a cryptographic checksum of the contents of the commit—which means that no part of any existing commit can ever change.
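To see the metadata and the backwards link for yourself, here is a minimal sketch. The temporary directory, the placeholder identity, the file name, and the commit messages are all my own assumptions, chosen just for the demonstration:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)               # hash ID of the first commit

echo two > file.txt
git add file.txt
git commit -q -m "second"
second=$(git rev-parse HEAD)              # hash ID of the second commit

# Raw contents of the second commit: tree, parent, author, committer, log message
git cat-file -p "$second"

# The "parent" line holds the hash ID of the commit that comes before this one
parent=$(git cat-file -p "$second" | sed -n 's/^parent //p')
echo "parent of second: $parent"
```

The hash IDs you get will differ from mine and from everyone else's, because the identity and timestamps are part of what gets hashed.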
No human can actually remember these hash IDs. Fortunately, we don't have to: we have a computer to remember them for us.
A branch name, which most people (including me) will often abbreviate to "a branch", holds just one hash ID. The hash ID in a name like this is the ID of the last commit in the branch. That's why each commit links back to its parent, or previous, commit: so that Git can start at the end and work backwards.
A collection of commits that you get by starting at the end and working backwards is also called "a branch". So when someone says branch master, for instance, it's important to think about whether this means the last commit in master, as stored in the name master, or a series of commits ending with the last commit in master.
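The two meanings of "branch" can be told apart on the command line. This is a sketch under my own assumptions (a throwaway repository, three placeholder commits): git rev-parse shows the single hash ID stored in the name, while git rev-list walks backwards from it.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

for msg in first second third; do         # three placeholder commits
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done

# Meaning 1: the name holds exactly one hash ID, that of the last commit
tip=$(git rev-parse master)
echo "master holds: $tip"

# Meaning 2: the series of commits found by starting there and working backwards
count=$(git rev-list --count master)
echo "commits reachable from master: $count"
git --no-pager log --format='%h %s' master
```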
Now, the fact that every commit ever made is read-only means that what we do with a repository is generally just add new commits. But to make a new commit, we have to be able to change files: open them up in our editors, make changes to them, and save them back. The files inside commits can't be changed. So we do not, and cannot, work on committed files. The commits themselves, that hold snapshots of all of your files, are just archives.
To keep the archives from growing very fat very fast, Git stores committed files in a special, read-only, Git-only, compressed format. Only Git itself can actually use these. (You could of course write your own programs to read them, but there's more than one format, and there's already a Git plumbing command, i.e., one that users aren't normally supposed to need, that reads a raw object: git cat-file -p. It can read more than just files, but it can read the files inside a commit.) New commits can share the files from existing commits (that's obviously safe because they're all read-only) and in fact, this all happens automatically.
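Here is a small sketch of that sharing, assuming a throwaway repository with two hypothetical files, keep.txt and edit.txt. The unchanged file has the same internal object ID in both commits, and git cat-file -p reads the stored content back:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "unchanged" > keep.txt
echo "v1" > edit.txt
git add keep.txt edit.txt
git commit -q -m "first"

echo "v2" > edit.txt                      # only edit.txt changes
git add edit.txt
git commit -q -m "second"

# <commit>:<path> names the stored file object inside that commit
keep_old=$(git rev-parse HEAD~1:keep.txt)
keep_new=$(git rev-parse HEAD:keep.txt)
edit_old=$(git rev-parse HEAD~1:edit.txt)
edit_new=$(git rev-parse HEAD:edit.txt)

echo "keep.txt: $keep_old -> $keep_new"   # same ID: both commits share one object
echo "edit.txt: $edit_old -> $edit_new"   # different IDs: different content

old_content=$(git cat-file -p "$edit_old")  # content as stored in the first commit
echo "edit.txt in first commit: $old_content"
```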
In any case, to get any new work done in some existing repository, you must first pick some existing commit and have Git extract it somewhere. That "somewhere" is your work-tree (or working tree or some variant on this name). The extracted work-tree area contains ordinary files, in ordinary everyday formats.
You, and your computer, can work with these work-tree files. That's what you are doing in your steps 2 and 6, for instance.
Git does not use these work-tree files very much at all. It creates them for you (by extracting them from commits), and it will look at them when you tell it to, but it's not using them to make commits. They exist for you to use, to get your work done. You have to copy them to the files that Git is using, which is what step 3 was about. This is where everything gets a little complicated.
The index
In step 1, you created a new, empty Git repository. This repository has no commits yet. It has an empty work-tree, in which you can work with your files. And, it has an empty index. This thing—this index—is kind of complicated, but you can think of it as where you build the next commit you will make. You can think of it as holding copies of each of your files.
Your step 2 was:
touch file1.txt file2.txt
which created two (empty) files in your work-tree. These files are not in your index yet. Your step 3, though, was:
git add file1.txt file2.txt
This has the effect of copying the files' contents into the index.[1] Git now says that these files are staged for commit. This leads to another, alternative name for the index: it's also called the staging area. These are just synonyms: the index, or the staging area, is just one thing.[2]
Finally, in step 4, you ran git commit. This made a new commit from the files that were in the index, not the ones in the work-tree. Those two index files were copies of the ones from the work-tree.
At this point, you now have a commit. This one commit is the very first commit in the repository, so it's a bit special: it does not record any previous commit. (It can't, of course; there are no previous commits.) I have no idea what hash ID your commit got: it depends not only on the files that are in the commit (which I do know) and your log message (which I saw in your command), but also on your name and email address and on the very second at which your Git created the commit (and I don't know these). I do know, though, that it has a unique hash ID, different from all the other hash IDs in your repository, or any other Git repository you'll have your repository talk to in the future.[3]
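Steps 1 through 4 can be replayed while peeking at the index with git ls-files --stage. This is only a sketch: the temporary directory, the identity, and the commit message are placeholders of mine, and the blob hash ID shown in the comment assumes a default (SHA-1) repository.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q                               # step 1: new, empty repository
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt                 # step 2: two empty work-tree files
before=$(git ls-files --stage)            # the index has no entries yet

git add file1.txt file2.txt               # step 3: copy them into the index
staged=$(git ls-files --stage)            # one line per entry: mode, blob ID, stage, name
echo "$staged"
# Both entries refer to the empty blob, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

git commit -q -m "initial commit"         # step 4: commit what is in the index
parents=$(git --no-pager log -1 --format=%P)   # parent hash IDs of the new commit
echo "parents of first commit: '$parents'"     # empty: a root commit has no parent
```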
[1] Technically, the index holds the files' modes, their names, and, for each file, a reference to the internal Git object that holds the content. This blob object has a hash ID, like a commit (though unlike a commit, a blob object can be re-used). The hash ID of the empty file is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which you can find by running git hash-object -t blob --stdin </dev/null. If and when Git moves to SHA-2 instead of SHA-1, the IDs of every object will change, which is going to be a very interesting time for Git. We can hope that Git hides all the painful parts here for us.
[2] Technically, the index is mostly just a file in .git named .git/index. The "mostly" is here only because Git has a mode called a split index. All of these, however, are internal details that could change. The one external promise is that you can set an environment variable named GIT_INDEX_FILE to make Git use a different index. Some Git programs do this for special purposes: e.g., git stash, when it was a shell script, did it when making some of the stash commits, to avoid overwriting the normal index.
[3] This depends on the uniqueness of hash IDs. In the presence of malicious actors, that in turn depends in part on the strength of the cryptography. See "How does the newly found SHA-1 collision affect Git?"
More about branch names
We already mentioned that branch names, like master, hold the hash ID of a commit. Until you have some hash IDs, you can't have any branch names. So creating this initial commit is what created the name master. This name holds the actual hash ID, whatever that is. When something holds a hash ID, we say that this something points to the commit. So at this time (after step 4 creates the first commit) you have a commit with some big ugly hash ID, but let's just call it "commit A", and draw it like this:
A <-- master
The name master points to (contains the hash ID of) commit A.
Now we go on to step 5:
git checkout -b myBranch
This creates a new name, myBranch, that also holds the hash ID of existing commit A. Let's update our drawing:
A <-- master, myBranch
Git also needs to know which branch name we're using, so let's attach the name HEAD, written in all uppercase, to one of these two branch names. The branch name we want to use, the one created by this git checkout -b, is the new one, so that's:
A <-- master, myBranch (HEAD)
Both names point to the same commit. This is perfectly normal in Git: commit A is now on both branches. The current name is myBranch and the current commit is commit A.
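You can check this picture directly. In this sketch (throwaway repository, placeholder identity, and the commit message "A" standing in for the real hash ID), both names resolve to the same commit and HEAD is attached to myBranch:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "A"                      # the commit the drawings call A

git checkout -q -b myBranch               # step 5: new name, attach HEAD to it

master_id=$(git rev-parse master)         # hash ID stored in the name master
mybranch_id=$(git rev-parse myBranch)     # hash ID stored in the name myBranch
current=$(git symbolic-ref --short HEAD)  # the branch name HEAD is attached to

echo "master   -> $master_id"
echo "myBranch -> $mybranch_id"
echo "HEAD is attached to: $current"
```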
Now let's watch what happens in steps 6, 7, and 8:
rm file1.txt
This removes the file from your work-tree. Git's index, which still matches commit A (Git made commit A from the index), still has two files in it.
git status
This runs two separate comparisons. One compares the current commit, commit A, to the index. These have the same files with the same contents, so this part of git status says nothing. The second comparison is index-vs-work-tree. Here, the index has file1.txt and the work-tree doesn't, so this comparison says that file1.txt is removed from the work-tree but not from the index, by saying that this deletion is not staged for commit.
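The two comparisons can also be run separately, which makes it easier to see which one is reporting what. A sketch, assuming the same throwaway setup as before: git diff --cached compares the current commit to the index, and plain git diff compares the index to the work-tree.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "A"
git checkout -q -b myBranch

rm file1.txt                              # step 6: remove from the work-tree only

commit_vs_index=$(git diff --cached --name-status)  # comparison 1: commit A vs index
index_vs_tree=$(git diff --name-status)             # comparison 2: index vs work-tree
short=$(git status --porcelain)                     # both comparisons, two status columns

echo "commit vs index:    '$commit_vs_index'"  # nothing: the index still matches A
echo "index vs work-tree: '$index_vs_tree'"    # D for the deleted file1.txt
echo "porcelain status:   '$short'"            # D in the second (work-tree) column
```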
git checkout master
This tells Git that you'd like to change the current commit and/or branch. The current branch is myBranch and the current commit is A. The selected branch name is master and its commit is A. So Git can skip changing commits, while sticking the special name HEAD to the name master now:[4]
A <-- master (HEAD), myBranch
Nothing has happened anywhere else: the index still has two files, the current commit is still commit A, and the work-tree still has one file missing. Step 9 (another git status) will tell you that your current branch is now master, but will do the same comparisons: commit A vs index, and index vs work-tree. The result here will be the same. Step 10 just looks at the work-tree, which we know is missing file1.txt.
Step 11 asks Git to attach HEAD to myBranch again. Nothing else changes: the current commit is still commit A, the index is untouched, and the work-tree is untouched.
In step 12, though, you run:
git rm file1.txt
This changes the index. The git rm command removes the file from both the index and the work-tree. It's already gone from the work-tree, so that doesn't really change anything, but now the index no longer has a file1.txt in it.
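To watch the index change, compare git ls-files and git status --porcelain after the git rm. As before, this is a sketch in a throwaway repository with placeholder names:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "A"
git checkout -q -b myBranch
rm file1.txt                              # gone from the work-tree, still in the index

git rm -q file1.txt                       # step 12: now remove it from the index too

index_files=$(git ls-files)               # names of the files currently in the index
short=$(git status --porcelain)           # D in the first column: deletion is staged

echo "files in index: $index_files"
echo "porcelain status: '$short'"
```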
In step 13, you run git commit again. This makes a new commit, from what's in the index: that is, a commit that has just the empty file2.txt in it. You get all the usual metadata as well: your name and email address, and the log message for why you made this commit. The parent of this new commit, which we'll call B rather than trying to guess a hash ID, is existing commit A: new commit B points to existing commit A.
The last step of git commit is for Git to write the new commit's hash ID into the name to which HEAD is attached. Since step 11 attached HEAD to myBranch, the result is this:
A <-- master
 \
  B <-- myBranch (HEAD)
The existing name master has not changed at all. HEAD is still attached to myBranch, but the name myBranch now points to new commit B. The index still has whatever it had from before you ran git commit: i.e., it has just the empty file2.txt in it. Commit B has a backwards-pointing arrow to (or really, contains the hash ID of) commit A, so if you run git log right now, your Git will start at HEAD, find myBranch, find B, show commit B, follow the arrow to commit A, and show commit A.
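The whole sequence up to commit B can be replayed to confirm the drawing. In this sketch the commit messages "A" and "B" are placeholders standing in for the real hash IDs, and the repository and identity are throwaway assumptions:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "A"                      # commit A, on master
git checkout -q -b myBranch               # HEAD now attached to myBranch
git rm -q file1.txt                       # index now holds only file2.txt
git commit -q -m "B"                      # commit B, made from the index

a=$(git rev-parse master)                 # master still holds A's hash ID
b=$(git rev-parse myBranch)               # myBranch now holds B's hash ID
parent_of_b=$(git rev-parse 'myBranch^')  # B's parent
subjects=$(git --no-pager log --format=%s)  # walk backwards from HEAD, newest first

echo "master   -> $a"
echo "myBranch -> $b (parent $parent_of_b)"
echo "$subjects"
```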
[4] Technically Git accomplishes this by writing the branch name master into a file in .git named .git/HEAD. You can look at this file, but when you want to update it, you should use the various Git tools, because under various conditions, Git might be using some other file. In particular, since Git 2.5, Git has git worktree add, which adds a new index-and-work-tree pair. Each added work-tree has to get its own separate HEAD as well, so once you add some work-trees, the index isn't always .git/index any more and HEAD isn't always .git/HEAD any more.
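For the curious, here is a sketch of looking at that file in a plain single-work-tree repository. It assumes the default on-disk layout; as the footnote says, this is an internal detail, so git symbolic-ref is the safer way to ask the same question.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "A"

git checkout -q -b myBranch
on_mybranch=$(cat .git/HEAD)              # peek at the raw file (internal detail)

git checkout -q master
on_master=$(cat .git/HEAD)
via_git=$(git symbolic-ref HEAD)          # the supported way to read the same thing

echo "after checkout -b myBranch: $on_mybranch"
echo "after checkout master:      $on_master"
echo "git symbolic-ref HEAD:      $via_git"
```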
Summary
Keep the following items in mind at all times:
Git is all about commits. Branch names—and other names, once you get to that point—just serve to find the commits.
Every commit has a unique hash ID, and except for some new unfinished features ("partial clones"), you always either have a full commit, or none of a commit.
Every commit links back to one or more predecessor or parent commits, except for special cases like the very first commit ever in some repository. These linkages—or chains of commits—form what people call branches (one of the several meanings of the word "branch").
To make a new commit, you need to update Git's index. When you first git checkout some commit you don't already have out, Git will fill in the index (and of course your work-tree) from that commit. You work with files in your work-tree, and Git works with its index.
The index and your work-tree aren't copied around: when you git clone, or git fetch, or git push, you will transfer commits. The index and work-tree don't matter here (well, there are some conditions for git push, in the other Git that's receiving your git push).
Commits are frozen for all time (and mostly permanent—they're a bit hard to get rid of, even if you want to, sometimes). The copies of files in your index and work-tree are temporary.
Adding new commits updates your branch name(s). The branch name that gets updated is the one you've attached HEAD to.
In Git 2.23 or later, you can use git switch to pick where HEAD goes and/or create new branch names, and git restore to extract specific files from specific commits; in earlier versions of Git, both jobs are stuck into one git checkout command.
When you get to the point of using a second Git repository, remember that until you git push those commits to that other repository, your Git is the only one that has your new commits. That makes it easy (and OK) to "rewrite history" by replacing some commits with some new-and-improved versions (e.g., git rebase -i or git commit --amend). Once you have sent the commits elsewhere, you can still replace commits with new-and-improved versions; it's just that the other Git now has the commits you sent earlier, so these things get harder, sometimes a lot harder.
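The "replacing" in that last item is literal: no existing commit is ever changed. A sketch with git commit --amend, using a throwaway repository and a hypothetical notes.txt, shows that the amended commit is a new commit with a new hash ID, while the old one is still in the repository:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)                 # hash ID of the original commit

git commit -q --amend -m "improved message"   # make a replacement commit
new=$(git rev-parse HEAD)                 # master now holds a different hash ID

old_type=$(git cat-file -t "$old")        # the original commit object still exists
count=$(git rev-list --count master)      # but master no longer reaches it

echo "old: $old"
echo "new: $new"
echo "old object is still a: $old_type"
echo "commits reachable from master: $count"
```

Because nothing else has seen the old commit yet, nobody notices the swap. That is exactly what stops being true after a git push.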