As jthill said, your working tree or work-tree has only one copy of the file. Git has, in its commits, every copy of the file: each commit has one copy of each file. The copies are de-duplicated, in a clever manner that depends on the fact that nothing in Git, once committed, can ever be changed. So the files inside commits are frozen for all time, along with the rest of the commit (there's a bit of stuff besides just the files).
More precisely, each commit has a full snapshot of the files that you had told Git to put into that commit, at the time you made that commit. Or, if it wasn't you that made the commit, insert some other actor as the person invoking Git commands.
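The de-duplication mentioned above can be sketched in a few lines. This is a toy model, not Git's actual storage code: the names `store` and `save` are made up for illustration, and it assumes the classic SHA-1 object format, in which a file's ID is the hash of a short header plus the file's bytes.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name file content the way Git names a blob: SHA-1 over a small header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical toy object store: hash ID -> content. Saving the same bytes
# twice lands on the same key, so only one copy is ever kept.
store = {}

def save(content: bytes) -> str:
    oid = blob_id(content)
    store.setdefault(oid, content)
    return oid

# Two snapshots that both contain the same unchanged file share one stored object.
first = save(b"hello\n")
second = save(b"hello\n")
changed = save(b"hello, world\n")
```

Because the ID is computed from the content, a stored object can never be changed in place: different content would simply have a different ID.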
These committed files are in the repository, contained by virtue of being stored inside each commit. But the files that you see and work with, in your work-tree, are not in Git at all. I think it helps, conceptually, if you think of the work-tree files as yours: you are responsible for these files. The files in commits (the ones in each commit snapshot, made when you or whoever ran git commit) are the responsibility of Git.
Once you have this in your head (that Git is just copying one set of its files out of a commit, over top of your files), a lot of things fall into place. The remaining rather large surprise is that, in an important way, branches don't matter. What matters in Git are, always, the commits. The branch names like master or develop are just one way of finding specific commits.
When you clone a repository, or use git push or git fetch,¹ you're asking your Git to connect to some other Git. So there are multiple copies of each repository. These repositories share commits, by copying them, but they need not share their branch names at all. That's OK, because it's the commits that matter, not the branch names.
¹ Don't think of git pull as the opposite of git push, because it's not. Think of fetch and push as the two opposites. Well, OK: they're as close as Git gets to opposites here. Mercurial got this particular terminology right (in Mercurial, pull does what fetch does in Git) and Git just sort of got it backwards.
Branch names don't matter, except to humans
The real name of a commit is its hash ID. To see the hash ID of some commit, use git rev-parse, whose job is to turn a name into a hash ID:²
$ git rev-parse master
b994622632154fc3b17fb40a38819ad954a5fb88
$ git rev-parse origin/maint
af6b65d45ef179ed52087e80cb089f6b2349f4ec
These hash IDs are how Git finds commits, or at least some specific commit that we humans might care about right now. The name master is specifically a branch name, while the names origin/maint and origin/master aren't branch names. But all of these names locate some commit. Sometimes, more than one name locates the same commit:
$ git rev-parse origin/master
b994622632154fc3b17fb40a38819ad954a5fb88
This is the same hash ID that I got for my master here. That's no coincidence: the Git repository I cloned has a master branch, and the last time I talked with that Git repository (a few weeks ago at this point) they had their master set up to remember commit b994622632154fc3b17fb40a38819ad954a5fb88. So I told my Git that it should remember b994622632154fc3b17fb40a38819ad954a5fb88 under my name master, too.
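One way to picture this is as a plain lookup table from names to hash IDs. The sketch below is only a toy: the `refs` table and this `rev_parse` function are illustrative stand-ins, and the real git rev-parse follows more resolution rules than the two tried here. The hash IDs are the ones from the transcripts above.

```python
# Toy model: a repository's names are just a table from full name to commit hash ID.
refs = {
    "refs/heads/master": "b994622632154fc3b17fb40a38819ad954a5fb88",
    "refs/remotes/origin/master": "b994622632154fc3b17fb40a38819ad954a5fb88",
    "refs/remotes/origin/maint": "af6b65d45ef179ed52087e80cb089f6b2349f4ec",
}

def rev_parse(name: str) -> str:
    """Resolve a short name to a hash ID, trying branch names before remote-tracking names."""
    for prefix in ("refs/heads/", "refs/remotes/"):
        full = prefix + name
        if full in refs:
            return refs[full]
    raise KeyError(name)

# Two different names can locate the same commit.
same = rev_parse("master") == rev_parse("origin/master")
```

Note that nothing in the table ties my master to their master: the two entries merely happen to hold the same hash ID right now.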
Whenever you use branch names in Git, you're telling Git: Remember this commit hash ID under this name. The special property of a branch name (different from a remote-tracking name like origin/master,³ for instance) is that if you use git switch or git checkout to select its commit, Git attaches the special name HEAD to that branch name, so that you are now on the branch:
$ git switch dev
Switched to branch 'dev'
$ git switch master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
If you pick a non-branch name, git switch complains, while git checkout puts you into detached HEAD mode:
$ git switch origin/master
fatal: a branch is expected, got remote branch 'origin/master'
$ git checkout origin/master
Note: switching to 'origin/master'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
...
HEAD is now at b994622632 The eighth batch
Note that git switch, which is a more user-friendly command, allows you to get into detached HEAD mode the same way, but only on purpose: you have to add --detach to the command. Detached HEAD mode has its uses, but everyday work is not one of them, so it's wise to get back on a branch, for your own mental health:
$ git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
and we're back in the happier state in which Git will remember hash IDs for us, using our branch names. If you don't have Git remember them for you, you will have to memorize these hash IDs, and that is no fun at all.
² Well, that's one of its jobs. Git has a tendency to load too many jobs into too few commands. That's why Git 2.23 and later have git switch and git restore, while earlier versions of Git jam both jobs into git checkout.
³ About these origin/* names: git switch calls origin/master a remote branch, but that is a terrible name. Git documentation calls it a remote-tracking branch name, which is slightly better. I use the phrase remote-tracking name, to try to get away from the word branch, which is way too overused in Git. The real key here is to remember that while it's a name, it's not a branch name, in the sense that you can't git switch to it.
Commits remember previous commit hash IDs
The last piece to this particular puzzle is a clever (and/or sneaky) trick. If a commit has a hash ID (and it does) and if that hash ID is how Git finds the commit (and it is), then what happens if we have every new commit we make remember the raw hash ID of the commit that comes just before it?
That is, suppose we have a string of commits like this, except that they have real hash IDs instead of single uppercase letters:
... <-F <-G <-H
Here H stands in for the real hash ID of the latest commit. Let's have Git remember the actual hash ID, using the branch name master, like this:
... <-F <-G <-H <--master
We say that the name master points to commit H. But we told Git, when we made H, that Git should have commit H remember the hash ID of commit G! So given that we're working with commit H right now, Git can just look up the hash ID of G using commit H itself. Commit H points to earlier commit G.
Of course, earlier commit G points to even-earlier commit F, and so on, all the way back to the very first commit. That commit doesn't point backwards, because it can't, so that's where Git gets to stop and rest. Otherwise, if you start Git with the name master, Git will find H, then use that to find G and then F and E and so on all the way back to the first commit A:
A--B--C--D--E--F--G--H <-- master (HEAD)
which is our repository with eight total commits, all in one line.
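The backwards walk is easy to mimic. Here is a minimal sketch, assuming the same single-letter stand-ins for hash IDs; the `parents` and `branches` tables and the `history` function are invented for this illustration and are not Git's own data structures.

```python
# Toy model: each commit records only its parent's ID (None for the very first commit).
parents = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D", "F": "E", "G": "F", "H": "G"}
branches = {"master": "H"}

def history(branch):
    """Start at the commit the branch name points to and follow parent links backwards."""
    commits = []
    current = branches[branch]
    while current is not None:   # the first commit has no parent, so the walk stops there
        commits.append(current)
        current = parents[current]
    return commits

print(history("master"))  # ['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']
```

The only thing the name supplies is the starting point; everything else comes from the commits themselves.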
Branches
Let's say we have this structure at the moment:
...--G--H <-- master (HEAD)
If we now create a new branch name, but let it signify commit H too, we get:
...--G--H <-- dev, master (HEAD)
We can now attach the special name HEAD to either branch name. It doesn't matter which name we use because both mean commit H: the files we see in our work-tree will be the same either way. But let's switch to dev, with git switch dev or git checkout dev:
...--G--H <-- dev (HEAD), master
Now let's make a new commit, in the usual way.⁴ This new commit gets a new, unique hash ID, which is big and ugly and unpredictable;⁵ but let's just call it I.
New commit I automatically points back to existing commit H:
...--G--H
         \
          I
and now Git pulls its really sneaky trick: git commit writes the new hash ID into the name dev, because that's the name HEAD is attached to. So the branch name dev moves, giving us:
...--G--H   <-- master
         \
          I   <-- dev (HEAD)
Note how the name master still selects commit H, while the name dev now selects commit I. If we make another new commit here we get:
...--G--H   <-- master
         \
          I--J   <-- dev (HEAD)
Git will now find commit J using the name dev, and find commit I using commit J. Git has two ways to find commit H: the name master finds it directly, and dev finds it after traversing two hops backwards, from J to I to H.
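The way the attached name moves can be imitated with a small toy class. Everything in it (`ToyRepo` and its methods) is hypothetical and only mirrors the drawings above; real Git commits also carry snapshots and metadata, and can have more than one parent.

```python
class ToyRepo:
    """Commits know their parent; branch names know one commit; HEAD knows one branch name."""

    def __init__(self):
        self.parents = {}    # commit ID -> parent commit ID (None for a root commit)
        self.branches = {}   # branch name -> tip commit ID
        self.head = None     # name of the branch HEAD is attached to

    def commit(self, new_id):
        # The new commit points back to the current tip, then the attached name moves.
        self.parents[new_id] = self.branches.get(self.head)
        self.branches[self.head] = new_id

    def branch(self, name):
        # A new name starts out pointing to the same commit as the current branch.
        self.branches[name] = self.branches[self.head]

    def switch(self, name):
        self.head = name

    def contains(self, name, commit_id):
        # A commit is "on" a branch if walking back from the tip reaches it.
        current = self.branches[name]
        while current is not None:
            if current == commit_id:
                return True
            current = self.parents[current]
        return False

repo = ToyRepo()
repo.switch("master")
for letter in "GH":
    repo.commit(letter)
repo.branch("dev")
repo.switch("dev")
repo.commit("I")
repo.commit("J")
```

After this runs, `repo.branches` maps master to H and dev to J, and `repo.contains("dev", "H")` is true even though dev's tip is J, because the walk from J passes through I and reaches H.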
In Git, the commits up through H are on both branches. Commits I and J are only on dev. If I and/or J contain files that H doesn't, switching from dev back to master will remove these files from your work-tree: you told Git: set up my work-tree based on commit H, and it does that. Switching from master to dev brings the files back, because you told Git: set up my work-tree based on commit J.
If we go back to commit H and create and switch to a new name topic, we get:
...--G--H   <-- master, topic (HEAD)
         \
          I--J   <-- dev
and now we can create new commits as usual:
          I--J   <-- dev
         /
...--G--H   <-- master
         \
          K--L   <-- topic (HEAD)
⁴ I've just glossed completely over the complicated way Git makes new commits, which involves Git's index. I won't go into details in this answer, though.
⁵ Technically, if we know:
- what source files, exactly, will be in the snapshot (all their names and contents);
- what metadata you'll give Git (your name, email address, and so on, and the log message you will use); and
- the hash ID H and the exact date and time at which you will make new commit I;
then we could predict what the actual hash ID of commit I will be. But how will we predict all of these? So we might as well think of I as being "random".
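To make the footnote concrete, here is a sketch of that prediction for the simplest case. It assumes the SHA-1 object format, a single parent, the same person as author and committer, and no extra commit headers; the function name `commit_id` and the name, email address, and timestamp are made up for illustration. The tree ID used is Git's well-known ID for an empty tree, and the parent ID is the one shown earlier for master.

```python
import hashlib

def commit_id(tree, parent, who, when, message):
    """Compute the SHA-1 ID of a single-parent commit built from exactly these inputs."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who} {when}\n"
        f"committer {who} {when}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"   # snapshot with no files
PARENT = "b994622632154fc3b17fb40a38819ad954a5fb88"       # stands in for commit H
WHO = "A U Thor <author@example.com>"                     # hypothetical identity

at_noon = commit_id(EMPTY_TREE, PARENT, WHO, "1600000000 +0000", "new commit I")
one_second_later = commit_id(EMPTY_TREE, PARENT, WHO, "1600000001 +0000", "new commit I")
```

The two results differ even though only the timestamp changed by one second, which is why the new hash ID looks random in practice.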
Draw graphs!
I flipped dev to the top row just so that the "bigger letters" K and L would be on the bottom. You can draw the graph any number of ways, as long as the connections from commit to commit, the backwards links from J to I and the like, are still drawn and as long as you label the correct commits with the correct names. You can leave out some names, and some commits (like the ones before G) when they just clutter up the drawing.
Whatever you do, though, it's really good exercise to draw a bunch of graphs—on paper, on a whiteboard, or whatever. When you do this you'll notice things, like:
- Branch names find the last commit in a chain. Git calls this the tip commit of the branch.
- The arrows all go backwards. Git has to start at the end, and work backwards.
- If a chain has no name for its last commit, Git can't find any of it.
Knowing these things leaves you in a good position for learning all the other mysteries of Git, such as how git merge and git rebase work.
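The last observation in the list above, that a chain with no name can't be found, can be checked with one more toy sketch. The `reachable` function and its tables are invented for this illustration, handle single-parent commits only, and ignore the other ways real Git can still locate a commit for a while (reflogs, for instance).

```python
def reachable(parents, names):
    """Return every commit that can be found by starting at a name and walking backwards."""
    found = set()
    for tip in names.values():
        current = tip
        while current is not None and current not in found:
            found.add(current)
            current = parents[current]
    return found

parents = {"G": None, "H": "G", "I": "H", "J": "I", "K": "H", "L": "K"}
names = {"master": "H", "dev": "J", "topic": "L"}

print(sorted(reachable(parents, names)))   # ['G', 'H', 'I', 'J', 'K', 'L']

# Delete the name topic: nothing points to L any more, so K and L cannot be found.
del names["topic"]
print(sorted(reachable(parents, names)))   # ['G', 'H', 'I', 'J']
```

Commit H survives the deletion because master and dev still lead to it; K and L had only the one name.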