The thing to remember here is that "branches"—more specifically, branch names—don't really mean anything to Git. They mean things to you, but as far as Git is concerned, each name is just a method of finding one specific commit. What Git really cares about are the commits in the repository.
Each commit has its own unique big ugly hash ID. You see these IDs in git log output, for instance. The true magic of Git is that every Git repository everywhere will agree that, if one specific commit in your repository (or one in theirs) has some hash ID H, no other commit anywhere can ever have that hash ID.¹ So when two Git repositories "meet up", by git fetch or git push, they need only compare hash IDs. Your GitHub repository, over on GitHub, has some commit with some hash ID H listed under their name master. Your Git calls up their Git. Their Git says: My name master corresponds to hash ID H. Your Git checks: Do I have H? If your Git has H already, it's done. If not, your Git asks their Git to send commit H.
Your own Git has your own names. One of them might be master. It has some hash ID. It doesn't matter what hash ID your master has here; the only thing that matters to your Git for the fetch operation is: Do I have commit H at all? Your Git can always look up all of its internal Git objects directly by raw hash ID. You either do have H already, or you don't. Since no Git anywhere can ever use hash ID H for anything but this commit, all your Git has to do is check that one thing.
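As a rough illustration of why a raw hash ID is enough to answer "do I have this?", here is a minimal Python sketch of a content-addressed store. The object_id function follows classic SHA-1 Git's scheme of hashing a short header plus the content; the store dictionary and the put and have helpers are hypothetical names made up for this sketch, not Git's real internals.

```python
import hashlib

def object_id(kind, body):
    # Classic (SHA-1) Git hashes "<kind> <size>" + NUL + the raw content.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

store = {}  # hypothetical toy object database: hash ID -> content

def put(kind, body):
    """Store an object under its own hash ID and return that ID."""
    oid = object_id(kind, body)
    store[oid] = body
    return oid

def have(oid):
    """The only question a fetch needs answered: do I have this ID?"""
    return oid in store

h = put("blob", b"hello\n")
print(have(h))         # True
print(have("0" * 40))  # False
```

Because the ID is computed from the content, two repositories that hold the same object necessarily agree on its ID, which is what makes the simple membership check sufficient.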
If you don't have H after all, your Git will have their Git send commit H. Now, another thing about every commit is: each commit records some set of parent commit hash IDs. The parent or parents of a commit is the commit (or for a merge, two or more commits) that comes "just before" this commit. That is, given a long string of commits, made one at a time, each one stores the previous commit's hash ID, in a backwards-pointing chain:
... <-F <-G <-H
So if you're going to get H, their Git will now offer you G. Your Git asks itself: Do I have G already? If not, your Git says OK, give me that too. If you do have G, your Git says No thanks, I have that one already. This repeats for every commit they have that you don't. Eventually, git fetch has a list of all the commits their Git must send, and all the commits that you have that this list of commits will extend.
At this point, their Git packages up the desired subset of their commits—and associated snapshots and so on—knowing which commits you have because your Git told them I already have that one. Their Git can compress all the files and such, everything that's inside the commits, using this information. So you end up getting a much smaller data-set over the network than if they just sent you everything.
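The offer-and-decline exchange described above can be sketched as a walk backwards from their tip that stops at anything the receiver already has. This is a simplified model, not Git's actual wire protocol; the function name commits_to_send and the single-letter commit IDs are assumptions for illustration.

```python
def commits_to_send(their_parents, their_tips, receiver_has):
    """Walk back from the sender's tips, skipping commits the receiver has."""
    wanted, seen, stack = [], set(), list(their_tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit in receiver_has:
            continue  # "No thanks, I have that one already."
        seen.add(commit)
        wanted.append(commit)                 # "OK, give me that too."
        stack.extend(their_parents[commit])   # now consider its parent(s)
    return wanted

# Their repository: F <- G <- H, using letters in place of real hash IDs.
their_parents = {"F": [], "G": ["F"], "H": ["G"]}
mine = {"F"}  # the receiver already has F

print(commits_to_send(their_parents, ["H"], mine))  # ['H', 'G']
```

The walk never looks past F, because everything behind a commit you already have is, by construction, something you already have too.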
¹Technically, two different commits can have the same hash ID, but only if they never "meet". That is, if you connect your Git to some other repository, and it has some commit with hash ID H and your Git has a different H, both Gits will believe this is the same commit and neither will send it to the other. As long as your Git and their Git never meet and try to exchange commits, this causes no problems. In practice, this kind of hash collision doesn't become even remotely likely until you have more than about 10^17 objects. At that point, it gets to a similar chance as having your computer's storage system just randomly fail on you, which is similarly disastrous. It can be a problem if someone carefully engineers a hash collision. For details, see How does the newly found SHA-1 collision affect Git?
Git uses names to find last commits
We drew a simple chain of commits above, ending at commit H:
...--F--G--H
(where the letters stand in for the real hash IDs, which look completely random, but are actually completely deterministic). Given the ID of the last commit H, we just have Git look up H. Inside H, Git finds hash ID G, which lets it look up G. Inside G, Git finds hash ID F, which lets it look up F, and so on. This lets Git go from the last commit all the way backwards to the very first commit ever.
This even works in the presence of branching commit structures:
          I--J
         /
...--G--H
         \
          K--L
There are now two last commits. J is the last commit in one structure, and L is the last commit in the other. The two structures (should we call them branches?) meet as they travel back to H, and then they share commits all the way back to the beginning of time (presumably commit A).
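To make the backwards walk concrete, here is a small Python sketch that follows parent links from a last commit and then finds the commits two structures share. The parents table and the history function are hypothetical stand-ins for this example, with G treated as the first commit to keep it short.

```python
# Each commit maps to its parent(s); letters stand in for real hash IDs.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # one structure ends at J
    "K": ["H"], "L": ["K"],   # the other ends at L
}

def history(tip):
    """List every commit reachable from tip, starting at tip and going back."""
    seen, order, queue = set(), [], [tip]
    while queue:
        commit = queue.pop(0)
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        queue.extend(parents[commit])
    return order

print(history("J"))  # ['J', 'I', 'H', 'G']
print(history("L"))  # ['L', 'K', 'H', 'G']

from_l = set(history("L"))
shared = [c for c in history("J") if c in from_l]
print(shared)        # ['H', 'G']
```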
In a real repository we might have thousands, or millions, of commits. There's a big old jumble of hash IDs. How will you, or Git, quickly find the last commit? You can—and in maintenance commands, Git does—list out every commit, and figure out which ones are "last". This takes a while: multiple seconds, or in really big repositories, sometimes minutes. It's clearly way too slow. Plus, who wants to work with hash IDs? Humans certainly don't.
So, Git offers us the ability to use a name to remember one (1) hash ID. We can pick the name branch1 to remember hash ID J, and the name branch2 to remember hash ID L:
          I--J <-- branch1
         /
...--G--H
         \
          K--L <-- branch2
We can, if we want, use the name master to remember hash ID H:
          I--J <-- branch1
         /
...--G--H <-- master
         \
          K--L <-- branch2
It doesn't matter that there are commits after H. H is the last commit in master. That's the entire definition of a branch. That's it: a branch name is just a pointer, in Git; it's just a way to hold one hash ID, and by definition, whatever hash ID the name holds, that's the last commit in that branch.²
So branch branch1 ends at commit J, and automatically includes every commit that you can get to by starting at J and working backwards. Branch branch2 ends at commit L and includes all commits before L, with Git again working backwards. Git always works backwards. If it needs to work forwards for some reason, it does that by first working backwards and remembering hash IDs as it goes, and then going forwards through the remembered list. And, commits can be, and very often are, on more than one branch.
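A sketch of that idea in Python: branch names are just a table from name to one hash ID, and "which branches contain this commit" falls out of walking backwards from each name. The branches and parents tables and both helper functions are hypothetical, chosen to match the drawing above.

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}

# A branch name holds exactly one hash ID: the last commit of that branch.
branches = {"branch1": "J", "branch2": "L", "master": "H"}

def reachable(tip):
    """Every commit found by starting at tip and working backwards."""
    found, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in found:
            found.add(commit)
            stack.extend(parents[commit])
    return found

def branches_containing(commit):
    return sorted(name for name, tip in branches.items()
                  if commit in reachable(tip))

print(branches_containing("H"))  # ['branch1', 'branch2', 'master']
print(branches_containing("K"))  # ['branch2']
```

Commit H shows up on all three branches, while K is only on branch2, even though no name points at either of them except master for H.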
When your Git gets new-to-you commits from their Git, your Git needs to set up some name(s) to remember the hash IDs that their Git had in their branch names. But their Git tells you this stuff right at the start of the git fetch. You run git fetch origin, and the Git over at origin (at the URL listed under the name origin) says: My master holds H, my develop holds L, .... Your Git just got this whole list.
Then, as the fetch runs, your Git selects any commits they have that you don't, and has their Git send them over. This adds new commits to your repository, without removing any commits—it literally, physically can't remove any commits as they only send you new (new-to-you) stuff. When all of this is done, you definitely have those commits.
So now your Git takes all of their branch names, and renames those names. Your Git turns their master into your origin/master. Your Git turns their develop into your origin/develop. This goes on for all of their branch names. These are Git's remote-tracking names because they remember ("track") the branch names and hash IDs that your Git saw in their Git, under your remote name origin.
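The renaming step is simple enough to sketch in a few lines of Python. This is only a model of the default behavior described above; the update_remote_tracking function and the short name forms are assumptions for illustration (real Git stores these under longer names such as refs/remotes/origin/master).

```python
def update_remote_tracking(my_names, remote, their_branches):
    """Record each of their branch names as <remote>/<name> in my repository."""
    for name, hash_id in their_branches.items():
        my_names[f"{remote}/{name}"] = hash_id
    return my_names

# My names before the fetch, and what their Git reported at the start of it.
my_names = {"master": "H", "origin/master": "H"}
their_branches = {"master": "J", "develop": "L"}

update_remote_tracking(my_names, "origin", their_branches)
print(my_names)
# {'master': 'H', 'origin/master': 'J', 'origin/develop': 'L'}
```

Note that my own master is untouched: the fetch only updates the names that remember their branches.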
So, let's say you have this before you run a new git fetch:
...--G--H <-- master, origin/master
You and they both have only one branch, named master, and both of these names identify commit H. Then you run git fetch. Their master now points to new commit J, and they have a branch name develop that points to commit L:
          I--J <-- origin/master
         /
...--G--H <-- master
         \
          K--L <-- origin/develop
You do not have to do anything, but if you want, you can have your Git move your name master to point to commit J. There are many ways to do that, but often, the easiest is to git checkout master first if necessary.³ This attaches the special name HEAD to the name master, so that Git knows which branch name to use for operations that write new hash IDs into the current branch:
          I--J <-- origin/master
         /
...--G--H <-- master (HEAD)
         \
          K--L <-- origin/develop
The git checkout operation also arranges your index (aka staging area) and work-tree (aka working tree) to let you view and/or work with the commit identified by the branch name. That is, H is now your current commit, and master is your current branch. We won't go into detail on the index and work-tree here, but they're pretty important: they are where you build up your next commit, and how you work on files, which Git stores inside commits in a special, read-only, Git-only format.
Anyway, now that you're in this particular situation, you can tell Git to do a fast-forward not-really-a-merge "merge" operation to have your master catch up with their origin/master:
git merge --ff-only origin/master
This takes your current branch name (master, from the checkout we just did if necessary) and does the fast-forward operation if it can. If it can't, it doesn't do a merge; it just says that it can't fast-forward, and quits. Since here, it can fast-forward instead of merging, it does that:
          I--J <-- master (HEAD), origin/master
         /
...--G--H
         \
          K--L <-- origin/develop
You now have commit J checked out, and can see (and work with) its files in your work-tree. Your name master now identifies the same commit as their name origin/master, and you still have all the commits that they have, and your Git still has, for its remote-tracking names, the names of their branches.
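The "if it can" test comes down to one question: is the commit the current name holds an ancestor of the target commit? Here is a minimal Python sketch of that check, assuming the graph drawn above. The function names is_ancestor and fast_forward_only are made up for this example and only model the name-moving part, not the index or work-tree updates.

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}

def is_ancestor(older, newer):
    """True if older can be reached by walking backwards from newer."""
    seen, stack = set(), [newer]
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def fast_forward_only(names, branch, target):
    """Slide branch forward to target's commit, or refuse without merging."""
    if not is_ancestor(names[branch], names[target]):
        raise ValueError("cannot fast-forward")
    names[branch] = names[target]

names = {"master": "H", "origin/master": "J", "origin/develop": "L"}
fast_forward_only(names, "master", "origin/master")
print(names["master"])  # J
```

With master now at J, asking for a fast-forward to origin/develop would raise the error, because J is not reachable by walking backwards from L.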
²To add a commit to a branch, you do this with your Git:
1. Select that branch name and its last commit, e.g., git checkout branch1.
2. Do the stuff needed to tell Git to make a new commit.
3. Your Git writes out a new commit, which gets a new unique hash ID. The parent of this new commit is the commit you selected in step 1. Then your Git just writes the hash ID created here, in step 3, into the name you selected in step 1.
Now the branch name identifies the last commit in the branch, as it did before you made the new commit. The new commit is the last commit in the branch!
Pictorially:
...--G--H <-- branch (HEAD)
becomes:
...--G--H <-- branch (HEAD)
         \
          I
for a moment, but then Git immediately writes I's hash ID into the name branch. Git knows that branch is the right name, because the special name HEAD is attached to the name branch. So now we have:
...--G--H
         \
          I <-- branch (HEAD)
and there's no reason not to just draw them all in a straight line again.
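The three steps in this footnote can be sketched in Python as follows. The repo dictionary layout and the commit function are hypothetical, and the hash here covers only the parent and message, whereas a real commit also records a snapshot, author, timestamps and so on.

```python
import hashlib

# A toy repository: objects by ID, names by branch, and HEAD attached to a name.
repo = {
    "objects": {"H": {"parent": "G", "message": "earlier work"}},
    "names": {"branch": "H"},
    "HEAD": "branch",          # step 1: the selected branch name
}

def commit(repo, message):
    branch = repo["HEAD"]                 # which name is HEAD attached to?
    parent = repo["names"][branch]        # its current last commit
    # Step 3a: write the new commit; its ID is derived from its content.
    new_id = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()
    repo["objects"][new_id] = {"parent": parent, "message": message}
    # Step 3b: store the new hash ID in the name, making it the last commit.
    repo["names"][branch] = new_id
    return new_id

i = commit(repo, "add feature")
print(repo["names"]["branch"] == i)      # True
print(repo["objects"][i]["parent"])      # H
```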
³In Git 2.23 and later, you can use git switch instead of git checkout. The reason to do this is that git checkout, as a command, has too many different jobs it can do. So in Git 2.23, it's been split into two separate commands: git switch does half of its jobs, and git restore does the other half. If you have an older Git, or are used to the old way of doing things, the old git checkout command still works the same as always.
Pruning
Note that if they delete a branch name, your Git still retains your memory of their name. That is, suppose they decide commits K-L are worthless and just trash their develop name entirely. You have this in your repository:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L <-- origin/develop
and you run git fetch and have your Git call up the Git at origin. They list the fact that their master identifies commit J, and that's it for their branches. Your Git says ah, I have commit J already, and they send you no commits and the two Gits disconnect. Your Git would update your origin/master, changing it from pointing to J to point to J, which doesn't change it, so nothing really happens here. And then your Git is done, and your origin/develop still remembers commit L, even though they don't have a develop any more.
If you don't want this (if you want to get rid of your origin/develop) you simply tell your Git to prune when it is fetching. Since your Git gets the full list of all of their branches, your Git can see that they don't have a develop any more. So your Git will now delete your origin/develop, leaving you with:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L [abandoned]
To do this pruning, run git fetch --prune. To make all git fetch operations prune automatically when they can, configure fetch.prune to true:
git config --global fetch.prune true
for instance.
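What pruning does to your names can be modeled in Python like this. It is a sketch of the behavior described above, not of Git's implementation; the update_tracking function and its prune parameter are hypothetical names for this example.

```python
def update_tracking(my_names, remote, their_branches, prune=False):
    """Refresh <remote>/* names; with prune, drop ones they no longer have."""
    prefix = remote + "/"
    for name, hash_id in their_branches.items():
        my_names[prefix + name] = hash_id
    if prune:
        for tracked in [n for n in my_names if n.startswith(prefix)]:
            if tracked[len(prefix):] not in their_branches:
                del my_names[tracked]   # they deleted it, so forget it
    return my_names

my_names = {"master": "J", "origin/master": "J", "origin/develop": "L"}
their_branches = {"master": "J"}        # they trashed their develop

update_tracking(my_names, "origin", their_branches)
without_prune = dict(my_names)
print(sorted(without_prune))  # ['master', 'origin/develop', 'origin/master']

update_tracking(my_names, "origin", their_branches, prune=True)
print(sorted(my_names))       # ['master', 'origin/master']
```

Only names under the origin/ prefix are candidates for deletion; your own branch names are never touched by pruning.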
Note that the commits are still there, at least for a little while. With no name to find them, your Git will eventually remove them.⁴ The removal process for abandoned commits is actually done by a maintenance command, git gc, that you can run, but it takes a long time: multiple seconds, or even minutes. Git runs it for you automatically, in the background, when that seems to Git to be a likely-profitable venture, so there is hardly ever any reason to run it yourself.
⁴When you abandon your own commits, your Git tends to remember their hash IDs for at least another 30 days in one or more reflog entries. That keeps the abandoned commits alive for at least 30 days, in case you want them back. In this case, though, there are no reflog entries any more, so this "eventual" is as soon as the next git gc actually runs. It's hard to predict when that will be, though.
This all leads to the last rule: don't bother with branch names until you want one
Look back at our diagram where they had two branch names, and we had one:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L <-- origin/develop
We don't need our own branch name develop here. We only need one if we want to add commits to the end. We can make one:
...--G--H--I--J <-- master, origin/master
         \
          K--L <-- develop (HEAD), origin/develop
and then make new commits:
...--G--H--I--J <-- master, origin/master
         \
          K--L <-- origin/develop
              \
               M--N <-- develop (HEAD)
Now we need to send our new commits to them, for which we use git push. This works a lot like git fetch: we offer them commits we have that they don't, by hash ID. But it ends differently. Having sent them our commits M-N, we then ask them to set their branch name develop to point to commit N. If they accept, we update our own origin/develop:
...--G--H--I--J <-- master, origin/master
         \
          K--L
              \
               M--N <-- develop (HEAD), origin/develop
Commit L no longer has any name pointing to it, so we can straighten out the kink in the drawing. But we can also switch back to our name master and delete our develop:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L--M--N <-- origin/develop
The commits are all still there. We find them using the name origin/develop. There's no reason to find them using the name develop any more. So as soon as we're done with it, we just stop using it and delete it. Then if they add more commits and we git fetch, the only name we have updates automatically:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L--M--N--O <-- origin/develop
If we find we need to add on more commits, we git checkout develop again to create our name develop from our origin/develop:
...--G--H--I--J <-- master, origin/master
         \
          K--L--M--N--O <-- develop (HEAD), origin/develop
and we're ready to add new commits, and then git push, as usual.
We only need our own name if we're going to add new commits. Otherwise, their names—our remote-tracking names—suffice. We just use those and we're done.
We can even look at their commits using Git's detached HEAD mode. Suppose we've pushed O and deleted our develop so that we have:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L--M--N--O <-- origin/develop
Now they add a new commit P. We use git fetch to get it:
...--G--H--I--J <-- master (HEAD), origin/master
         \
          K--L--M--N--O--P <-- origin/develop
We can git checkout origin/develop now. Since origin/develop is not a branch name (it's a remote-tracking name), our Git will use its detached HEAD mode. In this mode, the special name HEAD just holds the raw hash ID of the commit we're browsing:
...--G--H--I--J <-- master, origin/master
         \
          K--L--M--N--O--P <-- HEAD, origin/develop
If we make a new commit Q here, the name HEAD advances to point to our new commit:
...--G--H--I--J <-- master, origin/master
         \
          K--L--M--N--O--P <-- origin/develop
                          \
                           Q <-- HEAD
and now we really should make a branch name to remember the hash ID of Q, because if we switch away from this commit (back to P or J, say), we'll forget the hash ID. Who can remember those things? Well, Git can remember them. We just need to create a name. That's what branch names are for: to remember the last commit. If Q is to be the last commit, we make a new name for it. We can call it whatever we want:
git checkout -b feature
and now we have:
...--G--H--I--J <-- master, origin/master
         \
          K--L--M--N--O--P <-- origin/develop
                          \
                           Q <-- feature (HEAD)
The git checkout -b operation creates the name we choose, and attaches HEAD to the name. The commit we chose was the commit we were using: the one that HEAD used to point to directly. Now HEAD is attached to the new name, feature, and the name (the branch name) points to the commit.
Usually, you create the name pointing to P first, then commit to make Q. But if you forget, this is how you recover: git status says detached HEAD and you say to yourself, oops, I should create a branch name now. You run git checkout -b, or in Git 2.23 and later, git switch -c, to do that.
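The difference between an attached and a detached HEAD, and what creating a branch name afterwards does, can be modeled in Python roughly like this. The head dictionary and the three helper functions are hypothetical names for this sketch; they model only which name or hash ID gets updated, nothing about files.

```python
names = {"master": "J", "origin/develop": "P"}

# Detached HEAD: no branch name attached, HEAD holds a raw hash ID directly.
head = {"attached_to": None, "commit": "P"}

def current_commit():
    if head["attached_to"] is not None:
        return names[head["attached_to"]]
    return head["commit"]

def record_new_commit(new_id):
    """A new commit updates the attached name, or HEAD itself if detached."""
    if head["attached_to"] is not None:
        names[head["attached_to"]] = new_id
    else:
        head["commit"] = new_id

def checkout_new_branch(name):
    """Create a name at the current commit and attach HEAD to it."""
    names[name] = current_commit()
    head["attached_to"], head["commit"] = name, None

record_new_commit("Q")            # Q is remembered only by HEAD for now
checkout_new_branch("feature")    # the recovery step
after_branch = names["feature"]
record_new_commit("R")            # later commits advance the name instead

print(after_branch)               # Q
print(names["feature"])           # R
print(names["origin/develop"])    # P
```

The remote-tracking name never moves in this sequence; only HEAD, and then the new branch name, follow the new commits.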
Conclusion
Your branch names are there to remember last-commit hash IDs. Create them when you want that. Otherwise, don't bother with names. Use the prune option to snip away dead origin/* names.
Your Git kind of wants to use at least one name, so you can let it do that: let it use master, for instance. Then do a fast-forward after git fetch. If you never actually do your own work in a repository, you'll just stick with master and let git merge --ff-only origin/master take you to their update.
Or, you can even use detached-HEAD mode: git checkout origin/master, then delete the name master. You don't actually need it. The detached HEAD name, plus the remote-tracking name, will serve. After git fetch updates your origin/master, you can just git checkout origin/master again, to move the detached HEAD. This might surprise some Git users, so if you do use this approach, and someone else wants to take over this Git repository, you can warn them. But your Git repository is for you, not for them.