Git's fetch does not get files (well, not directly anyway).
To some extent, Git doesn't care that much about files at all. What Git cares about are commits. Before we visit this idea any further, though, we should probably review some basic Git definitions.
What's in a repository
A Git repository has three main parts: commits, the index, and the work-tree. (Some Git repositories will omit the work-tree, and in newer versions of Git you can have more than one work-tree, where each work-tree has its own index. But in general you start with one of each.)
A commit is a snapshot: a complete set of files. It's not just a file, and it's not the difference in some files either: it's a standalone thing, with all the files you have decided to save in that commit, in the form they had when you saved them. A commit is also permanent and unchanging. Like all Git objects, it has a unique identifier: 3313b78c145ba9212272b5318c111cde12bfef4a, for instance. Once it is stored, you can never change anything in a commit. If you try, you get a copy of the commit, with the change, and the copy has a new, different ID. You can (sometimes) delete a commit entirely, but you can't change it; you can only copy it (all but the changed part, of course) to a new commit with a different ID.
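To make the "change it and you get a new ID" point concrete, here is a minimal sketch of how an object ID is derived from an object's contents, assuming the classic SHA-1 object format (a header of the object type and byte length, a NUL byte, then the body). The commit text, author line, and tree ID below are placeholders chosen for illustration, not taken from any real repository.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 based Git does: '<kind> <size>\\0<body>'."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit text; every value here is a placeholder.
original = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1500000000 +0000\n"
    b"committer A U Thor <author@example.com> 1500000000 +0000\n"
    b"\n"
    b"initial commit\n"
)
# "Changing" the commit really means building a different object.
amended = original.replace(b"initial commit", b"initial commit, reworded")

original_id = git_object_id("commit", original)
amended_id = git_object_id("commit", amended)

print(original_id)
print(amended_id)
print("same ID?", original_id == amended_id)
```

Because the ID is computed from the contents, any edit at all produces a different ID, which is why the original commit can only be copied, never modified.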
Git really, really cares about commits. It works very hard to make sure you never lose one. It cares much less about the index and work-tree: those are neither permanent, nor unchanging. The advantages of commits are obvious, but their disadvantage is also obvious: they're stored inside Git—in the repository—in a form that nothing else on the computer can deal with.
The work-tree is the opposite of this: it's in a form that everything else on the computer can deal with, and it's quite impermanent and changeable. It's where you do all your work. It has files, rather than mysterious Git objects. This is where you read, write, and edit your files.
Git's index is initially quite mysterious to most people (it was to me), and it has a lot of complicated twists you will eventually encounter. Some software tries to hide the index entirely, but that's not a good idea. In one way the index is actually very simple, though: it's where Git has you build the next commit. The index starts out matching the current commit, and then you git add new versions of existing files, or entirely new files, to the index, to copy the new ones in. Then when you run git commit, Git makes a new commit out of whatever you have in the index right now. This makes the permanent, unchanging snapshot. The index, which is also called the staging area, is simply where you arrange (or "stage") your files to make them as pretty as possible for the snapshot.
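The add-then-commit cycle described above can be modeled in a few lines. This is only a toy sketch: the class ToyRepo and its methods are hypothetical names invented here, and real Git stores far more than a dictionary of file contents.

```python
from types import MappingProxyType


class ToyRepo:
    """Toy model: the index is mutable, each commit is a frozen copy of it."""

    def __init__(self):
        self.index = {}    # path -> content staged for the next commit
        self.commits = []  # list of (message, read-only snapshot)

    def add(self, path, content):
        # Like `git add`: copy the new version of a file into the index.
        self.index[path] = content

    def commit(self, message):
        # Like `git commit`: snapshot whatever is in the index right now.
        snapshot = MappingProxyType(dict(self.index))
        self.commits.append((message, snapshot))
        return len(self.commits) - 1  # position of the new commit


repo = ToyRepo()
repo.add("README", "version 1")
first = repo.commit("first")

repo.add("README", "version 2")       # update an existing file
repo.add("main.py", "print('hi')")    # add an entirely new file
second = repo.commit("second")

print(dict(repo.commits[first][1]))
print(dict(repo.commits[second][1]))
```

The first snapshot still shows only the original README after the later git add calls, because the commit copied the index rather than pointing at it.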
Each commit also records the ID of its immediate predecessor, or parent, commit. This becomes crucial as soon as you start working with history. The history is formed by the commits themselves, through this "my parent is ..." information.
A branch name like master simply identifies, by its ID, the newest commit on that branch. Git calls this the tip of the branch. This newest commit remembers its parent, and that parent remembers its own parent (the newest commit's grandparent), and so on. Git also has other entities that do the same kind of thing: remember one specific commit's ID. The most important two are tags and remote-tracking branches.
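As a rough illustration of how a branch name plus the "my parent is ..." links yield a history, the sketch below walks backward from a branch tip. The short IDs are made up, and the model assumes every commit has at most one parent, which leaves out merge commits.

```python
# Hypothetical short IDs; each commit maps to its parent (None for the first commit).
parents = {"a1": None, "b2": "a1", "c3": "b2"}

# A branch name stores only the ID of the tip commit.
branches = {"master": "c3"}


def history(branch):
    """Yield commit IDs from the branch tip back to the first commit."""
    commit = branches[branch]
    while commit is not None:
        yield commit
        commit = parents[commit]


print(list(history("master")))  # ['c3', 'b2', 'a1']
```

Nothing in the branch name itself lists the older commits; they are found only by following parent links from the tip.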
Summary
A repository contains commits, which contain snapshots, and which form the history of all commits ever made. The branch name master finds the newest commit on master. And, although commits contain files, they are not themselves files: they contain whole sets of files, all as one collection.
A repository has an index, which is an intermediary between internal Git commit form and work-tree form, and most repositories have a work-tree, which lets you get at the commits' files as files.
What git checkout does
The git checkout command mainly copies commits into the index and work-tree, so that you can move around throughout the history of all commits and see the corresponding snapshot in your work-tree. It also adjusts what Git calls HEAD.
The name HEAD, in Git, always refers to the current commit by its ID, but it does so in one of two different ways. You can be "on a branch", in which case the name HEAD simply contains the name of the branch. It's then the branch name that gets Git the ID of the current commit. Or, you can have a "detached HEAD", in which case the name HEAD records the ID of the current commit directly.
If you give git checkout a branch name, such as git checkout master, it puts you "on the branch": it checks out the tip commit, since that's the ID stored in the branch name, and it puts the branch name in HEAD. If you give git checkout a raw commit ID, or a tag name, or a remote-tracking branch name, it finds the corresponding ID, checks out that commit, and puts the ID into HEAD.
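The two ways HEAD can name the current commit can be sketched as follows. The "ref: refs/heads/..." text mirrors what Git writes into its HEAD file when you are on a branch; the functions checkout and resolve_head, and the short IDs, are hypothetical and model only the HEAD bookkeeping, not the copying into the index and work-tree.

```python
BRANCH_PREFIX = "ref: refs/heads/"

# Hypothetical repository state with made-up short IDs.
branches = {"master": "c3"}
other_names = {"v1.0": "a1", "origin/master": "b2"}  # a tag and a remote-tracking name


def checkout(target):
    """Return the new HEAD contents for a toy `git checkout <target>`."""
    if target in branches:
        return BRANCH_PREFIX + target   # on a branch: HEAD holds the branch name
    if target in other_names:
        return other_names[target]      # detached: HEAD holds the commit ID
    return target                       # assume a raw commit ID was given


def resolve_head(head):
    """Return (current commit ID, branch name or None if detached)."""
    if head.startswith(BRANCH_PREFIX):
        branch = head[len(BRANCH_PREFIX):]
        return branches[branch], branch
    return head, None


print(resolve_head(checkout("master")))
print(resolve_head(checkout("origin/master")))
print(resolve_head(checkout("a1")))
```

In the attached case, moving the branch name moves the current commit with it; in the detached case, HEAD stays on the recorded ID no matter what happens to any branch.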
What git fetch (and git push) do
All of the above steps work entirely with your own repository. Git doesn't restrict you to just one repository, though. At well-defined times that you choose, you can tell your Git to call up another Git, usually over the Internet, and have a sort of conversation with that other Git.
This is what both git fetch and git push do. They call up some other Git, at the other end of some URL. The URL is usually stored under a name, which is called a remote. The most common one, often the only remote in any given repository, is origin (because git clone sets that one up for you).
Remember, though, Git mostly cares about commits. So when your Git calls up another Git, the conversation they have is mostly about commits. They do, of course, need a way to find the IDs of those commits, and for that they usually start with some branch names. This is in general how Git starts everything: take a branch name, or maybe just the name HEAD, and find a commit ID. Use that commit. Then, if it's appropriate, go to that commit's parent and do something with that commit, and so on.
The fetch process in particular gets a list of all the branches in the other Git. It then obtains all the commits that are in those branches that it does not already have in its own repository. Those commits come with any necessary snapshot-files, almost as a sort of side effect. Finally, your Git takes their Git's branch names and renames them, turning those branch names into your own remote-tracking branch names.
If the remote is named origin, their (origin's) master becomes your origin/master. You get all their commits, except for the ones you already have. The ones you already have, you already have. Your Git can be sure you have them because you have the IDs. The ID of each commit is unique to that commit, and the commit is permanent and unchanging, so if you have the same ID they do, you both necessarily have the same commit.
Your Git and their Git use git push very similarly, but in the other direction, and with a twist: your Git gives them your commits (the ones you have that they don't, that is) and then asks them to set their master, or whatever branch you are pushing, so that its tip commit is the same commit you have as the tip of your master. There's no renaming here: you ask them to make their master exactly the same as your master.
When you git fetch, your Git renames their branch-names, so it's safe to just take them whole. No matter what they did to their branches, this cannot affect your own branch names. But when you git push, you have your Git ask them to set their branch-names, with no renaming at all. If they don't like the requested setting, they can say "no, I won't set that": they can reject your push. That doesn't happen with fetch, and that's where your initial question comes in.
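The asymmetry with push can be sketched the same way. The rule used below for "they don't like the requested setting" is an assumption made for this example: the receiving side refuses any update that would drop commits from its branch, that is, anything that is not a fast-forward. A receiving Git can apply other rules. The function names and short IDs are hypothetical.

```python
def is_ancestor(commits, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    commit = descendant
    while commit is not None:
        if commit == ancestor:
            return True
        commit = commits[commit]
    return False


def push(mine, my_branches, theirs, their_branches, branch):
    """Toy push: send the commits they lack, then ask them to move their branch."""
    new_tip = my_branches[branch]
    commit = new_tip
    while commit is not None and commit not in theirs:
        theirs[commit] = mine[commit]
        commit = mine[commit]

    old_tip = their_branches.get(branch)
    # Assumed policy: reject unless their current tip is part of the new history.
    if old_tip is not None and not is_ancestor(theirs, old_tip, new_tip):
        return "rejected"
    their_branches[branch] = new_tip  # same name on both sides, no renaming
    return "accepted"


mine = {"a1": None, "b2": "a1", "c3": "b2"}
my_branches = {"master": "c3"}

# Case 1: they are simply behind us.
theirs = {"a1": None, "b2": "a1"}
their_branches = {"master": "b2"}
print(push(mine, my_branches, theirs, their_branches, "master"), their_branches)

# Case 2: they have a commit (x9) that we never fetched.
theirs2 = {"a1": None, "b2": "a1", "x9": "b2"}
their_branches2 = {"master": "x9"}
print(push(mine, my_branches, theirs2, their_branches2, "master"), their_branches2)
```

In the second case, accepting the request would make x9 unreachable from their master, so under the assumed policy the request is refused and their branch stays where it was.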
git pull = git fetch + something else
Fetching just gets you their new commits. Because git fetch never touches your own branches, you often want a second step.
The main problem here is that the correct second step to take depends on what commits you brought in, and what commits you already had. There are two main options: git merge, and git rebase. You can configure Git to make git pull do either one. The default is to do git merge.
Again, the "right" command depends on what you have in your branches, what you got from the remote when you fetched, and how you want to work. Most people, in practice, mostly want git rebase here, but git pull defaults to running git merge. In many cases, both commands wind up doing the same thing, so that it doesn't matter that the default is the wrong command. But I advise newbies to avoid git pull, because it does default to the command most people mostly don't want, and because when things go wrong (they always do eventually) the way to recover from the problem depends on you knowing whether you ran git rebase or git merge. If you actually use those directly, you will know.
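To see why the second step depends on what you fetched and what you already had, it can help to classify how your branch tip relates to the remote-tracking tip. The sketch below is a toy with made-up short IDs and single-parent commits; the function names and labels are invented for this example.

```python
def ancestors(parents, tip):
    """Return the set containing `tip` and every commit reachable from it."""
    seen = set()
    commit = tip
    while commit is not None:
        seen.add(commit)
        commit = parents[commit]
    return seen


def relationship(parents, my_tip, their_tip):
    """Classify my branch tip against the remote-tracking tip after a fetch."""
    if my_tip == their_tip:
        return "up to date"
    if my_tip in ancestors(parents, their_tip):
        return "behind"    # they have new commits, I have none of my own
    if their_tip in ancestors(parents, my_tip):
        return "ahead"     # I have new commits, they have none
    return "diverged"      # both sides have commits the other lacks


# Hypothetical history: c3 and x9 are both children of b2.
parents = {"a1": None, "b2": "a1", "c3": "b2", "x9": "b2"}

print(relationship(parents, "b2", "c3"))  # behind
print(relationship(parents, "c3", "b2"))  # ahead
print(relationship(parents, "c3", "x9"))  # diverged
```

In the "behind" case, merge and rebase typically both end up just moving your branch forward to their tip, which is one of the cases where the two commands do the same thing. The choice between them only shows up in the "diverged" case.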