For what it's worth, here's a more detailed, technical-ish (but still high-level-ish) description of push vs fetch / pull, incorporating the case of a very slow (multi-hour) fetch.
Before you read the rest of this answer, you may want to run git ls-remote origin. This is a read-only operation, so it's safe to do at any time. On a busy server the output can be quite large, so you might want to view it in an editor, but you'll see a long list of SHA-1 and name pairs. Each SHA-1 identifies a specific git object in their (the remote's) repository, and each name is a name for that object.
The nearest opposite of push is fetch, not pull
The git pull command is simply¹ a convenience command that runs git fetch followed by git merge (or, if directed, git rebase). So you really should compare push to fetch.
Remember that fetch and push involve two gits
In general,² you will fetch or push from/to a "remote", on another host entirely. The other host runs another instance of git, which looks in another repository. Generally the two repositories are related somehow, e.g., both were originally cloned from a third location, or one was cloned from the other. The details of the relationship matter less than the fact that there is some relationship, so that the two repositories have something in common. (This is not actually required; it's just the typical case.)
Your git "calls up" their git over the Internet-phone, or some similar communications channel. Thus, we can speak of "your git" saying something to "their git", and "their git" saying something back to "your git". The two git instances converse with each other in order to figure out what to do.
Refspecs are key
Both fetch and push use a key concept called a "refspec". At its core, a refspec is essentially a pair of reference names, such as a pair of branch names or a pair of tag names. The pair is separated by a colon, e.g., master:master. A "fully qualified" name, like refs/heads/master or refs/tags/v1.2, avoids ambiguity here, but short names usually work well (and are what most people use when explicitly specifying something), as git will automatically figure out whether the name is a branch or a tag.³
You can use more than one refspec at a time, and refspecs allow for multiple matches with a * syntax, as shown below.
The default refspec for a fetch from a remote named origin is:
+refs/heads/*:refs/remotes/origin/*
We'll ignore the plus sign here.⁴ This particular refspec says: "take all branches" (in the refs/heads/ name-space) "and copy them from their repo to my repo, but in my repo, change the name to refs/remotes/origin/".
This is how "remote branches" AKA "remote tracking branches" work, and how they arise in the first place: your git copies their git's branches, but places them in a different name-space, so that (your copies of) their branches will not affect your branches.
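To make the renaming concrete, here is a small toy model in Python of how a refspec with a * maps one of "their" names to one of "your" names. This is only a sketch of the idea, not git's actual code; the function name map_ref is made up for this illustration.

```python
from typing import Optional


def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Return the destination name for `ref` under `refspec`,
    or None if the refspec's left side does not match `ref`.
    (Hypothetical helper; a toy model, not git's implementation.)"""
    # A leading plus is only the per-refspec force flag; it does not affect naming.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":", 1)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    # Whatever the * matched on the left is substituted for the * on the right.
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched, 1)


default_fetch = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(default_fetch, "refs/heads/master"))  # refs/remotes/origin/master
print(map_ref(default_fetch, "refs/tags/v1.2"))     # None: tags are not in refs/heads/
```

Their refs/heads/master lands in your refs/remotes/origin/master, and your own refs/heads/master is never a destination, which is the whole point of the separate name-space.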
(When you use git pull, some of this is hidden from you. This "hiding" is rather imperfect though, so you should know about it. In modern gits (1.8.4 and later) remote branches are updated on all fetch operations, including those run from git pull. Earlier versions of git don't update the remote branches on these kinds of fetch operations, but do still update them on "regular" fetches and on all push operations, which is a bit weird, and is why newer gits just always update.)
Refspecs for push are complicated
The usual default refspec for fetch is easy to describe: "get me all their branches, but make them remote-tracking branches in my repo". This is clean, simple, and effective, and it has been the default ever since "remotes" were invented.
The default for push is complicated and configurable, and the default has changed in git 2.0.
The old default push is called matching and the new default is called simple (although it's still a bit complicated). The new simple rule is:
- find the "upstream" branch of your current branch, and
- push to that upstream branch, but only if it has the same name as your current branch.
That is, suppose you're on branch master, and the "upstream" for that is "branch master on remote origin". Then git push means git push origin refs/heads/master:refs/heads/master.
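As a rough sketch, the simple rule as described here can be written as a tiny Python function. This is a toy model under the assumptions of this answer (it only covers the "push to my upstream" case), and the name simple_push_refspec is invented for the example.

```python
from typing import Optional, Tuple


def simple_push_refspec(current_branch: str,
                        upstream: Optional[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (remote, refspec) for a bare `git push` under the 'simple' rule
    as described above. `upstream` is (remote name, branch name on that remote).
    (Hypothetical helper; a simplification, not git's implementation.)"""
    if upstream is None:
        raise ValueError(f"branch {current_branch!r} has no upstream")
    remote, upstream_branch = upstream
    if upstream_branch != current_branch:
        # 'simple' refuses when the upstream's name differs from yours.
        raise ValueError(
            f"upstream branch {upstream_branch!r} does not match {current_branch!r}")
    full_name = f"refs/heads/{current_branch}"
    return remote, f"{full_name}:{full_name}"


print(simple_push_refspec("master", ("origin", "master")))
# ('origin', 'refs/heads/master:refs/heads/master')
```

If the upstream has a different name, this model refuses rather than guessing, which is the "but only if it has the same name" part of the rule.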
For the old default (matching), your git and their git talk a bit to see which branches you have, and which branches they have. Then all the branches that you both have, that have the same name, are put into your refspec. So if you have a branch named betty and they have a branch named betty, that adds refs/heads/betty:refs/heads/betty to your push refspecs. Then if you both have branches named fred, those are added, and so on. Your git then attempts to push all the matching branches to their git.
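The matching rule boils down to a set intersection on branch names. Here is a minimal Python sketch of that idea; the function name and the extra branch names (wilma, barney) are made up for illustration, and real git works on the full ref lists exchanged during the conversation.

```python
from typing import Iterable, List


def matching_refspecs(my_branches: Iterable[str],
                      their_branches: Iterable[str]) -> List[str]:
    """Build one push refspec per branch name that exists on both sides.
    (Hypothetical helper; a toy model of the 'matching' default.)"""
    common = sorted(set(my_branches) & set(their_branches))
    return [f"refs/heads/{name}:refs/heads/{name}" for name in common]


mine = ["betty", "fred", "wilma"]      # wilma exists only in my repo
theirs = ["barney", "betty", "fred"]   # barney exists only in theirs
print(matching_refspecs(mine, theirs))
# ['refs/heads/betty:refs/heads/betty', 'refs/heads/fred:refs/heads/fred']
```

Branches that exist on only one side are simply left out of the push.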
It's worth noting two more things here:
- On fetch, "their" ref-names go on the left of the refspec, and yours go on the right. Their master becomes your origin/master because the refspec has refs/heads/master on the left, and refs/remotes/origin/master on the right. But on push these are reversed: your branch or tag name goes on the left, and theirs goes on the right.
- When pushing, you can omit "your" side of the refspec to tell the remote to delete a reference. To ask their git to delete branch develop, you can use git push origin :refs/heads/develop.
Crucial: fetch and push are not symmetric
Besides the somewhat obvious syntactic differences (fetch doesn't have a way to delete, and the left vs right side bits), there's one more thing that's absolutely crucial here. When you use git fetch, you copy their branches to your "remote branches", but when you push, you ask to send your branches to their branches, not to any sort of "remote branch".
What this means is that if you don't already have all the commits they have on your local branch, and you ask them to take that branch via git push, you will ask them to lose some commit(s): specifically, whichever commits they have that you don't.
Normally a remote will refuse ("reject") a push that would lose commits ("non-fast-forward"). You can override this with --force, but that's usually the wrong thing to do.
Unlike push, git fetch puts "their" branches in your remote-tracking branches. Thus, fetch just takes what they have every time. This can't disrupt your branches since it does not even touch your branches.⁵
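The "would this push lose commits?" test is really an ancestry check: the remote's current tip must be reachable from the commit you are offering. Here is a toy Python model of that check. The commit names and the function names are invented for this sketch, and the commit graph is just a dictionary from commit to parents.

```python
from typing import Dict, List

# Toy commit graph: commit id -> list of parent ids (all names hypothetical).
graph: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # a commit they have that you don't
    "D": ["B"],   # your new commit, built on B
}


def is_ancestor(graph: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents
    (a commit counts as its own ancestor here)."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def accept_push(graph: Dict[str, List[str]], their_tip: str, new_tip: str,
                force: bool = False) -> bool:
    """Accept only fast-forwards unless the force flag is set."""
    return force or is_ancestor(graph, their_tip, new_tip)


print(accept_push(graph, "B", "D"))              # True: D simply adds to B
print(accept_push(graph, "C", "D"))              # False: accepting would lose C
print(accept_push(graph, "C", "D", force=True))  # True, and C is dropped from the branch
```

In the rejected case, C is exactly the kind of commit "they have that you don't".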
This is why you merge or rebase
Suppose you've done some work, and someone else has also done some work. Suppose also that both of you use git push to update a third repository (perhaps on github, for instance), and you're both working on branch develop. Consider it as a sort of race: you've both made some changes and committed them locally, and now you're in a race to see who can push first.
Let's say the other guy wins the race. He does his git push, which calls up his remote (this is also your remote), and he asks them (the remote) to take his develop branch and make it their develop branch. Since what he's done is simply adding to what they had before, they accept his push.
Now you come along, having lost the race, and ask your remote to take your develop branch and make it their develop branch. They check as usual, but this time, they find a commit, or several commits, that they have, that you don't. These are precisely the commits that the other guy pushed, when he won the race.
The remote will reject your push as a non-fast-forward.
You can now use git fetch to obtain those commits from the remote. They will go into your origin/develop remote-tracking branch.
You now have all of their commits, plus your own commits. If we draw a part of your commit graph, it may look something like this:
            o   <-- HEAD=develop
           /
... - o - o
           \
            *   <-- origin/develop
Here, the commit marked * is the one the other guy pushed when he won the race. It is now your job to coordinate your changes and his changes.
There are plenty of articles (here on stackoverflow, and elsewhere) about how to deal with this (merge or rebase, when to use which one, and so on).
Whether you prefer merging or rebasing, you can do the two steps manually (shown here with rebase; substitute git merge if you prefer to merge):
git fetch
git rebase
or you can use the git pull script, perhaps with --rebase or a configuration item, to combine the two steps. (I think it's better to start by doing them separately, as you'll get a better mental model of the work-flow, and eventually you'll know intuitively when it's safe to use git pull as a convenient method of doing both operations all at once. Also, when you do them separately, you can easily look and see what's happened since you last re-synchronized, as git log ..@{u}⁶ will show you what's in the upstream that is not in your branch.)
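What the two-dot syntax selects can be modelled as a set difference on reachable commits. Below is a toy Python sketch using a graph shaped like the drawing above. The commit names (base1, base2, mine, star) are invented, and this is only the idea, not how git log walks history.

```python
from typing import Dict, List, Set

# Toy graph shaped like the drawing: commit id -> parent ids (names hypothetical).
graph: Dict[str, List[str]] = {
    "base1": [],
    "base2": ["base1"],
    "mine": ["base2"],   # tip of develop (HEAD)
    "star": ["base2"],   # tip of origin/develop, the commit marked *
}


def reachable(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def two_dot(graph: Dict[str, List[str]], left: str, right: str) -> Set[str]:
    """Model of left..right: reachable from right, minus reachable from left."""
    return reachable(graph, right) - reachable(graph, left)


print(two_dot(graph, "mine", "star"))  # {'star'}: in the upstream, not in your branch
print(two_dot(graph, "star", "mine"))  # {'mine'}: yours, not yet in the upstream
```

The first call corresponds to ..@{u} with HEAD at develop; swapping the two sides shows what you have that the upstream lacks.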
What about a very slow fetch? For that matter, what about a very slow push?
The key item here is that when doing either fetch or push, git first figures out what repository objects to transfer, then it starts doing the transfers. Then, for a push, git atomically does its checking, and permits or rejects the label-update part of the operation.
Let's consider fetch first, and assume that git is able to use a "thin pack". Your git calls up their git, the two gits figure out that you need many megabytes or even gigabytes of commits, trees, blobs, and/or tags (these are the four object types in the repository), and their git packages them all up as a "thin pack".
At this point, your git begins the slow process of transferring all this data over the Internet-phone. If someone else comes along and does a (successful) push, that push, along with its objects, goes into the remote repository, but your git and their git have already decided what's coming over, and those new objects and/or label-settings are not included.
When your fetch finishes, your git expands the thin pack (this is where you see the "resolving deltas" message) and updates your labels, based on what you brought over. It's as if the push has not yet happened: you get an atomic snapshot of what they had when you started your fetch.
(This is why you might want to run a second fetch immediately after a long, multi-hour fetch on a busy repository: you can pick up any changes that occurred during that period. With any luck, this time you will just have a few small items to bring over, which will take only a few seconds, probably not enough time for even more changes to sneak in.)
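The snapshot behaviour can be sketched with a few lines of Python. This is a deliberately crude model that ignores objects entirely and only tracks labels; the function names and the commit ids c1 and c2 are made up for the example.

```python
from typing import Dict

# Toy model: a repository's labels are a dict of ref name -> commit id.
remote_refs: Dict[str, str] = {"refs/heads/develop": "c1"}
my_tracking: Dict[str, str] = {}


def start_fetch(remote_refs: Dict[str, str]) -> Dict[str, str]:
    """The two gits agree on what is coming over: model this as a copy
    of the remote's labels taken at the start of the conversation."""
    return dict(remote_refs)


def finish_fetch(snapshot: Dict[str, str], tracking: Dict[str, str],
                 remote: str = "origin") -> None:
    """After the (possibly very slow) transfer, update remote-tracking labels
    from the snapshot, not from whatever the remote has by now."""
    heads = "refs/heads/"
    for name, commit in snapshot.items():
        if name.startswith(heads):
            tracking[f"refs/remotes/{remote}/{name[len(heads):]}"] = commit


snapshot = start_fetch(remote_refs)
remote_refs["refs/heads/develop"] = "c2"   # someone pushes during the slow transfer
finish_fetch(snapshot, my_tracking)
print(my_tracking["refs/remotes/origin/develop"])   # c1: as if the push had not happened

finish_fetch(start_fetch(remote_refs), my_tracking)  # the quick second fetch
print(my_tracking["refs/remotes/origin/develop"])   # c2
```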
If you have a very slow push, the situation is similar: you send the remote a "thin pack" that's not very thin, and once it has the whole thing (and has resolved deltas), it checks whether the update is a fast-forward, or is otherwise permitted. This checking and (if permitted) updating is all done atomically. (The remote uses a lock file to achieve this atomicity, and in fact a fetch you run uses a lock file on your side, for the same reason.) For each label (branch or tag) update, the push either succeeds or fails.⁷
(If you're using a "dumb" protocol, the details change, but the overall strategy remains the same. Updates are, or should generally appear, atomic.)
¹ Or not that simply, since it has a lot of special corner cases it tries to handle, plus the logic to do rebase instead of merge.
² You can do git operations on your local machine that don't involve a second git instance. However, the principle is the same; it's just that now your local git plays both the "local" and "remote" roles, speaking to itself.
³ Put very simply, it's a branch name if it's in refs/heads/, and it's a tag if it's in refs/tags/. Git gets a chance to see which one it's in (on both sides) during the "phone call" between the two git instances. If the name could be ambiguous (if there are both branches and tags named bob, for instance), you can spell out explicitly which one(s) you want.
⁴ The plus sign simply sets the "force" flag for that particular refspec. This is the same force flag you can set with --force, except that it's per-refspec rather than global.
⁵ This glosses over the fact that git's tags use a single global name-space. That is, when you git fetch from a remote, even without adding --tags, git may update your local refs/tags/ entries. In particular, unless you specify --no-tags, your git will see if any of the new object SHA-1 IDs you bring over correspond to any tags on the remote (see the output of git ls-remote: all the tag SHA-1 IDs are available at all times). If so, your git will create a corresponding tag. Since there is no "remote tags" name-space (unless you reinvent it yet again), it's not completely safe to git fetch, as this may add a "surprise" tag (one you did not expect). In practice, however, since tags never⁸ move, this is not a problem.
⁶ The @{u} syntax means @{upstream}, which means "find the upstream branch I'm tracking", which in this example would be origin/develop. Once you've done the git fetch, origin/develop points to the latest commit present on the remote (since you just picked it up by fetching), and the .. syntax means "find commits reachable from the right-hand-side specifier that are not reachable from the left-hand-side specifier". The empty left-hand-side means HEAD, which means the tip of your current branch develop, so this asks git to log commits that are on origin/develop that are not on develop.
⁷ For regular pushes, this is pretty straightforward: you're expecting to push a fast-forward, where the remote will, for instance, have branch refs/heads/B pointing to commit 1234567.... You have commit fedcba9... whose ancestor is 1234567... and you ask to push this commit to their refs/heads/B. Once they have the pack, they check to see if their current refs/heads/B is an ancestor of what you're asking to set it to. Either it is (you've asked for a fast-forward operation on the label) or it isn't, and the push is rejected.
For force-pushes, though, or when deleting a branch, you might want to make sure that the remote's refs/heads/B points to some specific commit, i.e., that no one else has won a "push race" against your force-push or delete operation. This was, at one point, not possible in git, but since version 1.8.5, git has had the --force-with-lease option for push. Here, you specify the SHA-1 you believe the remote will have its label pointing to by the time your push has gotten all the way across and is being executed atomically. If you are correct, the update is allowed. If it turns out that the label has some other value, your force-update is rejected instead.
This is not something most people normally need, but it does allow for atomic updates that are not fast-forwards.
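In effect, the lease is a compare-and-swap on the remote's label. Here is a toy Python model of that idea. The class name RemoteRef, its methods, and the ids aaaaaaa and bbbbbbb are all invented for this sketch, and the threading lock merely stands in for the remote's lock file; on the real command line the expected value is given as --force-with-lease=<refname>:<expect>.

```python
import threading


class RemoteRef:
    """Toy model of one label on the remote (hypothetical class)."""

    def __init__(self, value: str) -> None:
        self._value = value
        self._lock = threading.Lock()   # stands in for the remote's lock file

    def read(self) -> str:
        return self._value

    def update_with_lease(self, expected: str, new: str) -> bool:
        """Set the label to `new` only if it still holds `expected`.
        The check and the update happen under one lock, so they are atomic."""
        with self._lock:
            if self._value != expected:
                return False            # someone else won the push race
            self._value = new
            return True


ref = RemoteRef("1234567")
print(ref.update_with_lease("1234567", "aaaaaaa"))  # True: the other guy gets there first
print(ref.update_with_lease("1234567", "bbbbbbb"))  # False: your lease is stale
print(ref.read())                                   # aaaaaaa: his update was not clobbered
```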
⁸ What, never? Well, hardly ever!