Note: this is expansion #2 of a three-part answer.
Tricky git fetch or git push operations
In order to understand this one, we need to cover a basic fact about Git. You might already know this, but far too many Git introductions and tutorials skip right over it, and it's crucial, at least when we get to git fetch and git push. If you have ever wondered why the h--- did Git do that with fetch/push, you are probably missing this information.
What to know about commits
A Git commit stores two things:
- It has a full snapshot of all files, in a special read-only, Git-only, compressed and de-duplicated format, as I mentioned earlier.
- It also has some metadata. This too is read-only, but is easy for humans to see, and not too tricky: try git cat-file -p HEAD to see an example. The metadata include things like your name, and some date-and-time stamps. (These help make sure each commit's content is unique, which is needed to make its hash ID unique: see below.)
Each commit is numbered, with what looks like a random hexadecimal string. This number is actually a cryptographic checksum of the contents of the commit. Git guarantees¹ that each number is totally unique, so that this number is the commit, and vice versa, in an important sense.
If you use the git cat-file -p HEAD trick, you'll see that each commit has some parent lines. These parent lines give the raw hash ID of the earlier commit or commits: the commit(s) that come just before this commit.
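To try this without touching a real project, here is a minimal sketch that builds a throwaway repository and prints one commit's metadata. The temporary directory, the identity, and the commit messages are made up for illustration, and the commits are empty only to keep the example short.

```shell
# Build a scratch repository with two commits (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "second commit"

# Show the raw commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD

# The parent line of the second commit holds the hash ID of the first commit.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
first_hash=$(git rev-parse HEAD~1)
echo "parent line:  $parent_line"
echo "first commit: $first_hash"
```

Running the same git cat-file -p on the first commit should show no parent line at all, since nothing comes before it.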
What this means is that Git's commits are all strung together, like pearls perhaps. These "strings" are backwards-looking. They have to be, because all parts of any commit are read-only.² When we create a new commit, we know what the hash ID of its parent is, because the parent exists now. We don't know what the hash ID of its future children will be, because those depend on what will be in the commit, and the exact date-and-time when we make the commit.
So, let's draw this. Let's assume there's just one branch (we'll draw it in later) and there are three commits so far. They have unique, big, ugly, random-looking hash IDs, which we don't know, can't pronounce, and don't want to memorize. Instead of bothering with their real hash IDs, let's just call them commits A, B, and C, which were made in that order.
A <-B <-C
Commit A is slightly special: there's no earlier commit, so it has no parent. This makes it what Git calls a root commit. It still has a snapshot of all of its files, though, and the name and email address of whoever made it, and so on.
Commit B lists commit A's hash ID as its parent. Commit B also has a snapshot of all files, name, email address, date-and-time stamps, and so on, but because B lists A's hash ID, we say that commit B points to commit A. That's the little arrow coming out of B, going back to A.
Commit C is similar, but points to earlier commit B. C need not point to A: Git only needs to use C to find B, and then can use B to find A. So all Git needs, to find every commit in this little three-commit repository, is the hash ID of the latest commit C.
Since no commit can ever change, the arrows coming out of any commit always, necessarily, point backwards to earlier commits. We'll use this as an excuse to stop drawing the arrows, and just draw connecting lines:
A--B--C
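The backwards walk can be watched directly. The following sketch assumes a scratch repository whose three empty commits are named A, B, and C after the drawing; it hands Git only the latest commit and lets it follow the parent links.

```shell
# Scratch repository with three commits named after the drawing (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

# Starting from the latest commit alone, Git follows parent links to the root.
# Each line is a commit hash followed by its parent hash(es); the root has none.
git rev-list --parents HEAD

# The same walk, showing the commit messages instead of hashes.
subjects=$(git log --format=%s HEAD | tr '\n' ' ')
count=$(git rev-list --count HEAD)
echo "walked $count commits: $subjects"
```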
We still need to know the hash ID of the last commit in the chain, though. That's C here. Where will Git store this random-looking hash ID, that we're calling C?
¹ The pigeonhole principle tells us that this numbering scheme is ultimately doomed to fail. The size of the hash determines how long we can play the game before this ultimate failure: if it's big enough, we can play the game for longer than the universe will exist, and that's good enough!
² That, in turn, has to be, because the hash ID is made from the contents of the commit. Change anything about the commit, and the hash ID changes: what you have is not a modified commit, but a new and different commit. The old commit still exists, with its old hash ID.
Branch names
A branch name, in Git, simply holds the hash ID of the last commit that we want to say is part of the branch. We can draw that like this:
A--B--C <-- main
Since C is the last commit on main, git checkout main means get me commit C.
Now let's make a new commit, in the usual way, by checking out main and doing stuff and git add and git commit. The git commit command packages up a new snapshot (we'll skip over where it actually gets this snapshot, as that's a bit tricky) and adds metadata: our name and email address, the current date-and-time, and so on. This all goes into a new commit that gets a new, random-looking, unique hash ID that we'll just call D. The parent of new commit D will be the current commit C, so that D will point backwards to C:
A--B--C <-- main
       \
        D
and now the real magic trick happens: having written out commit D successfully, Git now writes D's hash ID into the name main. The result is:
A--B--C--D <-- main
The name main now selects commit D, the latest commit on main. Commit C still exists (it will probably exist forevermore) but it's no longer the latest commit, because the name main now selects D, which is the latest commit.
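A quick way to watch the name move is to ask what hash ID main holds before and after a commit. This is only a sketch in a scratch repository; the commit names follow the drawings and the variable names are made up.

```shell
# Scratch repository with commits A, B, C on main (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

before=$(git rev-parse main)        # hash ID stored in the name "main": commit C
git commit -q --allow-empty -m "D"
after=$(git rev-parse main)         # same name, now holding commit D's hash ID
parent_of_new=$(git rev-parse main~1)

echo "main selected $before"
echo "main selects  $after"
echo "whose parent is $parent_of_new"
```

The old hash is still a valid commit; the only thing that changed outside the object database is the hash ID stored under the name main.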
If you decide, right after making new commit D, that commit D should have been on a new feature branch, you can fix this mistake easily, because nobody else has commit D yet (you just made it). So you would run:
git branch new-branch
which produces:
A--B--C--D <-- main, new-branch
You would then need to make the name main select commit C again. We'll come back to this in a moment.
HEAD
Now that we have two branch names, we have a problem: which name are we using? Git solves this problem with one very special name, HEAD or @ (you can use either one, although some ancient versions of Git don't accept @ everywhere). Note that HEAD must be spelled in all uppercase to work correctly;³ use @ if that's too painful.
What Git does with HEAD is to attach this name to one branch name.⁴ The branch name to which HEAD is attached is, by definition, the current branch. The commit to which that name points is, by definition, the current commit.
What this means is that if we start with:
A--B--C <-- main (HEAD)
and then add a new branch:
A--B--C <-- main (HEAD), new-branch
and then check out this new branch, with git checkout or git switch, Git will attach HEAD to the new name:
A--B--C <-- main, new-branch (HEAD)
but change nothing else. We're still using commit C. We're just using it through a different name.
As soon as we make a new commit D, though, things change: Git writes the new commit's hash ID into the current branch name. HEAD remains attached to new-branch, but new-branch itself now selects commit D:
A--B--C <-- main
       \
        D <-- new-branch (HEAD)
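The attachment can be inspected with git symbolic-ref, which prints the branch name that HEAD is attached to. The sketch below assumes a scratch repository laid out like the drawings; the variable names are invented for the example.

```shell
# Scratch repository with commits A, B, C on main (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done
commit_c=$(git rev-parse HEAD)

git branch new-branch                  # new name, same commit; HEAD stays on main
git symbolic-ref HEAD                  # prints refs/heads/main

git checkout -q new-branch             # re-attach HEAD; the commit in use is still C
attached=$(git symbolic-ref HEAD)
echo "HEAD is attached to $attached"

git commit -q --allow-empty -m "D"     # only the current branch name moves
echo "main:       $(git rev-parse main)"
echo "new-branch: $(git rev-parse new-branch)"
```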
³ In particular, the name HEAD has to be per-worktree. Each added working tree gets its own HEAD (and index / staging-area). When head, in lowercase, works for you, it does so due to a quirk of your particular Git and file system. Git does not notice (it probably should, but doesn't) that head accesses a file named HEAD. Using HEAD, in all caps like this, makes Git use the correct file for your added working tree. Using head in lowercase makes Git use the HEAD file for the main working tree. The result is that you can get the wrong commit! So don't spell head in lowercase: it will get you in trouble someday.
⁴ Technically, the per-worktree HEAD file contains the string ref: refs/heads/branch-name. Git also has a detached HEAD mode where the file contains a raw commit hash ID. Git uses detached mode internally during git rebase, and it has several other uses, such as inspecting historical commits, but detached-HEAD mode is not a typical way to get work done.
Putting these together
This is how branches really work in Git. A branch name selects the last commit, by definition. That commit points backwards to its parent. The parent commit points backwards to another still-earlier commit. That commit points backwards too, and so on, and on, all the way back to the very first commit. The history is the commits, and the linkage is in the commits. The branches are, in some sense, just the set of commits selected by picking the last ones and working backwards. The names select the last commits, and in the diagram below, all four commits are on new-branch, while the first three commits remain on main.
A--B--C <-- main
       \
        D <-- new-branch
Checking out main means select commit C for my working tree; checking out new-branch means select commit D for my working tree. Selecting the commit attaches HEAD to the name, so that new commits will grow that branch.
Branch names move
As you can see now, branch names regularly move forward, one commit at a time, as you make new commits. Branch names also sometimes move forward multiple commits. Suppose, for instance, that we have this:
A--B--C <-- main
       \
        D--E--F--G <-- new-branch (HEAD)
and we now deem our new feature branch "ready". We might run:
git checkout main
git merge --ff-only new-branch # the `--ff-only` is optional
At this point, Git notices that main could catch up to new-branch without having to do any real merging at all, just by "sliding the name forward". That is, main can move forward four times, from C to D to E to F to G. Git calls this sliding-forward of a branch name a fast-forward operation. The result is:
A--B--C--D--E--F--G <-- main (HEAD), new-branch
(remember that git checkout moved HEAD to main).
When you do this with the current branch name, Git calls this a fast-forward merge. Git has to replace the C-commit files with the G-commit files, so this is a lot like running git checkout new-branch in some ways. But instead of switching to the other branch, Git just drags the name main forward.
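Here is a sketch of that fast-forward in a scratch repository. The commit names mirror the drawing, the commits are empty for brevity, and the variable names are assumptions made for the example.

```shell
# Scratch repository: A, B, C on main, then D, E, F, G on new-branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done
git checkout -q -b new-branch
for name in D E F G; do
    git commit -q --allow-empty -m "$name"
done

git checkout -q main
old_main=$(git rev-parse main)
git merge -q --ff-only new-branch     # no merge commit: the name just slides forward
new_main=$(git rev-parse main)

# Count the commits that main gained by sliding from C to G.
moved=$(git rev-list --count "$old_main..$new_main")
echo "main moved forward by $moved commits"
```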
There is a problem here sometimes. Suppose that, after we made new-branch and some commits on it, we switched back to main and made a new commit on main too:
A--B--C---------H <-- main (HEAD)
       \
        D--E--F--G <-- new-branch
If we now try to merge new-branch, Git cannot "slide the name forward". Git would have to back up first, dropping commit H entirely; the result would be:
        H ???
       /
A--B--C
       \
        D--E--F--G <-- main (HEAD), new-branch
with no way to find commit H. Commit H still exists, it's just lost. Remember that real commits have random-looking, un-memorable hash IDs: would you remember the hash ID? Would you be able to pick it out of a police lineup?
Git won't do this. If you run git merge new-branch, Git will, instead, make a true merge, using a merge commit, which I'll draw like this but won't go into any details:
A--B--C---------H--M <-- main (HEAD)
       \          /
        D--E--F--G <-- new-branch
Using the --ff-only flag to git merge tells Git: If you can't use a fast-forward, give me an error instead of attempting a merge commit. There are more options, but since this isn't about merging, we'll stop here.
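The difference between the two behaviors shows up in a small experiment. This sketch assumes a scratch repository with one commit on each side of the fork (D on new-branch, H on main); the names and the empty commits are purely illustrative.

```shell
# Scratch repository: A, B, C on main, D on new-branch, then H on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done
git checkout -q -b new-branch
git commit -q --allow-empty -m "D"
git checkout -q main
git commit -q --allow-empty -m "H"

# With --ff-only, Git gives an error rather than dropping H or making a merge commit.
if git merge --ff-only new-branch >/dev/null 2>&1; then
    ff_result=fast-forwarded
else
    ff_result=refused
fi
echo "--ff-only merge: $ff_result"

# Without the flag, Git makes a true merge: a commit with two parent lines.
git merge -q --no-edit new-branch
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge commit has $parent_count parents"
```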
Forcing the current branch name to move with git reset
The git reset command is large and full of many options.⁵ In general, however, it does three things, or rather, up to three things, optionally stopping after one or two of them:
First, git reset moves the current branch name.
This step almost always happens (there are some forms of the complicated reset command that won't let you move the branch name), but you can pick the current commit as the place to move to. If you do that, the "move" is basically just to stand in place after all. You use this kind of stand-in-place "move" to achieve one or both of the remaining two steps.
With --soft, Git stops after this step. By default, it goes on.
Second, git reset resets Git's index (aka staging-area). Since this isn't about the index / staging-area, we won't cover what this means.
With --mixed or the default, Git stops after this step. We'll illustrate --hard here though, so we will go on to the last step.
Last, with --hard, git reset resets your working tree, pretty similarly to git checkout or git switch, but without any warning if this destroys unsaved work.
This means that, e.g., git reset --hard, which uses the option we're interested in, can be used to wipe out any changes you have decided are a bad idea. That is, you might git checkout some branch name, make a stab at fixing a bug, and discover that it isn't a bug at all, or you changed the wrong code. You then run git reset --hard. What this does is:
- move the current branch name to the current commit: it stays in place;
- reset the index / staging-area: nothing is staged for commit now; and
- reset the working tree: nothing is modified now, the current commit is restored to your working tree.
If we pick some other commit hash ID to re-set to, though, we can drag the current branch name to any other commit. Why might we do this? Well, let's go back to our setup that looks like this:
A--B--C--D <-- main (HEAD), new-branch
We got this when we accidentally made new commit D on main, then added a new branch name without checking it out. We now want to force main to point to commit C, and get commit C checked out. The git reset --hard command achieves this:
git reset --hard <hash-of-C>
(we can get the hash with git log, for instance; there are other, smarter ways but this works) and now we have:
A--B--C <-- main (HEAD)
       \
        D <-- new-branch
The git reset command moved the branch name to which our HEAD is attached, so that it now points to commit C; with --hard, it sets things up so that commit C is the one checked out, too. Since git reset --hard wipes out unsaved work without asking, we'd better be really sure we committed everything first, of course, but now we're good: our new commit is now only on our new branch, with the same old three commits on main that were there before.
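The whole rescue can be rehearsed in a scratch repository before trying it for real. In this sketch the commit names follow the drawings and the hash of C is saved in a shell variable, which is just one convenient way to get hold of it.

```shell
# Scratch repository: A, B, C on main, then D made on main by mistake.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done
commit_c=$(git rev-parse HEAD)
git commit -q --allow-empty -m "D"
commit_d=$(git rev-parse HEAD)

git branch new-branch              # a second name keeps commit D findable
git reset -q --hard "$commit_c"    # drag main (the current branch) back to C

# One line per commit, decorated with the names that select it.
git log --oneline --decorate --all
```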
⁵ The git reset command has too many options, in my opinion: it's like git checkout, and needs a lower-powered, higher-safety version the way Git 2.23 added git switch. Just be careful when using it.
Using fetch and push
Now that you know how branch names work within one Git repository, it's time to consider how they work when using git fetch and git push. The key thing to know here is that repositories share commits by hash ID, but each repository has its own branch names.
Remember that a repository is essentially two databases:
One (usually the biggest by far) contains the commits, and the files in the special Git-ized format, and so on. Git keeps these in a simple key-value store, indexed by hash ID.
The other database holds names: branch names, tag names, and various other names. All the names simply hold one hash ID. For a branch name, this hash ID is, by definition, the last commit in the branch. (For a tag name, the hash ID is often that of an auxiliary tag object. The rules, and uses, for each kind of name vary a bit.)
Since your repository is a repository, your repository has branch names. Since some other Git repository is a repository, that other repository also has branch names. The hash IDs stored in their branch names don't necessarily match the ones stored in yours, though. To make all this work well, Git now has the concept of a remote-tracking name.⁶
When you set up your Git repository to talk, regularly, with some other Git repository, you give that other Git repository a name. The traditional name for the (singular) other Git repository is origin. This name, origin, stores the URL; your Git then uses git fetch origin to call up that Git and get stuff from them, and git push origin to call up that Git and give stuff to them.
Having given their Git a name, your Git will get commits from them by a pretty simple process:
- Your Git calls up their Git.
- They list out all their branch names, and the corresponding commit hash IDs.
- Your Git looks up these hash IDs to see if you already have the commits. If so, your Git tells them already have that one. If not, your Git tells them want that one. If your Git wants some particular commit, their Git is now obligated to offer that commit's parent commit too; your Git checks this hash ID and says "want" or "already have" as appropriate, and this repeats until your Git has asked for all the commits they have that you don't.
- Their Git now packages up all the commits and other supporting objects your Git needs, and sends them over. You now have all of your commits and all of theirs, with no wasted effort: you don't bother bringing over any commits you already have, and the two Gits are smart enough to figure out which files are pre-de-duplicated and so on, too.
So now you have all of their commits, as found on their branches. Your Git now takes each of their branch names and changes it: your Git sticks origin/ in front of the name.⁷ So their main becomes your origin/main; their feat becomes your origin/feat; and so on.
Your Git then creates or updates each of these remote-tracking names in your repository. You now have origin/main, which selects the last commit that's in their branch main. You might have origin/feat, if they have a feat. In each case, your remote-tracking name tells you which commit is the last commit in their branch.
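Two repositories side by side in a temporary directory are enough to watch this happen. In the sketch below the directory names theirs and ours, the identity, and the commit names are all invented; the "other Git" is simply a local path rather than a URL.

```shell
# "theirs" plays the other Git repository; "ours" is a clone of it.
work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
git -C theirs config user.name "Them"
git -C theirs config user.email "them@example.com"
git -C theirs commit -q --allow-empty -m "A"

git clone -q theirs ours                          # the clone names the other Git "origin"
git -C theirs commit -q --allow-empty -m "B"      # they move on after we cloned

cd ours
git fetch -q origin                               # get their new commit, update origin/main

ours_main=$(git rev-parse main)
tracking=$(git rev-parse origin/main)
theirs_main=$(git -C ../theirs rev-parse main)
echo "our main:        $ours_main"
echo "our origin/main: $tracking"
echo "their main:      $theirs_main"
```

After the fetch, origin/main should match their main, while our own main is left where it was.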
The git push command is similar, but there are two big differences:
- First, you'll be sending commits to them rather than getting commits from them.
- Second, after you've sent them commits, you'll have your Git ask their Git to set one (or more) of their branch names.
This set-a-branch-name operation is in some ways like git reset. Remember how we have the ability to make the current branch name, in our Git repository, point to any commit we choose. A git push we run sends to their Git a request of the form: Please, if it's OK, set your branch name _____ to point to commit _____. Our Git fills in both blanks, usually from one of our branch names.
The nice thing about this request is that it's polite: it's not a command, like git reset. And (here's the tricky bit) they won't obey unless that operation is a fast-forward. Remember how we talked about git merge --ff-only above, and when it works. A branch-name-move operation is a fast-forward if it adds new commits without forgetting any old ones. If we send them a polite request, asking them to fast-forward their main for instance, and our commits don't just add on to their main, they will reject our request:
! [rejected] ... (non-fast-forward)
This usually means we need to re-do our own commits somehow—make new and better ones—that do provide a fast-forward operation. (See also What does "Git push non-fast-forward updates were rejected" mean?) But we can make use of that in a different way.
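A rejection is easy to provoke with two clones of one shared repository. The sketch below is only an illustration: the directory names (seed, theirs.git, alice, bob) and identities are made up, and the exact reason text Git prints after "! [rejected]" can differ depending on whether the pusher has fetched the other commits yet.

```shell
# One shared bare repository and two clones of it (all names are illustrative).
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
git -C seed commit -q --allow-empty -m "A"
git clone -q --bare seed theirs.git
git clone -q theirs.git alice
git clone -q theirs.git bob
for who in alice bob; do
    git -C "$who" config user.name "$who"
    git -C "$who" config user.email "$who@example.com"
done

# Alice's push only adds to their main, so it is a fast-forward and is accepted.
git -C alice commit -q --allow-empty -m "alice's work"
git -C alice push -q origin main

# Bob's commit does not build on Alice's, so moving their main to it would drop hers.
git -C bob commit -q --allow-empty -m "bob's work"
if git -C bob push origin main 2>push.err; then
    outcome=accepted
else
    outcome=rejected
fi
cat push.err
echo "bob's push was $outcome"
```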
⁶ Git calls this a remote-tracking branch name; I find the word branch in here redundantly duplicative, a distractive pleonasm used by the loquacious.
⁷ Technically, your remote-tracking names are in an entirely different namespace, under refs/remotes/origin/; your branch names are under refs/heads/.
Forced fetch or push
For completeness, let's cover --force with fetch and push.
Git "likes" fast-forward operations, because they literally can't remove a commit. Any commits that were on a branch before the operation are still on the branch after the operation. But sometimes you really want Git to "lose" a commit entirely. The --force flag exists for this purpose.
Normally, you just run git fetch or git fetch origin. This has your Git reach out to origin's Git and get branches, and (as noted above) creates or updates remote-tracking names, not branch names. Your branch names aren't touched; only your Git's copies, in remote-tracking names, of their Git's branch names get updated here. If their Git has, for some reason (such as a git reset), moved a branch name backwards, your Git should move your remote-tracking name backwards too. So Git updates these remote-tracking names with --force implied, if needed.
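The implied force can be seen by having the other repository move its branch backwards between two fetches. This is a sketch with made-up directory names (theirs, ours) and empty commits; it assumes the default fetch configuration that git clone sets up.

```shell
# "theirs" has commits A and B; "ours" is a clone that has seen both.
work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
git -C theirs config user.name "Them"
git -C theirs config user.email "them@example.com"
git -C theirs commit -q --allow-empty -m "A"
git -C theirs commit -q --allow-empty -m "B"
git clone -q theirs ours

# They move their main backwards, from B to A.
git -C theirs reset -q --hard HEAD~1

cd ours
git fetch origin      # the origin/main update is reported as a forced update

echo "their main:      $(git -C ../theirs rev-parse main)"
echo "our origin/main: $(git rev-parse origin/main)"
echo "our main:        $(git rev-parse main)"
```

Our remote-tracking name follows theirs backwards, while our own branch name main still selects commit B.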
If you're doing a git push and the other Git rejects your push because it's a non-fast-forward, you can sit down and figure out whether this is OK after all. If it is OK, you can use a forced push, git push --force, to send it anyway. (Ideally, you should use a fancier kind of force, "force with lease" or similar, but we won't cover this properly here.)
Note that these all involve "losing" a commit, like we did when we moved main backwards with git reset, so that our new commit was only on our new branch. If we're careful, we can make sure that any "lost" commits that we want retained, are still find-able by some other branch name. We'll only truly lose some commit(s) that we have discarded on purpose, perhaps by making new-and-improved commits to use instead.
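As a rough illustration, the following sketch replaces a pushed commit with an improved one and then forces the push, while a second branch name keeps the old commit findable. The repository names, the branch name backup, and the commit messages are all assumptions made for the example.

```shell
# A shared bare repository and one clone of it (names are illustrative).
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
git -C seed commit -q --allow-empty -m "A"
git clone -q --bare seed theirs.git
git clone -q theirs.git ours
cd ours
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "first attempt"
git push -q origin main

git branch backup                      # keeps "first attempt" findable by name
git reset -q --hard HEAD~1             # drop it from main ...
git commit -q --allow-empty -m "improved attempt"   # ... and make a replacement

# A polite push is refused, because their main would lose "first attempt".
if git push -q origin main 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi
echo "plain push: $plain_push"

git push -q --force origin main        # a command this time, not a request
echo "their main is now: $(git -C ../theirs.git log -1 --format=%s main)"
```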
Refspecs
In our examples above, we just used simple branch names:
git push origin somebranch
for instance. But in fact, git push and git fetch both take refspecs after the remote name. A refspec consists of two parts separated by a colon :, and optionally prefixed by a plus sign +. So we could write:
git push origin somebranch:somebranch
or even:
git push origin HEAD:somebranch
The optional plus sign, if we use it, means --force, so we should very rarely use it. Here we won't use it at all.
The colon, if we use it, separates the source part, on the left, from the destination part, on the right:
- For git fetch, the source is the branch name in the other Git repository. We're going to get this commit; they will have to send it; so that's the source.
- For git push, the source is the branch name or commit hash ID in our Git repository. We're going to send this commit, so that's the source.
The destination, if we list one separately, is the name that should get updated. For git fetch, we might list one of our origin/ names, like origin/main. We never have to do this in modern Git, though:⁸ Git will update our remote-tracking name appropriately. We can just git fetch origin main and our Git will update our origin/main for us.
For git push
, where we are going to ask their Git to set one of their branch names, we can list their branch name. This allows us to use a raw commit hash ID, for instance, as the source:
git push origin a123456:theirbranch
This is how we can push a commit that's not at the tip of the branch locally. For instance, if we're on our new feature branch and we're sure of everything up to and including a123456, but are still working on stuff after that point, we can use this to push only the stuff we're sure about.⁹
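Here is a sketch of pushing only part of a branch. The branch name feature, the repository names, and the commit messages are invented. One assumption worth flagging: because the branch does not exist on the other side yet and the source is a raw hash ID, the sketch spells the destination out in full as refs/heads/feature so that Git does not have to guess what kind of name to create.

```shell
# A shared bare repository and one clone of it (names are illustrative).
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
git -C seed commit -q --allow-empty -m "A"
git clone -q --bare seed theirs.git
git clone -q theirs.git ours
cd ours
git config user.name "Example User"
git config user.email "user@example.com"

git checkout -q -b feature
git commit -q --allow-empty -m "sure about this"
git commit -q --allow-empty -m "sure about this too"
git commit -q --allow-empty -m "still working on this"

sure=$(git rev-parse HEAD~1)                  # the last commit we are sure about
git push -q origin "$sure:refs/heads/feature" # source is a hash ID, not a branch name

their_feature=$(git -C ../theirs.git rev-parse feature)
echo "their feature: $(git -C ../theirs.git log -1 --format=%s feature)"
echo "our feature:   $(git log -1 --format=%s feature)"
```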
⁸ "Modern" here means Git 1.8.2 or newer, and there is a caveat: this has to be listed in the default fetch refspecs. For a single-branch clone, if we're deliberately fetching a branch not listed, we might need to do something different.
⁹ It's often fine to just push everything. If we push a bad commit, we can retract it. This, however, assumes that our colleagues won't take our bad commit and use it for something. So make sure your colleagues won't do anything boneheaded, first.
The remote named dot (.)
Above, our git fetch or git push used the remote named origin. That's the other Git we're having our Git connect to. But all Git repositories can talk to a "remote" (it's a sort of pseudo-remote) named ., a bare period by itself.
This "remote" means call up ourselves. That is, we treat our Git repository as if it were another Git repository. We spin up one Git to talk to another Git, and pretend the other Git is on another machine, even though it's right here on our own computer. For sending commits around, this never makes any sense, because any commits we have, the other Git—which is our Git—will have, and for any commits we're missing, the other Git will be missing those same commits. But for branch names, well, now the dot has a purpose.
If we git fetch ., we will see our own branch names as some other Git's branch names. We can combine this with the refspec trick. Moreover, a non-forced fetch or push always follows the fast-forward rule. We can use that for our special purpose operations.
Assembling all of the above
Now that we know all of the above, we can understand what:
git push . origin/main:main
does, and what:
git fetch origin main:main
does. Let's consider that git push first:
- We have our Git call up some other Git, with the "other Git" really being our own Git.
- Then, we ask our Git to send to the other Git, any origin/main commits they don't have. Of course they have all the same commits, so that goes very fast and sends nothing.
- Finally, we politely ask them to fast-forward their main to match our origin/main.
If fast-forwarding their main is possible (this requires that they don't lose any commits, and also that they don't have main checked out) they will do that. But "they" are really us: we just need to have some other branch checked out, and then we'll have our own Git fast-forward our own main to match our own origin/main. If it can be fast-forwarded, it is; if not, it's not, with a ! [rejected] message.
This does of course require that we run git fetch or git fetch origin first, so that we get any new commits from origin and update our origin/main. Once we've done that, we can git push . to attempt the fast-forward.
To do this all in one command, we use the:
git fetch origin main:main
form. This has our Git call up origin's Git and get any new commits from them. If our Git isn't too ancient, our Git automatically updates our origin/main right away, even if this requires a force-update. But having done that, our Git then tries to do a non-forced update of our own main, based on the new commit hash we just stuck in our own origin/main.
There's a minor negative side effect here: git fetch origin main restricts our Git. When we call up their Git, and they list out all their branches, our Git just picks out any updates they have to their main, to bring over. So we still probably want a separate, unrestricted git fetch origin command. That will get all their new commits and update all our remote-tracking names.
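Both forms can be tried against a local stand-in for origin. In this sketch the directory names theirs and ours and the branch name topic are made up; topic exists only so that main is not the checked-out branch while its name is being moved.

```shell
# "theirs" plays origin; "ours" is a clone with some branch other than main checked out.
work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
git -C theirs config user.name "Them"
git -C theirs config user.email "them@example.com"
git -C theirs commit -q --allow-empty -m "A"
git clone -q theirs ours
cd ours
git checkout -q -b topic

# Two-step form: fetch first, then ask "ourselves" to fast-forward main.
git -C ../theirs commit -q --allow-empty -m "B"
git fetch -q origin
git push -q . origin/main:main
after_push=$(git rev-parse main)
tracking=$(git rev-parse origin/main)
echo "after push .: main=$after_push origin/main=$tracking"

# One-step form: fetch their main and fast-forward our main in a single command.
git -C ../theirs commit -q --allow-empty -m "C"
git fetch -q origin main:main
after_fetch=$(git rev-parse main)
theirs_main=$(git -C ../theirs rev-parse main)
echo "after fetch:  main=$after_fetch their main=$theirs_main"
```

In both cases the checked-out branch stays topic; only the name main moves, and only because each move is a fast-forward.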
Either way, it's worth knowing that git fetch and git push use refspecs, that . means our own repository, and that fetch and push will do fast-forward non-forced updates, but won't force a non-fast-forward update to their or our branches without the force flag (--force or +).