TL;DR
You are getting the equivalent of:
git fetch origin A && git merge origin/A
which merges your current branch B with whatever the git fetch did to your origin/A, which in turn depends on what happened in their A.
Long
This gets a little bit complicated. Here's the best way to think about it and remember how it all works.
The basic command has this form:
git pull [<options>] [<remote> [<refspec> ...]]
where the square brackets indicate that something is optional. For example:
git pull --ff-only origin branch-A
specifies all three, but:
git pull origin branch-A
leaves out the <options> part. The angle brackets, meanwhile, mean "fill this in with something", which is why we replaced <options> with --ff-only (a valid option), and why we put in origin (a valid remote) and branch-A (a valid refspec) as our remote and refspec arguments. Finally, the three dots ... mean you can keep adding more of these: in this case, more refspecs.
That still leaves some extremely important questions. We'll keep the answers simple here, which means leaving out some details:
- What options can we use? What do they all mean?
- What's a remote and what's a refspec, why don't we have to provide them, and what do they do when we do provide them?
- What does all this do anyway?
There are many options, but there are two or three you should know: --rebase and its counterpart, --no-rebase; and --ff-only. We'll come back to these. Let's move on to remote and refspec, and what this all does.
What is a remote?
In brief, a remote is a short name you will use for some other Git repository. Your own Git repository is just that: a Git repository.
A Git repository is, at its heart, really just two databases:
- One database holds commits (which save every file for all time, more or less) and other supporting objects. These are all numbered, with big ugly hairy scary-looking hash IDs or object IDs ("OIDs") like cefe983a320c03d7843ac78e73bd513a27806845. These things are pretty toxic, or at least indigestible, to human beings, so we try to avoid using them. But they're how Git actually finds the commits.
- The other database holds names. These are things like branch names, tag names, and many other special-purpose names. Each name translates to exactly one of those big ugly hash IDs. They help you (and Git) find the commits that you care about.
Because Git acts as a big distributed database, we connect our own Git software, working with one Git repository of our own, to someone else's Git software (on GitHub, for instance), working with one other repository. That other repository might be nominally "ours" ("our fork on GitHub"), or someone else's ("their fork" or whatever). Either way, we have our-software-with-our-repo ("our Git") talking to their-software-with-their-repo ("their Git").
To connect our Git to their Git, we have to give our Git a URL, like https://github.com/user/repo.git or ssh://git@github.com/user/repo.git or something. Our Git (our software running on our repo) uses the URL to reach their software and point it at their repo, so this URL locates "their Git". These URLs are long, typo-prone, and generally not great for humans, so we have a convenient shorter name. That's a remote.
This remote, this short name for some other Git, lets us hook our Git up to their Git easily. We just say git fetch origin and our Git looks up origin and gets the URL. We say git push origin and our Git looks up origin and gets the URL. Or we say git pull origin, and our Git looks up origin and, well, you get the idea.
The remote, then, is the short name for the other Git. The name origin is the standard short name for the other Git you used when you ran git clone url to create your repository in the first place. At that time, your Git software created a new, totally empty repository, then set up the standard first name origin in this new empty repository, to store the URL.
Then your Git ran git fetch origin. This got, from the origin Git, all of their commits and other objects, and all of their names: the contents of the two databases. But here's tricky part number 1: your Git doesn't keep all of their branch names. Instead, your Git renames all of their branch names, sticking origin/ in front of them.[1]
So in your two Git databases (commits and supporting objects, and names) you get all of their commits and none of their branches. Instead, what they had as branch names become your remote-tracking names. A remote-tracking name is like a branch name, but with the name of the remote stuck in front, plus the slash. That's where origin/main or origin/master comes from, for instance.
As a last step, git clone then creates one branch name in your own repository. If you use the -b option, you're telling your Git which of their branch names to use here. But typically we run git clone without -b, so your Git asks their Git which branch name they recommend. They'll generally recommend their master or main, and that's the branch name your Git will create.
Since the point of branch names is to help us find commits, which have those big ugly hash IDs or OIDs, your Git makes your one new branch using the same OID that their Git used for their recommended branch. So, assuming they recommend main, your main will now match your origin/main, which will match their main.
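To make the renaming concrete, here is a toy Python model of just the naming part of a clone. It is a sketch under loudly simplified assumptions: a repository's branch names are a plain dict from name to hash ID, the hash IDs are made-up short strings, and remote_tracking_names and toy_clone are hypothetical helpers, not anything Git itself provides. It ignores tags, the objects database, and the refspec machinery that footnote 1 glosses over.

```python
from __future__ import annotations


def remote_tracking_names(remote: str, their_branches: dict[str, str]) -> dict[str, str]:
    """Toy model: rename each of their branch names to <remote>/<name>, keeping its hash ID."""
    return {f"{remote}/{name}": oid for name, oid in their_branches.items()}


def toy_clone(their_branches: dict[str, str], recommended: str, remote: str = "origin"):
    """Toy model of the naming part of `git clone` (hypothetical helper, not Git's code).

    Returns (our_branches, our_remote_tracking_names).
    """
    tracking = remote_tracking_names(remote, their_branches)
    # We get none of their branches, just one new branch of our own, pointing
    # at the same hash ID as their recommended branch (e.g. their main).
    ours = {recommended: their_branches[recommended]}
    return ours, tracking


if __name__ == "__main__":
    theirs = {"main": "a1b2c3", "feature": "d4e5f6"}  # made-up hash IDs
    branches, tracking = toy_clone(theirs, recommended="main")
    print(branches)  # {'main': 'a1b2c3'}
    print(tracking)  # {'origin/main': 'a1b2c3', 'origin/feature': 'd4e5f6'}
```

Passing a different recommended name plays roughly the role of git clone -b.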
This is all pretty confusing at first. It takes a while to get a good handle on it. The key here is that your branches are yours. Meanwhile their branches are theirs. Your Git will take their branch names and rename them to be your remote-tracking names. And it's not actually the names that matter, but rather the hash IDs (or OIDs) of the commits. You'll use your names—your branch names and your remote-tracking names—to find the right hash IDs: your most recent commit, or their most recent commit, for instance. They'll use their branch names to find the right commits. Every once in a while, you have to synchronize these things. How often, when, and how ... well, we'll get into this in just a moment.
[1] Technically, this isn't quite right, but it's good enough for our purposes.
git fetch
You asked about git pull, but we should start instead with git fetch. The fetch command is the more basic one, and is what Git tutorials should start with. None of what git pull does makes any sense until you first understand git fetch.
The git fetch command takes a remote: the name of some other Git to call up. Its purpose is to get updates for your databases, and to get those updates, it has to call up some other Git software talking to some other Git repository, using a Git URL.
If you don't give git fetch a remote, it will find one on its own, or, as a last resort, default to origin.[2] By a bit of magic,[3] Git usually gets this right, so you don't have to name origin, but you might want to, as we'll see.
Your Git then uses the stored URL to call up some other Git. That other Git, as before, has its own two databases, with commits-and-supporting-objects and with names. Your Git gets, from their Git, all the names and the latest commit hash IDs. Your Git then checks to see if you have these commits. If you don't have them, your Git asks their Git to send over those commits. The two Gits have a conversation here where your Git figures out what commits they have, that you don't, that you'll need. They then package up the commits and any other supporting objects and send them over:
remote: Enumerating objects: 1400, done.
remote: Counting objects: 100% (679/679), done.
remote: Compressing objects: 100% (55/55), done.
Receiving objects: 100% (297/297), 104.28 KiB | 1.13 MiB/s, done.
The messages starting with remote: came from software on the remote (in this case, the Git software on their end) as it packed up objects. The messages not starting with remote: are from your own Git software. At the end of this process, I now have 297 new objects, some of which were commits, in my objects database. (This is a clone of the Git repository for Git, which I updated quite recently, but it's pretty active.)
Now that your objects database is updated, your Git will take any of their branch names that have changed since the last update, rename them to remote-tracking names as usual, and update your names database with these updated remote-tracking names:
From [url]
+ 3ecb51dcfc...6839d98bf9 seen -> origin/seen (forced update)
In this case, seen was their branch name; origin/seen is my remote-tracking name. (I'll skip over the forced update part here.)
In summary, then:
- git fetch gets, from their Git, their names and the hash IDs those names hold, plus any new commits and other objects you need.
- git fetch puts these objects into your objects database, then updates tags and/or remote-tracking names, by default.
So git fetch updates your Git from their Git, but does not do anything to any of your branch names. (That is, not by default, and not with anything I'm going to show here.) You get your remote-tracking names resynchronized with their branch names. You get any new commits they have that you don't. And that's it: the fetch is all done.
If you have more than one remote, you can run git fetch remote1 and then git fetch remote2 and so on, to get new commits from remote1 and update remote1/*, then get commits from remote2 and update remote2/*, and so on. Most people have only one remote anyway, though, so one git fetch, with nothing else listed, suffices.
If the origin repository is very large and/or very busy, your network connection is very slow, and you need something right away, you might want to limit which of their branches you synchronize. In this case, you would run:
git fetch origin branch-A
to update your origin/branch-A and nothing else, or:
git fetch origin branch-A branch-B
to update your origin/branch-A and your origin/branch-B, but nothing else.
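The fetch behavior described above, including the limited form, can be sketched with another toy Python model. Everything here is an assumption for illustration: ToyRepo and toy_fetch are hypothetical names, commits are reduced to a map from made-up hash ID to parent IDs, and the real negotiation between the two Gits (plus packing and tags) is not modeled. What the sketch does show is the key point: new commits arrive, remote-tracking names move, and branch names stay put.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    # hash ID -> tuple of parent hash IDs (stands in for the objects database)
    commits: dict[str, tuple[str, ...]] = field(default_factory=dict)
    # branch name -> hash ID
    branches: dict[str, str] = field(default_factory=dict)
    # remote-tracking name (like "origin/main") -> hash ID
    remote_tracking: dict[str, str] = field(default_factory=dict)


def toy_fetch(ours: ToyRepo, theirs: ToyRepo, remote: str = "origin",
              only: list[str] | None = None) -> list[str]:
    """Toy model of `git fetch <remote> [<branch>...]`.

    Copies commits we lack, then updates <remote>/<name> for each branch
    fetched. It never touches ours.branches. Returns the new hash IDs.
    """
    names = list(theirs.branches) if only is None else list(only)
    new_ids: list[str] = []
    stack = [theirs.branches[name] for name in names]
    while stack:
        oid = stack.pop()
        if oid in ours.commits:
            continue  # already have it; this toy assumes we also have its history
        ours.commits[oid] = theirs.commits[oid]
        new_ids.append(oid)
        stack.extend(theirs.commits[oid])  # walk back through the parents
    for name in names:
        ours.remote_tracking[f"{remote}/{name}"] = theirs.branches[name]
    return new_ids


if __name__ == "__main__":
    theirs = ToyRepo(commits={"G": (), "H": ("G",), "I": ("H",), "J": ("I",)},
                     branches={"main": "H", "branch-A": "J"})
    ours = ToyRepo(commits={"G": (), "H": ("G",)}, branches={"main": "H"},
                   remote_tracking={"origin/main": "H"})
    print(toy_fetch(ours, theirs, only=["branch-A"]))  # ['J', 'I']
    print(ours.branches)         # {'main': 'H'}  (our branch is untouched)
    print(ours.remote_tracking)  # {'origin/main': 'H', 'origin/branch-A': 'J'}
```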
This uses what are called positional arguments: the word origin, the remote, goes in the first position, and everything after it is one of their branch names. You cannot leave out the remote, because git fetch branch-A branch-B means "using branch-A as a remote, fetch just the one branch branch-B", which probably doesn't work anyway.
As it turns out, a branch name, typed in the way we're typing it here, is a valid refspec. So this also answers the "what's a refspec" question, at least in part:[4] here, a refspec can simply be one of their branch names.[5]
[2] Git may be growing a facility for changing this default first remote name, along with a facility for changing the default first branch name from master to main or whatever, so someday this might be more complicated.
[3] The main magic here is that most people only have one remote anyway. The secondary magic is that Git will read the remote name from the current branch's upstream setting.
[4] Technically a refspec consists of:
- an optional leading + sign; then
- two refs separated by a single colon :, or just one ref.
A ref is a fully-spelled-out name, like refs/heads/branch-A or refs/remotes/origin/branch-A. We're not going to use either the leading + sign or the pair-of-refs form here, as + means "force" and careless use of full refspecs can mean "wreck my branch names". Accidents here are often recoverable, but we don't want to get into a situation where they can occur in the first place.
[5] In very old versions of Git, predating 1.8.4, there's an issue with refspecs and git fetch: Git omits what it calls opportunistic updates of remote-tracking names. But surely you're using a modern version. (You can check with git --version if you like.)
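Footnote 4's shape for a refspec (an optional +, then one ref or two refs separated by a colon) is simple enough to sketch as a tiny parser. This is a toy: Refspec and parse_refspec are hypothetical names, and the code only splits the text; it does not check that the names are valid refs or expand patterns. The second example is the kind of pattern refspec that git clone typically records for origin, which is how their refs/heads/ names end up as your refs/remotes/origin/ names.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Refspec(NamedTuple):
    force: bool          # True if the refspec had a leading "+"
    src: str             # the ref (or single name) before the colon
    dst: Optional[str]   # the ref after the colon, or None if there was no colon


def parse_refspec(text: str) -> Refspec:
    """Toy parser for the shape in footnote 4: [+]<src>[:<dst>]."""
    force = text.startswith("+")
    if force:
        text = text[1:]
    if text.count(":") > 1:
        raise ValueError(f"more than one colon in refspec {text!r}")
    src, colon, dst = text.partition(":")
    return Refspec(force, src, dst if colon else None)


if __name__ == "__main__":
    print(parse_refspec("branch-A"))
    # Refspec(force=False, src='branch-A', dst=None)
    print(parse_refspec("+refs/heads/*:refs/remotes/origin/*"))
    # Refspec(force=True, src='refs/heads/*', dst='refs/remotes/origin/*')
```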
git pull runs git fetch
The reason you really need to understand git fetch before you get into git pull is that git pull means:
- run git fetch; then
- based on what happens in step 1, run another Git command.
It's the git fetch step that gets new commits from some other Git, puts them in your repository, and updates your remote-tracking names. It's crucial to understand this before we move on. The reason there's a second command is simple, though: as we just learned, git fetch does not affect any of your branches at all. (At least, not the way we use it here.)
Presumably one of the reasons you're going to some other Git to get new commits from them is that you'd like to use those new commits. And, presumably at least, you then want to update some local branch(es).[6] This is where that second Git command, step 2, comes in.
The git pull command offers you your choice of two primary second commands: git merge and git rebase. You need to know which one git pull will run. You can control this with the options I mentioned:
- --rebase tells git pull to run git rebase second;
- --no-rebase tells git pull to run git merge second.[7]
The default, if you don't pick one of these two, is something you can configure. If you haven't configured it, the default default, as it were, is to use git merge.
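The choice of second command can be summarized as a small decision function. This is a simplified sketch: second_command is a hypothetical helper, and it treats the pull.rebase configuration setting as a plain true/false value, although the real setting accepts a few other values as well.

```python
from __future__ import annotations


def second_command(options: list[str], pull_rebase: bool | None = None) -> str:
    """Toy model: which command `git pull` runs after `git fetch`.

    options: the command-line options given to git pull.
    pull_rebase: simplified stand-in for the pull.rebase setting
                 (None means "not configured").
    """
    choice = None
    for opt in options:
        # In this sketch, if both options appear, the later one wins.
        if opt == "--rebase":
            choice = "rebase"
        elif opt == "--no-rebase":
            choice = "merge"
    if choice is not None:
        return choice        # an explicit option overrides the configuration
    if pull_rebase:
        return "rebase"      # the configured default
    return "merge"           # the "default default"


if __name__ == "__main__":
    print(second_command([]))                                 # merge
    print(second_command([], pull_rebase=True))               # rebase
    print(second_command(["--no-rebase"], pull_rebase=True))  # merge
```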
The thing is, you now need to learn about both rebase and merge. So let's touch on them lightly. Both commands can get very complicated, and I cannot cover them thoroughly here.
[6] It's possible to use Git with no local branches. In some situations this might even be advisable. But it's not how most people actually use Git.
[7] You might wonder why this isn't --merge. It probably should be, but the reason it isn't is historical: originally git pull only ran git merge. So once Git learned to run rebase second, that became the --rebase option. Then people wanted a way to make rebase the default, so that got added to Git too, and then people wanted a way to override their rebase default, so that became --no-rebase... and that is where we are today, October 2021.
git merge
The merge command and the merging system is quite central to Git. Git is a distributed version control system, where a version control system lets us keep multiple versions of various files. Since Git is a distributed one, that means many people may be keeping many versions of many files. At some points, now and then, we need to combine work that different people did.
The merge system, built around a merge engine, is all about doing this: combining work. The git merge command sometimes invokes the merge engine. It's not the only way to invoke the merge engine, and it doesn't always use it, but when it does use the merge engine, it uses it in the simplest, or at least easiest-to-describe, way.
Without covering all the important details, we'll start with a rough illustration of a series of commits. Remember that the actual "names" of commits are big ugly hash IDs (each commit gets a unique hash ID, never to be re-used[8]), so we'll use uppercase letters like H to stand in for some hash ID. We'll draw a few commits with earlier commits on the left, and later commits on the right:
          I--J   <-- feature (HEAD)
         /
...--G--H
         \
          K--L   <-- alice/feature
The names here, alice/feature and feature, are the names we see in our own Git repository. We made a remote named alice and used it to get commits from Alice. She made her commits, K and L, on her branch named feature, at the same time we were making our commits I and J on our branch feature. The HEAD in parentheses here is how we draw the fact that we're on our branch feature, and hence using commit J.
If we now run git merge alice/feature, our Git will figure out what we both started with (that's the set of files saved forever in commit H) and from that, what changes we made in which files when we made our commits I and J. Git will, separately, figure out what changes Alice made in her commits K and L.
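Finding that shared starting point (commit H here) is the first job. A naive way to find it in a toy graph is to compute everything reachable from each tip and keep the best shared commits. This is only a sketch: the graph is a hand-written dict matching the drawing, the helper names are hypothetical, and real Git uses a much more efficient algorithm (you can ask it directly with git merge-base). It also stops before the part where the merge engine compares H against J and against L.

```python
from __future__ import annotations

# The drawing as a toy graph: each made-up commit ID maps to its parent IDs.
GRAPH: dict[str, tuple[str, ...]] = {
    "G": (), "H": ("G",),
    "I": ("H",), "J": ("I",),   # our commits, on feature
    "K": ("H",), "L": ("K",),   # Alice's commits, on alice/feature
}


def ancestors(graph: dict[str, tuple[str, ...]], oid: str) -> set[str]:
    """The commit itself plus every commit reachable through parent links."""
    seen: set[str] = set()
    stack = [oid]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: dict[str, tuple[str, ...]], a: str, b: str) -> set[str]:
    """Naive best common ancestors: shared commits not behind another shared commit."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(d != c and c in ancestors(graph, d) for d in common)}


if __name__ == "__main__":
    print(merge_bases(GRAPH, "J", "L"))  # {'H'}
```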
The merge engine will now do its best to combine our changes and Alice's changes. If Git's merge engine succeeds at doing this combining (or even thinks it succeeds: Git isn't smart, and is just following a bunch of simple text rules here), Git will then apply the combined changes to the snapshot saved in commit H to make a new snapshot:
          I--J
         /    \
...--G--H      M   <-- feature (HEAD)
         \    /
          K--L   <-- alice/feature
This new snapshot goes into a new merge commit M, which causes our current branch feature to advance one commit, the same way any new commit we make advances our current branch by one commit.
Meanwhile Git's successful (presumably; we'd best test this) combining of our changes and Alice's changes has (presumably) produced something better than either of our features on its own, so we should send commit M to Alice so she can use it. That's where git push will come in, but we won't cover that here.
Some merges, though, are trivially easy. Suppose we and Alice start with the same set of commits:
...--G--H <-- feature (HEAD), alice/feature
We're using commit H, and so is Alice. Now Alice makes her two new commits (which we'll call I and J this time). We run git fetch alice and get them:
...--G--H   <-- feature (HEAD)
         \
          I--J   <-- alice/feature
If we now run git merge alice/feature, our Git says to itself: Huh, look at that, we don't have any commits of our own... why, we can just switch straight to Alice's commit! So it does that, giving us:
...--G--H
         \
          I--J   <-- feature (HEAD), alice/feature
(There's no reason to draw in the kink in the graph any more either, so we can put this all on one line, if we like. I didn't bother here.)
Git calls this operation a fast-forward merge, even though no actual merging takes place. Running git merge --ff-only tells Git: If you can do this as a fast-forward, do that. If not, give me an error.
Fast-forward merges have one big advantage: they don't create a new commit. They have one big disadvantage too: they don't create a new commit. Sometimes you want that, and sometimes you don't. (There's no room in this answer to say when.) You can force Git one way or the other, with --ff-only or --no-ff.
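The fast-forward test, and what --ff-only and --no-ff do with it, can be sketched in the same toy graph style: a fast-forward is possible exactly when our tip is an ancestor of theirs. Assumptions: toy_merge is a hypothetical helper, the merge commit's ID is a made-up label, and the actual combining of changes by the merge engine (and any conflicts) is not modeled at all.

```python
from __future__ import annotations


def ancestors(graph: dict[str, tuple[str, ...]], oid: str) -> set[str]:
    """The commit itself plus every commit reachable through parent links."""
    seen: set[str] = set()
    stack = [oid]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def toy_merge(graph: dict[str, tuple[str, ...]], branches: dict[str, str],
              current: str, other: str, mode: str = "default") -> str:
    """Toy model of `git merge [--ff-only | --no-ff] <other>` while on `current`.

    mode is "default", "ff-only", or "no-ff". Updates graph and branches
    in place and returns a short description of what happened.
    """
    ours, theirs = branches[current], branches[other]
    if theirs in ancestors(graph, ours):
        return "already up to date"
    can_ff = ours in ancestors(graph, theirs)
    if can_ff and mode != "no-ff":
        branches[current] = theirs      # fast-forward: just move the name
        return "fast-forward"
    if mode == "ff-only":
        raise RuntimeError("toy error: fast-forward not possible")
    merge_id = f"M({ours},{theirs})"    # hypothetical ID for the new merge commit
    graph[merge_id] = (ours, theirs)    # a merge commit has two parents
    branches[current] = merge_id        # our branch advances to the merge
    return "merge commit"


if __name__ == "__main__":
    graph = {"G": (), "H": ("G",), "I": ("H",), "J": ("I",)}
    branches = {"feature": "H", "alice/feature": "J"}
    print(toy_merge(graph, branches, "feature", "alice/feature"))  # fast-forward
    print(branches["feature"])                                     # J
```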
[8] Guaranteeing this forever is mathematically impossible, since there are only finitely many hash IDs of any given size. The size and ugliness of hash IDs is the way it is to push doomsday as far into the future as possible. For various reasons, even Git's current hash ID size, which was intended to be good for at least thousands of years, is getting to be too small, and hash IDs will be getting even bigger and uglier someday.
git rebase
If you have no commits of your own, and you get new commits from Alice, you can use git merge --ff-only to avoid making new merge commits (if you're one of those people, or in one of those situations, where you don't want merge commits). But what if you and Alice both do have new commits?
          I--J   <-- feature (HEAD)
         /
...--G--H
         \
          K--L   <-- alice/feature
You can, if you like, use git rebase to copy your old I-J commits to new and supposedly improved ones. That's what rebase is really all about: it copies some set of existing commits to new (and supposedly improved) commits. This can get very fancy, but when it works automatically and is used in the simplest way, the result looks like this:
          I--J   [abandoned]
         /
...--G--H
         \
          K--L   <-- alice/feature
              \
               I'-J'   <-- feature (HEAD)
Note how the new (and improved, if they're actually improved) commits I' and J' extend not from commit H but rather from commit L, Alice's latest. That is, we both used to build on commit H, but now we build on commit L, so that Alice's commits remain exactly as-is.
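The copying step can be sketched as a toy Python function that mirrors the drawing. It assumes a simple linear branch (it follows only first parents), invents IDs like I' for the copies, and does not model the cherry-pick step that actually reapplies each change; toy_rebase is a hypothetical name.

```python
from __future__ import annotations


def ancestors(graph: dict[str, tuple[str, ...]], oid: str) -> set[str]:
    """The commit itself plus every commit reachable through parent links."""
    seen: set[str] = set()
    stack = [oid]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def toy_rebase(graph: dict[str, tuple[str, ...]], branches: dict[str, str],
               current: str, upstream: str) -> list[str]:
    """Toy model of `git rebase <upstream>` for a simple, linear branch.

    Copies each commit on `current` that `upstream` lacks so the copies sit
    on top of upstream's tip, then moves the `current` name to the last copy.
    Returns the original commits that were copied, oldest first.
    """
    on_upstream = ancestors(graph, branches[upstream])
    to_copy, c = [], branches[current]
    while c not in on_upstream:   # walk back until we reach shared history
        to_copy.append(c)
        c = graph[c][0]           # linear history assumed: follow the first parent
    to_copy.reverse()             # copy oldest first
    base = branches[upstream]
    for old in to_copy:
        new = old + "'"           # toy ID for the copy, like I' in the drawing
        graph[new] = (base,)      # the copy's parent is the new base
        base = new
    branches[current] = base      # no name finds the originals any more
    return to_copy


if __name__ == "__main__":
    graph = {"G": (), "H": ("G",), "I": ("H",), "J": ("I",), "K": ("H",), "L": ("K",)}
    branches = {"feature": "J", "alice/feature": "L"}
    print(toy_rebase(graph, branches, "feature", "alice/feature"))  # ['I', 'J']
    print(branches["feature"], graph["I'"], graph["J'"])            # J' ('L',) ("I'",)
```

Note that the original I and J are still in the toy graph, untouched; they are simply no longer found by any name, which matches the "[abandoned]" label in the drawing.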
This process really does abandon our original two commits: commits are completely read-only once they're made, so it is impossible to change the original two. But what if commits I and J have two big ugly hash IDs that nobody knows, and their improved replacements I' and J' have two different big ugly hash IDs that nobody cares about, and everyone (which really means "just us", probably) finds these two commits using the name feature in our own Git repository? Who's even going to know that we ever did this?
Nobody, that's who. Well, we might remember. Or not! It's easy to forget that you've done a rebase, or even multiple rebases. That's kind of the point here. We rebase to replace our existing commits with new-and-improved ones.
The actual mechanics of copying an old commit to a new-and-improved one involve using git cherry-pick or similar, and that in turn uses Git's merge engine. So if we rebase two commits, that's like running git merge twice, in a way. If we rebase ten commits, that's like ten merges! The number of "merge" operations, or at least pseudo-merge operations, grows linearly with the number of commits we have to copy. Each step can have a merge conflict. So it's not a good idea to just keep rebasing forever: eventually you and Alice should get together and agree that some of your commits go into her repository too, or you start merging, or something. But sometimes, rebase is just what you want.
git pull
Now, at last, we can cover git pull properly. It:
- runs git fetch; then
- runs a second Git command: you pick, rebase or merge, before you even get a chance to see what step 1 does.
Having somehow picked, in advance, which command you want, you then run:
git pull [<options>] [<remote> [<refspec>]]
Note that I've left out the more-than-one-refspec option, and I'll explain why in a moment.
You've already configured, with git config or whatever, your default for whether git pull should run rebase or merge second. If you haven't, you've chosen merge by default. If you definitely want rebase this time, add --rebase to your options. If you definitely want merge this time, add --no-rebase. If you want your default, leave both out.
Add any additional options to pass to rebase or merge. We haven't covered any rebase options here, but we have mentioned two merge options: --ff-only and --no-ff. If you want a fast-forward and nothing else:
git pull --no-rebase --ff-only
will make sure that happens. If a fast-forward isn't possible, the second command (the git merge step) will simply fail, and you can choose what to do then. If you definitely want a merge even if fast-forwarding is possible:
git pull --no-rebase --no-ff
will make sure that happens.
If you want or need to add a refspec, you must first stick in a remote. That's because the remote and any refspecs you provide will be passed to git fetch, and the fetch command treats these as positional arguments. So you can't do this:
git pull branch-X # error
You have to do this:
git pull origin branch-X # OK
The pull command will pass both arguments to git fetch, so you'll call up the remote named origin and get from them only whatever update they have for branch-X.
Now here's why you should use only one refspec, i.e., just one branch name: if you choose merge, git pull will take the remote argument out and pass the remaining arguments through (changing branch names to remote-tracking names first, of course).[9] We didn't cover this above, but:
git merge alice/feature1 alice/feature2 alice/feature3
performs what Git calls an octopus merge, merging the three commits found via these three remote-tracking names into your current branch. If you don't know what an octopus merge is, you definitely don't want one. (Later, once you learn what one is, you probably still won't want one.[10])
If you're using rebase, the rebase operation can't take extra remote-tracking names (or, technically, hash IDs) like this, so you can't run this kind of rebase. That is, git pull --rebase alice feature1 feature2 feature3 is automatically an error.
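How git pull splits its positional arguments between the two steps can be sketched like this. It is illustrative only: plan_pull is a hypothetical helper that returns command strings, it assumes the usual remote/branch remote-tracking names, and (as footnote 9 notes) the real git pull hands the second command hash IDs rather than names.

```python
from __future__ import annotations


def plan_pull(args: list[str], rebase: bool) -> list[str]:
    """Toy model of how `git pull [<remote> [<refspec>...]]` splits its arguments.

    Returns the two commands as strings, for illustration only.
    """
    fetch = " ".join(["git fetch", *args])  # remote and refspecs go to fetch as-is
    second = "git rebase" if rebase else "git merge"
    if len(args) < 2:
        # No refspec given: the second step uses the current branch's
        # upstream setting, which this sketch does not model.
        return [fetch, f"{second} <upstream>"]
    remote, names = args[0], args[1:]       # the first positional argument is always the remote
    if rebase and len(names) > 1:
        raise ValueError("toy error: cannot rebase onto more than one commit")
    # With merge, more than one name here means an octopus merge.
    targets = [f"{remote}/{name}" for name in names]
    return [fetch, " ".join([second, *targets])]


if __name__ == "__main__":
    print(plan_pull(["origin", "branch-X"], rebase=False))
    # ['git fetch origin branch-X', 'git merge origin/branch-X']
    print(plan_pull(["branch-X"], rebase=False))
    # ['git fetch branch-X', 'git merge <upstream>']  (branch-X taken as a remote!)
    print(plan_pull(["alice", "feature1", "feature2", "feature3"], rebase=False)[1])
    # git merge alice/feature1 alice/feature2 alice/feature3
```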
[9] Technically, what git pull passes to git merge here are raw hash IDs. Or, in the latest versions of Git, where git pull has been rewritten in C, this is done with direct function calls rather than literal text strings. It works out the same, though.
[10] They don't do anything you can't do without them. In fact, at least as I see it, their value, such as it is, is that a regular merge is more powerful than an octopus merge, so if someone spots an octopus merge in a repository, they can assume that it was a set of simple merges. Even this can be abused, though.
Conclusion
The things to take away here are:
- git pull means "run fetch, then run a second Git command". The fetch step takes the remote and the refspec; the second command uses the commit that the fetch step found.
- The second Git command is one of git merge or git rebase.[11] This affects the current branch only, not any other branch.
- You do not want an octopus merge, so do not use git pull remote branch1 branch2 branch3 ...; the result is not what you wanted.
- If you omit the remote, the default is as for git fetch.
- If you omit the refspec, the default is the upstream setting for the current branch.
We have not covered the upstream setting. See Why do I have to "git push --set-upstream origin <branch>"? and Why do I need to do `--set-upstream` all the time? You don't need to set an upstream, but once you do, it's convenient: among other things, it lets you run git pull without having to type in a remote and a branch name.
(I personally never liked the "run two commands without letting me check stuff first" thing, and tend to use git fetch and then a separate second command after I see what git fetch fetched. However, recent Git has git config --global pull.ff only, which makes git pull do what I usually want, so I'm trying to adapt to that now.)
[11] There's a special case when you're on an unborn branch, but you won't care about this. In ancient Git, this case could ruin your week (I had that happen to me), but that's long since fixed.