Apply Remote Commits to a Local Pull Request

Question

I have forked an active open source project that is on GitHub to my account. I then cloned my GH repo to my local machine. I do regular fetches of the upstream master branch to keep my local copy in sync.

I’m interested in a pull request for a significant new feature on the upstream repo that is currently being reviewed and changed. I want to copy the pull request to my local machine so I can review and analyze this code. I will not be submitting any changes to this work.

I’ve fetched the pull request with this command and result:

$ git fetch upstream pull/5737/head:pr-5737 

remote: Counting objects: 4, done. 
remote: Total 4 (delta 3), reused 3 (delta 3), pack-reused 1 
Unpacking objects: 100% (4/4), done. 
From https://github.com/OrigProj/repo-name 
* [new ref]     refs/pull/5737/head -> pr-5737

The result is a new branch called ‘pr-5737’ for the pull request. The project devs have now made additional commits to the pull request as they work on it. I can get the new commits to the pull request down to my local machine with these commands:

$ git checkout pr-5737 
Switched to branch 'pr-5737' 

$ git fetch upstream pull/5737/head

remote: Counting objects: 4, done. 
remote: Total 4 (delta 3), reused 3 (delta 3), pack-reused 1 
Unpacking objects: 100% (4/4), done. 
From https://github.com/OrigProj/repo-name
* branch         refs/pull/5737/head -> FETCH_HEAD

I cannot figure out the command to ‘merge’ these newly fetched commits to the head of the pull request branch. I don’t want to merge the pull request into the master branch. I want to keep the separate branch, pr-5737, updated. What command do I use?

I know I can just delete the pull request branch, pr-5737, and re-fetch it but that doesn’t seem like the ‘correct’ way to do things.

Answer:

Cherry-picking from the excellent explanation/answer by torek, and taking the easy path based on my current practice, I was able to add the new commits from upstream to the pull request branch, pr-5737.

I used these commands:

$ git checkout master
$ git branch -f pr-5737 FETCH_HEAD

I then pushed everything to my GitHub fork of the upstream project, mainly as a backup. You will also see that I have added two other pull-request branches I wanted.

$ git push -v --all origin
Pushing to git@github.com:username/fork-project.git
Enter passphrase for key '/home/acct-name/.ssh/id_ecdsa':
Counting objects: 68, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (68/68), done.
Writing objects: 100% (68/68), 11.38 KiB | 0 bytes/s, done.
Total 68 (delta 50), reused 0 (delta 0)
remote: Resolving deltas: 100% (50/50), completed with 21 local objects.
To git@github.com:username/fork-project.git
 = [up to date]      pr-5717 -> pr-5717
   ec0b073..67486dc  master -> master
   212c54a..e935946  pr-5737 -> pr-5737
 * [new branch]      pr-5763 -> pr-5763
updating local tracking ref 'refs/remotes/origin/master'
updating local tracking ref 'refs/remotes/origin/pr-5717'
updating local tracking ref 'refs/remotes/origin/pr-5737'
updating local tracking ref 'refs/remotes/origin/pr-5763'
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

score 2 · Answer 1 · edited May 23 '17 at 10:29

You are doing things that are not "built in" to Git, so there is not necessarily a correct way. There are merely a lot of options. There is a way—or multiple ways, really—to deal with this that is built-in, so you might want to switch to that, but let's get there by starting with what you are doing.

Let's look first at what happens when you run git fetch. I'll assume here that you have two remotes, one named origin and the other named upstream. Here's what these remote names do for you:

Each remembers a URL. The URL for origin and the URL for upstream are different, and do not even both need to be on GitHub (although since you mention GitHub forks, I assume both are). Either way, you get to type a short name instead of the full URL.

Each also provides a prefix for remote-tracking branch names. You now have both origin/master and upstream/master, for instance.

It's this second part that you can (in management-speak) "leverage" for the pull requests. But first, let's talk about branches, branch names, and name spaces.

Git branches vs Git branch names

The term branch, in Git, is ambiguous. It can mean a branch name like master or pr-5737, but it can also refer to a branch structure within the commit graph. Each commit has its own unique hash ID, and each commit also records the ID of a previous commit. For instance, in a new repository with just three commits, we might have:

A <- B <- C   <-- master

where the name master contains the ID of commit C, and C itself contains the ID of commit B. We say that master points to C, C points to B, and B points to A. Since A is the very first commit, there's nothing for it to point back to; we call this a root commit. These pointers all work backwards—the branch name finds the tip commit for us (and for Git), and the tip commit finds an earlier commit, which finds more commits, all the way back to the root.

Adding a new commit simply means writing a commit that points back to the current tip, and then updating the branch name to point to the new commit:

A <- B <- C <- D   <-- master
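You can watch the name move in a scratch repository. This is only a minimal sketch: the temporary repository, the example identity, and the commit messages A through D are all made up for illustration.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example"            # throwaway identity for the demo
git config user.email "example@example.com"

# Three empty commits stand in for A, B and C.
for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done
old_tip=$(git rev-parse master)           # hash ID of C

# Add D: a new commit that points back to C, after which master names D.
git commit -q --allow-empty -m D
new_tip=$(git rev-parse master)
parent_of_new=$(git rev-parse "master^")  # the commit D points back to

git log --format='%h %s' master           # walks backwards from the tip to the root
```

Here parent_of_new and old_tip should hold the same hash ID: D points back to C, and the only thing that changed about master is which commit it names.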

I generally draw these without the internal backwards arrows, which makes the drawing more compact and allows displaying branches:

A--B--C--D   <-- master
    \
     E--F    <-- feature

Note, again, that the branch names only point to the (two, in this case) tip commits. It's those commits that point back to the rest of the commits. These chains of commits are also Git branches. For more on this, see What exactly do we mean by "branch"? We use the term branch interchangeably, in Git, to mean both branch names and branch structures within the commit graph. It's usually obvious which one is meant—though if it's not, you should ask!

One key item here is that commits can be pointed-to by things that are names, but are not branch names. For instance, tags can also point to commits (and of course commits point to commits). There are very few restrictions on these names, and in fact, pull requests are just yet another category of names that point to commits. The general term for all of these is references: in Git, a reference is any name for a commit—usually the tip commit of a branch (the graph branch)—or any other Git object (but we'll ignore the three other types of Git objects here).¹

The other key item is that in normal (non-maintenance-command) usage, Git can only get started, in terms of finding commits in the commit graph, using these names. It needs a branch name, or tag name, or something, to find a starting ID. (You can give your Git a raw ID, possibly abbreviated, which is what you do if you run git log ac0ffee for instance. It starts from that commit and works backwards from there.)
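To make the idea of "names that are not branch names" concrete, here is a small sketch in a scratch repository. The tag name v0.1 and the hand-made refs/pull/1/head are invented for the demonstration; on GitHub the pull references are created by the server, not by you.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"            # throwaway identity for the demo
git config user.email "example@example.com"
git commit -q --allow-empty -m A

git tag v0.1                              # a tag name: refs/tags/v0.1
git update-ref refs/pull/1/head HEAD      # a name that is neither a branch nor a tag

# Every reference, with the commit it points to.
git for-each-ref --format='%(objectname:short) %(refname)'

full_name=$(git rev-parse --symbolic-full-name master)
pull_id=$(git rev-parse refs/pull/1/head)
head_id=$(git rev-parse HEAD)
```

All three names resolve to the same commit, and full_name shows the qualified spelling that Git uses internally for the branch name master.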

¹Some parts of Git assert that all references start with refs/, but others note that there are special non-refs/ references, such as HEAD, ORIG_HEAD, CHERRY_PICK_HEAD, MERGE_HEAD, and so on. I would put it as most references start with refs/.

`git fetch` brings in commits

When you run git fetch, you have your Git call up another Git. That other Git has its own repository. Your Git uses the URL stored under the remote name to contact the other Git, and then your Git and their Git have a little conversation. Their Git tells your Git about their branch tips: the names, and the commits (the hash IDs) identified by those names.

These hash IDs are computed, in a complicated (cryptographic) but fully deterministic way, from the contents of the commits. Any two commits that are 100% identical, bit-for-bit, have the same IDs. This means that any commits that your Git and their Git both have in both repositories, have the same IDs. So, your Git can tell if you already have their commits, or not.
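If you want to convince yourself of this, the following sketch builds the "same" commit in two unrelated scratch repositories. Everything that goes into the commit is pinned through environment variables; the identity, the fixed timestamp, and the helper function make_commit are assumptions of the example, and it presumes no commit signing is configured.

```shell
set -eu
work=$(mktemp -d)

# Pin every input to the commit: author, committer, and both dates.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export GIT_AUTHOR_DATE="1487764800 +0000" GIT_COMMITTER_DATE="1487764800 +0000"

# Hypothetical helper: make one empty root commit in a new repository
# and print its hash ID.
make_commit() {
    git init -q "$1"
    git -C "$1" commit -q --allow-empty -m "$2"
    git -C "$1" rev-parse HEAD
}

id_one=$(make_commit "$work/one" "same content")
id_two=$(make_commit "$work/two" "same content")
id_three=$(make_commit "$work/three" "different content")

echo "$id_one"
echo "$id_two"
echo "$id_three"
```

The first two IDs should match even though the repositories have never talked to each other, while changing a single input (here, the message) gives a different ID.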

If your Git doesn't have their commits, and you've told your Git to bring them in (by asking for that name, or all names), your Git then requests those commits from their Git. Their Git bundles them up and sends them over, and your Git stores them in your repository, using their unique hash IDs.

Fetch must now assign names

Git now needs some way to find these commits, now that it has these commits. This means that git fetch must save some names. Of course, all of the branch tips and other tip commits that your Git got from the other Git, had names over there. Why can't it just use those names?

Think about this for a moment and the reason becomes obvious. Suppose the name it got was "branch master", and it overwrote your master with this new value. You'd lose easy access to your own commits!²

The simplest way for git fetch to save these name/ID pairs is to write them all into a file named FETCH_HEAD. They will stay there, safe and sound, until the next git fetch overwrites them.

That's what you are doing with this (second) command:

$ git fetch upstream pull/5737/head
remote: Counting objects: 4, done. 
remote: Total 4 (delta 3), reused 3 (delta 3), pack-reused 1 
Unpacking objects: 100% (4/4), done. 
From https://github.com/OrigProj/repo-name
* branch         refs/pull/5737/head -> FETCH_HEAD

The remote: messages are coming from that other Git: it's packaged up four objects (probably one commit, one tree, and two blobs, though I am just guessing) into a thin pack with delta compression applied ("delta 3"). Your Git got the package and unpacked it. Your Git used one name from their Git—refs/pull/5737/head—just as you told it to. And, your Git did not store this under your own name, but merely in the FETCH_HEAD file.

If you like, you can now extract this commit's ID from the file FETCH_HEAD. You can do that by looking inside the file (the format will be reasonably obvious), or—since there's only one ID in it—just by using the name FETCH_HEAD. Just remember that the next fetch will overwrite the file, forgetting the ID. Once this forgetting happens, if the new commit you just got has no names that can find it, that makes the commit eligible for garbage collection: it will eventually get thrown away. But you have a chance, now, to give it your own name(s).
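Here is a sketch of that sequence that runs without GitHub. The "upstream" here is a local scratch repository in which I create refs/pull/5737/head by hand, purely as a stand-in for the reference GitHub publishes; the paths and the identity are likewise invented.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr_tip=$(git commit-tree "master^{tree}" -p master -m "PR commit")
git update-ref refs/pull/5737/head "$pr_tip"

# Your clone, with the remote named "upstream".
git clone -q -o upstream "$work/upstream" "$work/local"
cd "$work/local"

git fetch -q upstream pull/5737/head   # no destination: only FETCH_HEAD is written
cat .git/FETCH_HEAD                    # hash ID, then where it came from
fetched=$(git rev-parse FETCH_HEAD)

# Give the commit a name of your own before the next fetch overwrites FETCH_HEAD.
git branch pr-5737 FETCH_HEAD
branch_tip=$(git rev-parse pr-5737)
```

After the last command the commit is reachable from refs/heads/pr-5737, so it no longer matters what a later fetch writes into FETCH_HEAD.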

Let's compare, though, to your first command:

$ git fetch upstream pull/5737/head:pr-5737

Note the colon here, and that the output ended with:

* [new ref]     refs/pull/5737/head -> pr-5737

The earlier command did not say FETCH_HEAD. Your Git wrote the commit ID, instead, to a new reference, pr-5737. This reference is in fact refs/heads/pr-5737, through some assumptions that git fetch makes.³ For now, let's just note that the "full name" of any branch is refs/heads/branch, e.g., refs/heads/master for the branch-name master.

This colon-separated form, by the way, is a refspec. A refspec is, more or less, just a pair of reference names: a source name, and a destination. With git fetch, the source is their name (branch or other reference), and the destination is your name. You may choose any kind of name, not just a branch name, for either side. For fetch, leaving out the destination name, as in your second command, means you want to just write the information to FETCH_HEAD.⁴
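As a rough illustration of source:destination pairs, the sketch below fetches the same pull reference twice: once with the short names from your command, and once fully spelled out into a name space of my own invention, refs/review/. The local stand-in for upstream and its hand-made refs/pull/5737/head are assumptions of the example.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr_tip=$(git commit-tree "master^{tree}" -p master -m "PR commit")
git update-ref refs/pull/5737/head "$pr_tip"

git clone -q -o upstream "$work/upstream" "$work/local"
cd "$work/local"

# Unqualified destination: Git qualifies pr-5737 as a branch.
git fetch -q upstream pull/5737/head:pr-5737
short_form=$(git rev-parse --symbolic-full-name pr-5737)

# Fully qualified on both sides: nothing is guessed. refs/review/ is a
# made-up name space, neither branch nor tag.
git fetch -q upstream refs/pull/5737/head:refs/review/5737
review_tip=$(git rev-parse refs/review/5737)
```

Both destinations end up naming the same commit; the only difference is where in your own reference name space the name lives.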

²But note that this is (partly) how tags work. The idea of tags is to be global across all repository clones. The question then becomes whether and when your own tags will be overwritten by another Git's tags, and the answer to that is complicated (and not appropriate for this posting).

³Both git fetch and git push have some complicated code to qualify an unqualified reference. A name like master or branch or pr-5737 does not start with refs/, so it is unqualified. If you write out refs/pull/5737/head or similar, this does start with refs/ and is qualified, and does not go through this complicated bit of code. In a few cases, Git can't do the qualification on its own, or does it wrong, and makes you write out the full name. That's true for these pull names, for instance. Usually, though, it does pretty well at guessing whether something is a branch or a tag. In this case, it guesses that you meant to make a branch, which is probably correct.

⁴Since Git version 1.8.4, Git will do an opportunistic update of remote-tracking branches. We'll see more about this later in this answer.

Fast-forwarding and force-updating

I cannot figure out the command to ‘merge’ these newly fetched commits to the head of the pull request branch.

It's time to go back to the graph drawings.

Your first git fetch brought in some commits—probably just one, again—but it gave them a branch name in your repository, pr-5737. That commit itself points back to a previous commit, which I'll guess for now is one that is mainly or only on their (upstream's) branch master. The graph fragment is therefore:

...--o     <-- upstream/master
      \
       o   <-- pr-5737

Now, it may be that you have already updated your own master to match upstream/master. In that case, we should draw the graph this way:

...--o     <-- master, upstream/master
      \
       o   <-- pr-5737

Note that the commit graph is unchanged! All we did was change the labels a bit. That's the key to this process. The fetch will always get you the commits; what you want to do is to change (or perhaps add) labels, such as branch names, as you incorporate these commits into your own graph. The fetch step modifies—adds to—the commit graph, and after that, we need something to happen with names.

So, let's look at the graph after the second git fetch. First, let's assume this adds one more commit that points back to the previous pull-request commit:

...--o       <-- upstream/master
      \
       o     <-- pr-5737
        \
         o   <-- FETCH_HEAD

In this case, you probably want to move pr-5737 to point to the same commit as FETCH_HEAD. But, what if the new commit doesn't point back to the previous pull-request commit? What if instead, it points to the upstream/master commit? Well, let's draw that:

...--o     <-- upstream/master
     |\
     | o   <-- pr-5737
     |
      \
       o   <-- FETCH_HEAD

Now you probably want to move pr-5737 to point to the new commit. (Or, maybe you don't want that: it's up to you to decide what you want. But I'll assume that you do want it.)

There are a bunch of Git commands that will move a branch label. The most user-oriented is git branch: with git branch --force you can re-set any branch that you don't currently have checked out. (To re-set the branch you do have checked-out, you need to use git reset instead, for a bunch of good technical reasons that amount to Git letting the implementation show through.) You could just run:

git branch -f pr-5737 FETCH_HEAD

to forcibly move the name pr-5737 to point to the commit git fetch just brought in.

(Again, if you have pr-5737 checked out at the moment, you have to use git reset instead, and then you must choose: --soft, --mixed, or --hard? These control whether the reset operation affects the index and work-tree. Let's just assume that you don't have it checked out. :-) )
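Put together, the manual update looks roughly like this. The sketch uses a local scratch repository as a stand-in for upstream, with refs/pull/5737/head created by hand; those pieces, and the variable names, are assumptions for the sake of a runnable example.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr1=$(git commit-tree "master^{tree}" -p master -m "PR commit 1")
git update-ref refs/pull/5737/head "$pr1"

# Your clone, holding the first version of the pull request as pr-5737.
git clone -q -o upstream "$work/upstream" "$work/local"
git -C "$work/local" fetch -q upstream pull/5737/head:pr-5737

# Upstream adds one more commit to the pull request.
pr2=$(git commit-tree "master^{tree}" -p "$pr1" -m "PR commit 2")
git update-ref refs/pull/5737/head "$pr2"

cd "$work/local"
git fetch -q upstream pull/5737/head   # the new tip lands in FETCH_HEAD
git checkout -q master                 # make sure pr-5737 is not checked out
git branch -f pr-5737 FETCH_HEAD       # move the label
moved_to=$(git rev-parse pr-5737)
```

Note that git branch -f does not care whether the move is a fast-forward; it moves the label either way.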

Now, if the new commit we just brought in, the one at FETCH_HEAD, "adds to the branch"—i.e., the new commit(s) point(s) back to the tip of pr-5737—this branch label movement is what Git calls a fast-forward. Note how, in the first FETCH_HEAD drawing above, Git can kind of slide the label forward (rightward, while also going down) to the new commit:

...--o       <-- upstream/master
      \
       o
        \
         o   <-- pr-5737, FETCH_HEAD

With the second drawing, however, forcing pr-5737 to point to the new commit causes the old one to be forgotten! We have to back the label up one step, to point to the tip of upstream/master, and then back down and right. This is a non-fast-forward forced update.

If we use git branch -f, it will update pr-5737 even if it cannot be fast-forwarded. What if you only want to move pr-5737 if it's a fast-forward?

Merge can also fast-forward

You have no doubt seen Git print "Fast-forward" when merging. This is because if you are on one of your branches, and run git merge name, Git will check whether the commit it finds under the given name—usually, the tip of a branch—is "fast-forward ahead" of the tip commit of the current branch. If so, Git doesn't actually merge anything, it just slides the branch name forward (and checks out the new commit, so that your index and work-tree match the new branch tip).

If you use the git merge command, you can limit it to working only if the merge will really be a fast-forward instead, using --ff-only. (And, you can force it not to do a fast-forward at all, but rather make a new merge commit, using --no-ff.) So, you could git checkout pr-5737 and then git merge --ff-only FETCH_HEAD to make the fast-forward happen. Of course, this could fail, as it would for the second case. Then you have to decide what you want to do.

You probably don't want to merge these two commits. (If you really do, for whatever reason, you can: just run git merge FETCH_HEAD. That's probably not useful though.) You probably just want to force the branch to move and to load the new tip commit into your index and work-tree, in which case, you can git reset --hard FETCH_HEAD. If you go this way, though, you'll know—based on whether git merge --ff-only worked—whether the updated pull request is a replacement, or an add-on.
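One way to script that decision is sketched below: try the fast-forward first and fall back to a hard reset. To exercise the fallback, the stand-in upstream (a local scratch repository with a hand-made refs/pull/5737/head, both invented for the example) replaces the pull request commit instead of adding to it.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr1=$(git commit-tree "master^{tree}" -p master -m "PR commit 1")
git update-ref refs/pull/5737/head "$pr1"

git clone -q -o upstream "$work/upstream" "$work/local"
git -C "$work/local" fetch -q upstream pull/5737/head:pr-5737

# Upstream rewrites the pull request: the new commit does not build on pr1.
rewritten=$(git commit-tree "master^{tree}" -p master -m "PR commit 1, rewritten")
git update-ref refs/pull/5737/head "$rewritten"

cd "$work/local"
git fetch -q upstream pull/5737/head
git checkout -q pr-5737
if git merge -q --ff-only FETCH_HEAD 2>/dev/null; then
    update_kind="add-on"               # the label just slid forward
else
    git reset -q --hard FETCH_HEAD     # discard the old tip, take the new one
    update_kind="replacement"
fi
echo "$update_kind"
```

Be aware that git reset --hard also throws away uncommitted changes in the work-tree, which is fine for a branch you only read but not for one you work on.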

Doing it the easy way: `git fetch` can do it for you

In your original git fetch you told your Git to write the name pr-5737:

$ git fetch upstream pull/5737/head:pr-5737

You can use this exact same command again. Your Git will obtain any new commit(s), and then try to update your existing refs/heads/pr-5737.

As before, this could be a fast-forward. In this case, your Git will do the update. (You still get a FETCH_HEAD file but you don't need it any more.) Or, it could be a non-fast-forward. In this case, your Git will error-out:

 ! [rejected]   refs/pull/5737/head -> pr-5737 (non-fast-forward)

To force the update, we use one more feature of a refspec: it can start with a plus sign, which means "force". So:

git fetch upstream +pull/5737/head:pr-5737

This time, you'll get the update, with the annotation (forced update) added. The annotation is only added if the update is actually forced, so as with doing a manual git merge --ff-only, you will know whether the update had to be forced.
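The sketch below runs through both outcomes with a local stand-in for upstream (a scratch repository whose refs/pull/5737/head I move by hand). The helper update_pr and the result variables are names I made up for the example.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr1=$(git commit-tree "master^{tree}" -p master -m "PR commit 1")
git update-ref refs/pull/5737/head "$pr1"

git clone -q -o upstream "$work/upstream" "$work/local"

# Hypothetical helper: the same fetch-with-destination command every time.
update_pr() {
    git -C "$work/local" fetch -q upstream pull/5737/head:pr-5737
}
update_pr                                  # creates pr-5737

# Case 1: upstream adds a commit on top, so the update is a fast-forward.
pr2=$(git commit-tree "master^{tree}" -p "$pr1" -m "PR commit 2")
git update-ref refs/pull/5737/head "$pr2"
if update_pr; then first="accepted"; else first="rejected"; fi

# Case 2: upstream replaces the pull request, so it is not a fast-forward.
rewritten=$(git commit-tree "master^{tree}" -p master -m "PR rewritten")
git update-ref refs/pull/5737/head "$rewritten"
if update_pr; then second="accepted"; else second="rejected"; fi

# The leading plus sign forces the update.
git -C "$work/local" fetch -q upstream +pull/5737/head:pr-5737
final_tip=$(git -C "$work/local" rev-parse pr-5737)
echo "$first $second"
```

This works here because the clone has master checked out; git fetch refuses to update the branch you are currently on.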

The really easy, fully automated way

Now, it might be nice if you could get git fetch to do this update without having to type:

git fetch upstream +pull/5737/head:pr-5737

all the time. And you can. In fact, you can bring in any and all pull requests that have the form refs/pull/NNNN/head, or any other form you care to recognize. It's up to you to decide how to bring them in, but before we dive into the mechanism, let's mention name spaces, and the role of a remote name in remote-tracking branch names.

A name space (or namespace as a single word) is an organization, typically hierarchical, where different groups of names are, well, grouped. In Git's case, for instance, most references are under refs/, but all branch names are under refs/heads/, as we saw earlier. All tags are under refs/tags/. These names work like directories (and are actually implemented as such, in some cases). The space beginning with refs/remotes/ holds all remote-tracking branch names, but it's further subdivided: there is one space for the remote named origin, under refs/remotes/origin/, and another for upstream, under refs/remotes/upstream/.

By sub-dividing the remote-tracking branches, and separating them from your regular branches, Git guarantees that it will never use, as a remote-tracking branch name, any of your own branch names. Your own branches all start with refs/heads/, and refs/remotes/origin/ does not start with refs/heads/, so these names are separate. Moreover, by including the name of the remote, Git tries to guarantee that these also never collide: refs/remotes/origin/ is always different from refs/remotes/upstream/.⁵

If you allow pull requests, of the form pr-number, to occupy the same name space as your branches, and if you name one of your own branches, say, pr-123, you can get a collision. So don't do that: either make sure you never name your branches like this, or pick your own name space for your pull-request-trackers. You may want to stick with branch names since Git only has three built-in forms it recognizes, for branches, tags, and remote-tracking branches; so branch names are shorter to type. (That's why you had to spell out pull/5737/head rather than just 5737/head: the full name is refs/pull/5737/head, and Git can find this under refs, but not without the pull part. Your master's full name is refs/heads/master, but you don't have to type in heads/master.)

(For reasons I will mention in a moment, you might want to spell your dedicated pull-request sub-branch name space pr/* instead of pr-*. I'll assume from now on, you want pr/5737 instead of pr-5737.)

If you open your .git/config file in your editor—note that you can do this with git config --edit, so Git kind of encourages this; it's relatively safe as long as your editor does not try to convert the configuration to rich text or something equally silly—you will see a configuration section for each remote:

[remote "origin"]
    url = ...
    fetch = +refs/heads/*:refs/remotes/origin/*
[remote "upstream"]
    url = ...
    fetch = +refs/heads/*:refs/remotes/upstream/*

The url lines provide the saved URLs—they are how your Git knows how to call up the other Git. The more interesting lines, for us, are the fetch ones. These provide the default fetch refspecs.

If you run git fetch origin, that means

git fetch origin +refs/heads/*:refs/remotes/origin/*

This is how Git implements the remote-tracking branches: a git fetch just obtains the refspecs you gave it, even if those are the ones implied by the per-remote fetch default configuration. Git also allows more than one fetch = line, and it adds all of them as the default set of refspecs.

This shows that refspecs can also do a kind of wild-card matching. This matching is a limited form of shell style glob match. This means you can add a second fetch = line that reads:

+refs/pull/*/head:refs/heads/pr/*

Now, if the remote has a reference named refs/pull/5737/head when you run git fetch, your Git will create or update—forcibly if needed—your own branch pr/5737.
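Instead of editing the file, you can add the line with git config --add. The sketch below does that against a local stand-in for upstream that carries two hand-made pull references, 5737 and 5763; the scratch repositories and commit messages are assumptions of the example.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository, with two pull requests.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr_a=$(git commit-tree "master^{tree}" -p master -m "PR 5737 commit")
pr_b=$(git commit-tree "master^{tree}" -p master -m "PR 5763 commit")
git update-ref refs/pull/5737/head "$pr_a"
git update-ref refs/pull/5763/head "$pr_b"

git clone -q -o upstream "$work/upstream" "$work/local"
cd "$work/local"

# Add a second default fetch refspec next to the one clone created.
git config --add remote.upstream.fetch '+refs/pull/*/head:refs/heads/pr/*'
git config --get-all remote.upstream.fetch

git fetch -q upstream          # no refspec arguments: the defaults apply
git branch --list 'pr/*'
```

From here on, a plain git fetch upstream creates or force-updates one pr/NNNN branch per pull reference the remote advertises, which on a busy project can be a lot of branches.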

If your Git is new enough, you can use one * glob pretty much anywhere, e.g., the rather peculiar:

+refs/pull/5*/head:refs/heads/pr-5*

which will obtain only pull requests starting with 5, updating your own branch name starting with pr-5. But in versions of Git before 2.6.0 (commit cd377f4), the * had to match a whole component, e.g., pull/*/head but not pull/5*/head, or pr/* but not pr-*. If your Git is 2.6.0 or later, you can fetch to pr-*, but if not, you must fetch to pr/*.

(Since Git version 1.8.4, if Git is fetching a reference, and it has one of these matching fetch = lines, your Git will update the corresponding remote-tracking branch, automatically, even if you gave some refspecs on the command line. This does not occur in older versions of Git. But even in those older versions of Git, if you succeed at a git push that pushes something matching one of these refspecs, Git will opportunistically update the remote-tracking branch. The fact that push did it was what finally convinced the Git folks that fetch should do it too.)
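You can check whether your Git does the opportunistic update with a sketch like this one, which assumes a Git new enough to have the feature. The local scratch repository standing in for upstream and the commit messages are made up for the test.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

git clone -q -o upstream "$work/upstream" "$work/local"
before=$(git -C "$work/local" rev-parse upstream/master)

# Upstream moves on.
git commit -q --allow-empty -m "new upstream work"
their_master=$(git rev-parse master)

cd "$work/local"
git fetch -q upstream master   # a refspec on the command line, no destination
after=$(git rev-parse upstream/master)
```

The command line names only master, with no destination, yet upstream/master moves anyway because refs/heads/master matches the configured fetch = line for that remote.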

⁵This attempt fails in subtle ways if you name one remote a and another a/b. Git should forbid slashes in remote names, but it doesn't. (So, don't use slashes in remote names—or if you do, make sure none is ever a prefix of another.)

There are more options

You don't have to make the pull requests into (regular) branches. You could add them as remote-tracking branches, for instance, using your own invented pr/ sub-name-space:

[remote "upstream"]
    fetch = +refs/pull/*:refs/remotes/pr/upstream/*

This will turn their pull/5737/head into your pr/upstream/5737/head remote-tracking branch. You can now git checkout 5737/head to create your own local branch named 5737/head that has the remote-tracking branch pr/upstream/5737/head as its upstream. (That is, 5737/head@{u} will name pr/upstream/5737/head.) As far as Git is concerned, this remote-tracking branch belongs to a "remote" named pr/upstream, which you don't have. That's fine, though you will have to be sure not to name a new remote "pr".

The obvious drawback is that the name is a bit clunky. A less obvious one is that if you collect pull requests from multiple remotes, 5737/head might match both pr/upstream/5737/head and pr/another/5737/head remote-tracking branches, if both remotes have a pull request #5737 outstanding. In this case the git checkout DWIM feature, that knows how to create local branches based on remote-tracking branches, will fail: it won't pick one arbitrarily for you.
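A sketch of the remote-tracking variant follows. It uses a local scratch repository as a stand-in for upstream with a hand-made refs/pull/5737/head, and it creates the local branch explicitly with --track rather than relying on git checkout to guess; both choices are assumptions of the example, not requirements.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"
pr_tip=$(git commit-tree "master^{tree}" -p master -m "PR commit")
git update-ref refs/pull/5737/head "$pr_tip"

git clone -q -o upstream "$work/upstream" "$work/local"
cd "$work/local"

# Map every refs/pull/* name into the invented pr/upstream/ sub-name-space.
git config --add remote.upstream.fetch '+refs/pull/*:refs/remotes/pr/upstream/*'
git fetch -q upstream
tracked=$(git rev-parse refs/remotes/pr/upstream/5737/head)
git branch -r                  # lists pr/upstream/5737/head among the others

# Create the local branch with an explicit start point and upstream.
git checkout -q -b 5737/head --track pr/upstream/5737/head
upstream_name=$(git rev-parse --abbrev-ref '5737/head@{u}')
echo "$upstream_name"
```

Spelling out the start point sidesteps the question of whether the checkout guessing logic finds a unique match, which matters once more than one remote is involved.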

There's also no clear advantage to this. You get your own branch, so you can make your own commits—but why would you want to? The drawback to force-fetching into your own branch space, using the earlier scheme without remote-tracking branches, is that you might clobber your own commits if you forget that pr/5737 is set up this way and make some commit there that you wanted to keep. (But even then, your pr/5737 reflog will preserve your commit for 30 days by default.)

Hence, I am not sure why you might want this—but it's an option. The fetch = ... mechanism is just that: a mechanism, not a policy. It's up to you how to use it.

A very helpful and understandable answer. You greatly improved my understanding of my situation. The explanation about namespaces and name collisions will be good for the future. Still need to go back over it again to fully absorb. I'll likely create another clone repo and try out this method too. For now, this pull request has nine new commits to the PR branch. I want to fetch them. Then ensuring I'm not on the PR branch, force the branch name to move to the last commit (?) of FETCH_HEAD with these commands: `git checkout master` and `git branch -f pr-5737 FETCH-HEAD` Will try tomorrow. — deadDrift, Feb 23 '17 at 02:01
The last command has a typo. It should be: `git branch -f pr-5737 FETCH_HEAD` . It appears this did indeed do the job I was expecting. The new commits to the PR were added to the pr-5737 (it moved the label to the tip). Was also able to push everything to my GH fork. Thanks! — deadDrift, Feb 23 '17 at 20:45