
I created a pull request, but I was working on the wrong code. So now I want to just create a brand new pull request, starting over so to speak.

I closed the erroneous PR, but do I just go ahead and make a new change to the file, then commit and push again?

Daniel
  • So, replace the PR with a different branch altogether? – eftshift0 Jul 02 '21 at 17:17
  • @eftshift0, it's going to be the same branch. Just need a new PR. – Daniel Jul 02 '21 at 17:19
  • I think the general answer is, if you need to change the source branch, start a new PR. If you're keeping the same source branch it's up to you if you want a new PR, but you can use the same PR if you want to. So, just rewrite your branch (complete reset, or fancy rebase, or cherry-pick range, or rebase -i, or amend), or add new correction commits. Which you choose depends on the preferred workflow for that repo. – TTT Jul 02 '21 at 17:25
  • If you already are doing a new PR, then it doesn't matter what you do, right? Just make your branch look like whatever you want it to look like and then push it. – TTT Jul 02 '21 at 17:27
  • @TTT, No I had to destroy my old branch because it had the wrong work in it. I had to destroy the branch in remote and local and create a new one, add the correct work to it and push – Daniel Jul 02 '21 at 19:09
  • @Daniel You didn't *have* to destroy it, but that's OK. Deleting your local branch and re-creating it with different commits, is identical to `git reset --hard [some-commit-id-or-branch-name-or-tag-name]` and then add whatever commits you want to it. The reset just saves you a few steps. (So you don't have to checkout some other branch, delete the old one, then re-create the new one and check it out again). When you're done, if you didn't bother to delete the remote branch yet, then you simply force push instead of push. ;) – TTT Jul 02 '21 at 19:16

1 Answer


I closed the erroneous PR, but do I just go ahead and make a new change to the file, then commit and push again?

Generally, yes. But, as you've seen in comments, there are some complications.

Long: everything you need to know about GitHub PRs

There are several things to understand here. These come under two general topics:

  • Git doesn't care about branches. Git only cares about commits.
  • Git does not have Pull Requests. PRs are an add-on, provided by various web hosting sites, because they're useful in terms of making certain operations easy ("one click" web thingies for instance). As a result, the specific details for updating or replacing a pull request vary somewhat between different hosting sites.

Still, the fact that Git itself only really cares about commits has a certain leak-over effect into these per-hosting-site PR mechanisms. So the two topics intertwine.

How Git uses branch names to find commits

Let's start with the Git command line. We run git log or git log --oneline, or maybe git log --all --decorate --oneline --graph ("Git Log with A. D.O.G."; see Pretty Git branch graphs). Git spills out stuff like the following:

* 670b81a890 (HEAD -> master, origin/next, origin/master, origin/HEAD) The second batch
*   98f3f03bcb Merge branch 'fc/doc-build-cleanup'
|\  
| * 7ba3016729 doc: avoid using rm directly
| * db10fc6c09 doc: simplify Makefile using .DELETE_ON_ERROR
| * 471e7b2cf6 doc: remove unnecessary rm instances
| * 56da21392b doc: improve asciidoc dependencies
| * 12d078ed2b doc: refactor common asciidoc dependencies
* |   2019256717 Merge branch 'ab/test-lib-updates'
|\ \  
| * | f0d4d398e2 test-lib: split up and deprecate test_create_repo()

Each of these stars, along with its connecting lines, represents one commit. Each commit itself is numbered: that big ugly hash ID, here trimmed to a mere 10 characters—each one is 40 characters long—is a unique number that means that commit, and only ever that particular commit. (These particular commits are in a clone of the Git repository for Git.)

Each commit, which Git finds by its unique hash ID, stores two things:

  • A commit stores some metadata, including the name and email address of its author, for instance. The metadata tell us who made the commit, when, and (at least if the author is conscientious) why they made that particular commit. The metadata also store the hash ID of some earlier commit, or commits: the parent or parents of this commit.

  • Meanwhile, each commit also stores a full snapshot of every file. That's where Git gets the files that it will put into your working tree, if and when you select that commit to be extracted.

By comparing any two snapshots—often, two adjacent ones such as 7ba3016729 and db10fc6c09 for instance—Git can tell you which files changed between those two snapshots, and show you the exact changed lines. But to do this, Git has to find the hash IDs of the two commits.

You can give these to Git directly yourself:

git diff db10fc6c09 7ba3016729

for instance will have Git extract, into a temporary (in-memory) area, these two commits, and compare their snapshots. The result is that we see that Documentation/Makefile changed. Or, we can give Git—or perhaps GitHub—the full hash ID of the later of this particular pair of commits, and Git will automatically find the parent for us and compare the two commits (try the GitHub link here, and note that it has embedded in it just the later commit hash ID).

Git finds an earlier commit's hash ID using the metadata in the later commit. By giving Git—or GitHub—the full raw hash ID 7ba30167291eb89f2e587b7cabfa4e7555de4ed5, Git can start at that commit and work backwards. The commit's parent, db10fc6c09f1f74c4d0a9294ecbb68d390f54f15, is a commit too, so it has metadata, which gives yet another parent hash ID. That is yet another commit, so it has metadata, and so on.

Using this metadata, Git can work backwards through the history in a repository. The commits are the history, via the snapshots and metadata. But there's one big hitch: where do we get the hash ID of the last commit in some string of commits? The answer is, typically, from a branch name like master.
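To see this backwards walk for yourself, here is a minimal sketch in a throwaway repository. The temporary directory, the demo identity, and the commit messages are all placeholders I made up for illustration; nothing here touches a real repository.

```shell
# Sketch: start from a branch name and follow parent links backwards.
# Everything below (directory, identity, messages) is a throwaway assumption.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of local defaults
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

tip=$(git rev-parse master)               # the branch name yields the last commit
parent=$(git rev-parse "$tip^")           # the tip's metadata names its parent
grandparent=$(git rev-parse "$parent^")   # and so on, one step at a time

# The raw commit object includes the "parent" line that Git follows.
git cat-file -p "$tip"

# git rev-list does the whole walk for us, newest commit first.
walked=$(git rev-list master | wc -l | tr -d ' ')
echo "commits reachable from master: $walked"
```

The only name we typed was master; every other hash ID came out of some commit's metadata.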

The very first line of the git log output I quoted above begins with:

* 670b81a890 (HEAD -> master, origin/next, origin/master, origin/HEAD)

The stuff in the parentheses here are what Git calls the decorations, from git log --decorate. This --decorate flag is the default now and has been for quite some time, but if you have an ancient version of Git, you may still have to use an explicit --decorate. What it does is have Git look at all your branch names, all your remote-tracking names, and all your tag names and other such names. Each of these names stores one hash ID. In my particular case, three of my Git repository's names—master, origin/master, and origin/next—all store hash ID 670b81a890388c60b7032a4f5b879f2ece8c4558.

When I run git log --decorate --oneline --graph and don't tell Git where to start the log operation, what Git does is this:

  1. Look up the name HEAD. This particular name contains another branch name:

    $ cat .git/HEAD
    ref: refs/heads/master
    
  2. As a result, look up the branch name master (full name refs/heads/master). This contains 670b81a890388c60b7032a4f5b879f2ece8c4558: the hash ID of the first commit to be printed.

So that's where git log starts. It uses hash ID 670b81a890388c60b7032a4f5b879f2ece8c4558 to find a commit, specifically this one. That commit has one parent, namely 98f3f03bcbf4e0eda498f0a0c01d9bd90de9e106. That commit is a merge commit, with two parents instead of the usual one; this makes git log's job harder, but what it does is to go on and display both of those commits (eventually), and their parents, and so on.

In other words, Git used the branch name—master, in my case—to find the last commit in the branch. Then it used that last commit to find the second-to-last commit. Then it used that second-to-last commit to find yet more commits, from which it found still more commits, and so on. If I didn't stop it, git log --decorate --oneline --graph would go on to list 63272 commits (at the moment).
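The two lookup steps can be reproduced by hand. This is only a sketch in a scratch repository (the directory and identity are placeholders), using git symbolic-ref and git rev-parse rather than reading files under .git directly:

```shell
# Sketch: resolve HEAD to a branch name, then the branch name to a commit.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git commit -q --allow-empty -m "only commit"

head_target=$(git symbolic-ref HEAD)        # step 1: HEAD contains a branch name
via_branch=$(git rev-parse "$head_target")  # step 2: the branch name contains a hash ID
via_head=$(git rev-parse HEAD)              # the commit git log starts from by default

echo "$head_target -> $via_branch"
```

Both routes end at the same hash ID, which is why git log with no starting point begins at the tip of the current branch.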

Let's reduce all of the above to a simple drawing

The actual hash IDs of commits are big, ugly, and random-looking. To make a simple drawing, let's replace the hash IDs with single uppercase letters. This wouldn't work in a real repository because we would run out of letters way too fast, but it's nice for a drawing:

...--F--G--H  <-- master (HEAD)
         \
          I--J   <-- feature

Here, we're on our master branch. The name master holds the hash ID of the latest master commit. That's commit H. Commit H points backwards to earlier commit G, which points backwards to another still-earlier commit F.

If we run git checkout feature or git switch feature, we get this:

...--F--G--H  <-- master
         \
          I--J   <-- feature (HEAD)

The name feature holds the hash ID of commit J, which is the latest feature commit. Commit J points backwards to earlier commit I, which points backwards to earlier commit G.

Note that commit G, and all earlier commits, are on both branches. This might be clearer if we draw the commits as:

          H   <-- master
         /
...--F--G
         \
          I--J   <-- feature

It's important to realize that these are the same drawings, even if they look a bit different. The commits that are "on" some branch are those we can get to by starting from the most recent commit, found by using the branch name, and working backwards.

When we git checkout or git switch to a branch and make a new commit, the new commit automatically extends the branch, like this:

          H--K   <-- master (HEAD)
         /
...--F--G
         \
          I--J   <-- feature

Here, we made a new commit on master, creating commit K. New commit K points back to existing commit H.
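The drawing can be rebuilt as a real (if tiny) repository. In this sketch the commit messages are the single letters from the drawing, standing in for hash IDs; the commit helper, the directory, and the identity are placeholders of mine.

```shell
# Sketch: build the F-G-H-K / I-J graph and ask Git what is "on" each branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

commit F
commit G
git branch feature            # feature starts out pointing at G
commit H                      # master moves on to H
git checkout -q feature
commit I
commit J                      # feature now ends at J
git checkout -q master
commit K                      # the new commit extends master

on_master=$(git log --format=%s master | tr -d '\n')     # walk back from master's tip
on_feature=$(git log --format=%s feature | tr -d '\n')   # walk back from feature's tip
shared=$(git log -1 --format=%s "$(git merge-base master feature)")

echo "master:  $on_master"
echo "feature: $on_feature"
echo "most recent shared commit: $shared"
```

G and F show up in both lists: they are on both branches, because both walks reach them.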

Sending commits from one Git repository to another

When we work with Git, we're working with a distributed system. Each clone of some repository has all the commits—or more precisely, all the commits it has. In particular, if we clone some repository:

git clone ssh://git@github.com/path/to/repo.git

and then someone adds new commits to the GitHub repository, we don't have those commits yet. Or, if we add new commits to our repository, the GitHub repository doesn't have our new commits yet.

To fix this, we need to be able to get commits from some other Git repository—such as the one over on GitHub—or send commits to some other Git repository (the GitHub one, again). This is where git fetch and git push come in.

Without going into all the gory details (there are many), what these two commands do is:

  • have one Git repository call up another one;
  • pick the caller as sender and the callee as receiver (git push), or the caller as receiver and the callee as sender (git fetch); and
  • figure out which commits the sender should send and the receiver should add to their collection.

Git does this by the commit hash IDs. The hash IDs are unique: no two different commits ever use the same ID, and if two commits have the same ID, they are the same commit. In a sense, the ID is the commit. This is why commits must not change, and Git uses an internal consistency check to make sure that commits never do change. So whoever is the sender just says: I have commit ________ (fill in the blank with a hash ID). Whoever is the receiver replies with Oh I don't have that, send it or No thanks, I have that already. If the receiver needs the commit—i.e., does not have it—the sender is obligated to offer its parent commit(s) as well, and the receiver looks to see if it has those commits too, and replies as before.

In this way, the sender sends, to the receiver, all the commits that the sender has, that the receiver lacks, that the receiver will need. (The sender can choose which commits to offer at all, as is the case during git push, or just list out all of its branch and other names and the hash IDs, as is the usual case during git fetch.) The receiver takes the new-to-it commits and stores them in its big database of all commits and other supporting objects.[1]

Having done all this, though, there's now a problem: Git finds commits by using names, such as branch names. If the receiving Git has new-to-it commits, it almost certainly needs to update some name or names.

Here, git fetch and git push work differently:

  • With git fetch, the receiver normally updates remote-tracking names, such as origin/master and origin/next. These are names that your own local Git dedicates, just for commits obtained from the Git you're calling origin. These are not branch names, not in your own Git repository anyway. These are commits that your Git saw that their Git found by their branch names. They are using their master to find some commit, so your Git sets your origin/master to find that same commit.

    If you want, you can, at this point, update your own master to find the same commit. (I do this with this particular Git repository for Git myself—this is not the one I generally work in, it's mostly a mirror I update a bit lazily.) That's not part of git fetch though: that's a second step.

  • With git push, though, the sender usually tells the receiver: Please, if it's OK, update one of your branch names to remember this particular commit hash ID. That is, you, on your laptop perhaps, add new commits to your own local repository, using some branch name. Then, using that same branch name, you have your Git send your new commits to your GitHub repository, and then ask your GitHub repository to set this same branch name to find the same commit.

At this point, we need to introduce the concept of fast-forwarding and forced updates. A typical git push ends with a polite request: Please, if it's OK, update your branch name _______ to hold hash ID _______ (fill in both blanks). But a forced git push ends with a command: Update your branch name _______ to hold hash ID _______!


[1] There's also a vetting process, where the receiver can first inspect the commits and other data before deciding whether to accept them. We'll ignore this complication here.


Fast-forward operations

The polite request is the same as the forced-push command, but says if it's OK. What makes it OK? Let's go back to our graph drawings.

Suppose that we have, in our repository, these commits:

...--F--G--H   <-- master (HEAD)

Suppose that the GitHub copy of this same repository has the same set of commits, and the same name, master, selecting commit H.

We make one new commit I that points back to existing commit H, so that it adds on to the branch:

...--F--G--H--I   <-- master (HEAD)

If we now run git push origin master, we'll have our Git call up their Git and send them new commit I (it's new: we just made it, they can't possibly have it yet). Then we'll ask them to set their master to select commit I.

If they do that, their master will end with commit I, whose parent is H. They still find all the same commits they found before; they've just added some on to the end.

But suppose that, instead of making a new commit I that points back to H, we sneakily, somehow,[2] remove commit H from our own master:

          H   [abandoned]
         /
...--F--G   <-- master (HEAD)

Then we make our own new commit I to be used instead of commit H:

          H   [abandoned]
         /
...--F--G--I   <-- master (HEAD)

If we now send commit I to our GitHub copy, and ask GitHub to set our GitHub repo's master to point to commit I, they will say no! The reason they'll say no is that I doesn't just add on to the existing commits. If they switch their master to point to commit I, they'll lose commit H entirely.

There are two ways we can convince them to change their master to point to I anyway. One way is to use the --force flag. That changes the Please, if it's OK polite request into the forceful command. GitHub will probably obey this command, as long as we own the repository.[3]

The other trick we can use is to delete the branch name entirely, or, if there's some administrative way to do this,[4] to rename the branch, so that the name master is freed up to use to point to commit I. Then, having gotten the old name master out of the way, we can create a new master, which we can point to whatever commit we want.

Since branch names normally move—to add more commits to the end of the branch—Git has a word for this, or maybe two words: fast-forward. When new commits just add on to a branch, that branch-name motion is a fast-forward operation. When the branch name can't just be "slid forward", though—as when we have master back up from H to G before moving forward to I—that's a non-fast-forward operation.
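The "is it OK?" test boils down to one question: is the commit the name currently holds an ancestor of the proposed new commit? Here is a sketch of that check using git merge-base --is-ancestor. The is_fast_forward and commit helpers are names I made up, and this is my approximation of the test, not a description of what a receiving Git runs internally.

```shell
# Sketch: decide whether moving a name from an old tip to a new tip is a fast-forward.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

# hypothetical helper: $1 = commit the name holds now, $2 = proposed new commit
is_fast_forward() { git merge-base --is-ancestor "$1" "$2"; }

commit F
commit G
commit H
old=$(git rev-parse master)        # pretend the other repository's master is here

commit I
ff_new=$(git rev-parse master)     # ...--G--H--I: merely adds on to H

git reset -q --hard "$old^"        # back up to G, abandoning H (and I) locally
commit "I, replacing H"
nonff_new=$(git rev-parse master)  # ...--G--I': H is no longer reachable

if is_fast_forward "$old" "$ff_new"; then ff_case=yes; else ff_case=no; fi
if is_fast_forward "$old" "$nonff_new"; then nonff_case=yes; else nonff_case=no; fi
echo "adding on to H is a fast-forward: $ff_case"
echo "replacing H is a fast-forward:    $nonff_case"
```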


[2] We probably did this with git reset --hard HEAD^ or similar, but we might instead use git commit --amend, which removes the old commit and makes its replacement all at once.

[3] Using GitHub's notion of a protected branch, we can make GitHub refuse to obey even our own force-push commands. If we do that, we will have to go in to GitHub using their web administration interface and de-protect the branch name master long enough for us to get rid of bad commit H. We can re-protect the name afterwards, or just give ourselves administrative privileges to override the protection, or whatever we want to do, but the point here is that we have to use GitHub's web interface to override the protections we set with GitHub's web interface.

None of this is a Git operation. Git doesn't have the notion of protected branches at all. This is all stuff that GitHub added on. What this means in practical terms is that if we decide to move everything to Bitbucket or GitLab, we'll have to change how we administer this, assuming that Bitbucket and GitLab even have the same add-on ideas.

[4] Note that to rename a branch in our own repository on our laptop, we use the command-line git branch -m command. There's no git push option to rename a branch; there's just the one to delete it. So git push --delete can delete the GitHub branch, provided it's not protected from deletion. But renaming would require a web interface page.


Summary so far

In general, what you do is:

  • clone a repository (perhaps before or after using GitHub's fork, which we'll get to in a moment) to your local machine (e.g., laptop);
  • create new branch names in that local repository, and use those to keep track of new commits that you add;
  • then run git push to send those new commits to a GitHub repository where you have permission to create or update branch names.

The git push step will succeed provided that:

  • you have permission, and
  • the operation is a fast-forward (merely adds commits) or creates a new branch name.

If you get ! rejected (non-fast-forward), it means that you've chosen a git push operation that would first remove some commit from the existing branch name over on GitHub (then maybe, or maybe not, add new commits too).

Assuming you're using git push to send new commits to a GitHub repository that you own, you can use git push --force if you wish to override a non-fast-forward error. This tells the Git over at GitHub that, yes, you really did intend to drop the old commits in favor of the new ones.

You'll generally need this after a git rebase, because rebase works by copying some old-and-now-bad commits to new-and-improved commits. No existing commit can ever be changed, so if some existing commit has a problem, and you want to fix that problem, you need to toss out the old commit in favor of the new-and-improved one. That means you're telling your Git to discard some old commit—but if you've sent that old commit to GitHub, they won't want to discard it either. Your Git knows that the rebase is a replace old and lousy commits with these new improved ones operation, but their Git just sees toss out some old commits, here's some new ones without the due to a rebase part.
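You can watch the rejection and the forced update happen without involving GitHub at all. In this sketch a local bare repository (hub.git, a name I picked) stands in for the GitHub repository, and git commit --amend stands in for any history rewrite such as a rebase.

```shell
# Sketch: a polite push is rejected after a rewrite; a forced push is obeyed.
# hub.git is a local stand-in for the GitHub repository, not GitHub itself.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/hub.git"
git init -q "$work/laptop"
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/hub.git"

echo "good" > file.txt
git add file.txt
git commit -q -m "G"
echo "wrong work" >> file.txt
git commit -q -a -m "H"
git push -q origin master                  # creates master on the hub, pointing at H

# Replace H with a new-and-improved commit; the old H is abandoned locally.
echo "right work" > file.txt
git commit -q -a --amend -m "H, redone"

# Polite request: the hub refuses, because obeying would lose the old H.
if git push -q origin master 2>/dev/null; then polite=accepted; else polite=rejected; fi

# Forceful command: the hub obeys.
git push -q --force origin master
remote_tip=$(git --git-dir="$work/hub.git" rev-parse refs/heads/master)
local_tip=$(git rev-parse master)
echo "polite push: $polite"
```

A real GitHub repository can add further rules on top of this (protected branches, for instance), so treat this as a model of the plain Git behavior only.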

Clones and forks

In some cases—such as when you are an employee of some company—you may have direct access to a GitHub repository. It may have protected branches (you can't push directly to master for instance), but you can create your own branch names in that GitHub repository, and use force-push with those branch names if needed. In this case, the picture stays relatively simple. There are just two repositories you'll worry about:

  • There's the corporate one over on GitHub. You might need to be just a bit more careful with this one, since breaking it gets everyone mad at you. Just be super-careful with force pushes, making sure that you only do this with your branches.

  • And, there's your private clone, on your computer (e.g., a laptop). If you wreck this somehow, you can just re-clone the corporate GitHub repository, so you don't have to be quite so careful (though of course losing work is no fun).

In this setup, when you go to make a PR, you just git push to your own branch, then open the PR. If you need to update your PR, you can either git push or git push --force to your own branch: GitHub automatically updates the PR. The set of commits in the pull request is simply the set of commits in your GitHub branch that are not in the "base branch" (another branch in the same repository).
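You can preview that set of commits locally before opening the PR. This sketch assumes a base branch named master and a PR branch named feature, reusing the letters from the earlier drawings as commit messages. The two-dot range master..feature means "commits reachable from feature but not from master", which is my local approximation of what the PR page will list.

```shell
# Sketch: list the commits a PR from feature into master would contain.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

commit F
commit G
git checkout -q -b feature     # branch off at G
commit I
commit J
git checkout -q master
commit H                       # the base branch has moved on, too

in_pr=$(git log --format=%s master..feature | tr -d '\n')
pr_size=$(git rev-list --count master..feature)
echo "commits in the PR, newest first: $in_pr ($pr_size total)"
```

If this list contains commits you did not expect, that is the moment to fix the branch, before pushing.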

But you might not have direct access to the GitHub repository. There's an alternative method, using GitHub's "fork" operation. Some companies set things up this way for safety reasons, and many open-source projects use this same technique. In this kind of setup, things get a bit more complicated, because now there are three repositories involved.

It's time to take a small detour, and introduce the difference between a clone and a fork. Let's look first at a regular clone.

A clone copies all the commits and none of the branches

Let's start with:

git clone ssh://git@github.com/path/to/repo.git

You run this command on your laptop, having set up your ssh key access to GitHub. Your Git creates a new, totally-empty repository: it has no commits and no branches. Your Git then adds the name origin as a remote, so that the URL is saved, and connects to their Git at the URL. Their Git lists out all their branch and tag and other names and the corresponding commit hash IDs. Your Git says that you want everything, because you have no commits at all. They package up and send over everything, and your Git creates remote-tracking names for each of their branch names.

The result of all of this is that you now have all the commits, and no branches. Instead of branches, you have remote-tracking names to remember the last commit in each of their branches.

Finally, before returning control to the command line so that you can begin working, your Git creates one branch in the new repository, and does a git checkout of that one name. The branch name that your Git creates comes from your -b option, to your git clone command. If you don't give a -b option, your Git asks their Git, over at origin, which name they recommend, and uses that name. Your Git creates your branch of that name based on the commit found by their branch of that name, which is now in your origin/whatever remote-tracking name.

The final end result is that in your new clone, you have all of their commits and one branch. The one branch you have here is yours, although the name points to the same commit as their branch of your choice. You can now begin adding commits to your branch, or create new branch names.
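Here is a sketch that shows this end result. A local bare repository (hub.git, my placeholder for the GitHub repository) has two branches, master and next, and recommends master; we then clone it and list every name in the clone.

```shell
# Sketch: after git clone, their branches appear as remote-tracking names,
# and exactly one local branch exists. hub.git and seed are throwaway stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
hub="$work/hub.git"
git init -q --bare "$hub"
git --git-dir="$hub" symbolic-ref HEAD refs/heads/master   # the name the hub recommends

git init -q "$work/seed"                 # only used to give the hub some content
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch next
git push -q "$hub" master next

git clone -q "$hub" "$work/laptop"
cd "$work/laptop"
git for-each-ref --format='%(refname)'   # every name the new clone has

local_count=$(git for-each-ref refs/heads | wc -l | tr -d ' ')
has_local_next=no
if git show-ref --verify --quiet refs/heads/next; then has_local_next=yes; fi
has_tracking_next=no
if git show-ref --verify --quiet refs/remotes/origin/next; then has_tracking_next=yes; fi
echo "local branches: $local_count, local next: $has_local_next, origin/next: $has_tracking_next"
```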

A fork copies both commits and branch names

When you use GitHub's fork button, GitHub makes a clone, but they don't do it the standard way, which produces remote-tracking names instead of branch names. Since this clone isn't on your computer, they instead copy all the branch names from the original repository. These are now separate names, but your GitHub fork has all the same branch names, pointing to the same commits, as the repository you just forked.

At least, it does now. This is where the problems start. Now that you have your own fork, any changes that update their branch names don't update your fork.

What this means for you is that you should now clone your fork, and, as soon as this git clone process finishes, you should add a remote to your laptop clone. This remote needs a name and a URL. You already have a remote named origin; the URL for this remote is your GitHub fork. But you need two remotes.

The usual second-remote-name is upstream. I'm not a huge fan of this name but don't have my own recommendation, so if you wish to use upstream, you'll run:

git remote add upstream ssh://git@github.com/path/to/original.git

This path/to/original.git part is the URL that you need to give to GitHub to access the repository you forked.

Once you've done that, you will need to run:

git fetch upstream

to obtain any new commits they have that you don't—there probably aren't any, unless they've added new commits since you pushed the fork button—and to create, in your laptop Git repository, remote-tracking names for each of their branches.

Let's say that the upstream has branches named main, feature/short, and feature/tall. Your GitHub fork will have branches named main, feature/short, and feature/tall. Your clone on your laptop will have remote-tracking names: origin/main, origin/feature/short, and origin/feature/tall.

You may or may not want to keep those branch names in your GitHub fork. You may or may not want to keep all those remote-tracking names in your laptop clone. But you probably do want to add, to your laptop clone, remote-tracking names upstream/main, upstream/feature/short, and upstream/feature/tall. That's what your git fetch upstream will do.

Now, as new commits are added to upstream's main or feature/short or whatever, you can run git fetch upstream to get these new commits onto your laptop. You can then run git push origin upstream/main:main to send those new commits to your GitHub fork and update your GitHub fork's main, if you want to do that.

Wait, what's this new git push?

I've just introduced a new git push syntax here, so let's revisit git push:

  • We run git push origin somebranch to send our new commits from our branch somebranch to our GitHub fork and create-or-update the branch name somebranch over on the GitHub fork.

  • This uses what Git calls a refspec. The name somebranch at the end here is short for somebranch:somebranch. The two names, on the left and right side, have two different purposes:

    • The name on the left, somebranch, is for our Git. Our Git looks up the commit hash ID using this name. That's how it knows which commit(s) to send.

    • The name on the right, somebranch, is for the remote (origin, the fork). That's the name we're going to ask (regular push) or command (force-push) them to create-or-update.

What we'll do, now that we have upstream/*, is transfer new commits from the original repository on GitHub to our fork on GitHub. To do that, we bring those new commits into our laptop Git repository, updating upstream/main, upstream/feature/short, and so on.

Having gotten those new commits, we want to send them to origin, so we can git push origin upstream/whatever:whatever. The name on the left of the colon—upstream/main for instance—locates the commit in our repository that we just got from upstream. The name on the right of the colon is the branch name we want GitHub to update in our fork.
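The whole round trip can be modeled with three local repositories. In this sketch upstream.git and fork.git are bare repositories standing in for the two GitHub repositories, and git clone --bare is my stand-in for the fork button (it copies branch names as branch names); all paths and names are placeholders.

```shell
# Sketch: bring new upstream commits to the laptop, then push them to the fork
# with a src:dst refspec. All three repositories are local stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git --git-dir="$work/upstream.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/seed"                  # plays the role of the upstream maintainers
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial"
git push -q "$work/upstream.git" main

git clone -q --bare "$work/upstream.git" "$work/fork.git"   # the "fork"
git clone -q "$work/fork.git" "$work/laptop"                # origin = the fork

git commit -q --allow-empty -m "new upstream work"          # upstream moves on
git push -q "$work/upstream.git" main

cd "$work/laptop"
git remote add upstream "$work/upstream.git"
git fetch -q upstream                     # creates/updates upstream/main on the laptop

fork_before=$(git --git-dir="$work/fork.git" rev-parse refs/heads/main)
git push -q origin upstream/main:main     # left: our name for the commit; right: their branch
fork_after=$(git --git-dir="$work/fork.git" rev-parse refs/heads/main)
upstream_tip=$(git --git-dir="$work/upstream.git" rev-parse refs/heads/main)
echo "fork main moved from $fork_before to $fork_after"
```

Note that the laptop's own main never moved here: the push used the remote-tracking name upstream/main as its source, so no local branch was involved.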

Making a PR with a fork

Now that we have this fork, we use one other special feature of a GitHub fork. With a GitHub fork, we can make a pull request to the original repository we forked. To do that, we:

  • create or update a branch in our GitHub fork (with git push from our laptop);
  • use the pull request button on the GitHub page to make the new PR.

Any time we git push to our GitHub fork, GitHub will automatically update the PR. So, if we need to revise our PR, we don't have to close or delete it first: we just have to push—or maybe force-push—to our GitHub branch. Of course, first we'll need the right set of commits, which often means we need to git fetch upstream and then maybe rebase using upstream/feature/short or upstream/main or whatever as the new base. If this creates a non-fast-forward situation, we will have to use git push --force or equivalent to update our GitHub fork afterward.

torek