It's not clear from your question how much you know about Git. GitHub adds another layer of complexity, but Git itself is already pretty complicated.
Let me start with a high-speed review of things you need to know about Git:
Git is a distributed version control system, in which multiple users can have multiple full copies of each repository. The overall unit here is "repository" and we generally use git clone
to copy a repository. We then call the various copies "clones" of the repository, although each one is in fact an independent repository.
The unit of storage inside a repository is the commit. A clone made of some repository starts with all the same commits, and two clones that were in sync at some point can be re-synchronized by connecting them to each other. One copy is designated as the sender and one as the receiver; the sender sends any commits that the receiver needs but lacks, and then the sender has the receiver set some name(s) to remember those commits. This transfer is always one-way: the receiver never turns around and starts sending, for instance. (Hence, a full re-sync may require two separate operations.)
Each commit is numbered, with a very large (currently 160-bit), unique number, expressed in hexadecimal. These are called hash IDs and they appear random, but in fact they are cryptographic checksums of the contents of the internal objects. Commit objects have additional supporting objects, which are included automatically as needed in the synchronization transfer; like commits, they have hash IDs. (Tag objects are just outside this setup, but we'll ignore them here for simplicity. Note that tree and blob object hash IDs are unique to their object, but they can be re-used in many commits; it's only the commits themselves that are truly unique.)
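As a small illustration of "hash IDs are checksums of content", the sketch below computes a blob's hash ID by hand and compares it with what Git reports. It assumes a POSIX shell, the `sha1sum` utility (on some systems the equivalent is `shasum`), and Git's default SHA-1 object format; the sample content is arbitrary.

```shell
set -eu
# Git hashes a blob as the bytes "blob <size>\0" followed by the content.
content='hello'
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )
# Ask Git for the hash ID of the same content (works outside any repository).
viagit=$(printf '%s\n' "$content" | git hash-object --stdin)
echo "manual: $manual"
echo "git:    $viagit"
```

Because the ID depends only on the content, any two repositories that hold this blob will agree on its hash ID.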
Commits store two things. Each commit has a full snapshot of every source file, plus metadata, or information about the commit itself. The metadata include the raw hash ID of some earlier (parent) commit or commits as needed, so that the commits themselves form a backwards-looking chain. This is the history in a Git repository: there is no file history, there are merely commits that are the history.
A branch name (or indeed any other name, though branch names are slightly special) in any Git repository simply holds one hash ID. Branch names in particular are constrained to hold only commit hash IDs. So, a branch name identifies the last commit in the branch. From there, Git uses the parent hash IDs saved in each commit to work backwards, one hop at a time, to the first commit. Adding a commit to a branch is therefore just a matter of making a new commit object whose parent hash ID is the hash ID of the commit that is currently the last one in the branch, then updating the branch name to store the hash ID of this new last commit.
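To see a branch name move as a commit is added, here is a minimal sketch in a throwaway repository. The directory, branch name, identity, and commit messages are all made up for the demonstration; `--allow-empty` is used only so that no files are needed.

```shell
set -eu
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git checkout -q -b feature
git commit -q --allow-empty -m "first"
before=$(git rev-parse feature)   # the hash ID the name holds now
git commit -q --allow-empty -m "second"
after=$(git rev-parse feature)    # the name now holds the new commit's hash ID
parent=$(git rev-parse 'feature^') # the parent hash ID stored inside the new commit
echo "feature was $before"
echo "feature is  $after (parent: $parent)"
git cat-file -p feature           # raw commit object: tree, parent, author, committer, message
```

The `parent` line in the `git cat-file -p` output is the backwards link that Git follows, one hop at a time.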
A repository is therefore best viewed as consisting of two databases:
The object database holds all the objects. Transfers—git push
or git fetch
—between two repositories consist of picking out objects that are missing from one of the two repositories, and sending those over. In this way, the objects are shared. Because the hash ID of an object is a cryptographic checksum of the object's content, and the hash ID is therefore unique to that object, the two Gits can simply exchange hash IDs to figure out what to send.
The names database holds the name-to-hash-ID table. This table is not shared; each repository has its own independent branch names, tag names, and so on.
The process of sending commits (and other objects) from one repository to another ends with the receiving repository updating some name(s) so as to remember the last commit of each branch. Since Git finds commits by using the names, if the receiving repository does not update any names, the receiver cannot find the commits.
This last bit gets us to the complexity behind git fetch
("get commits from them") vs git push
("send commits to them"). As a general rule—there are some specific exceptions, especially with a GitHub "fork" operation—when we fetch from some other Git repository, we tell our Git not to take their branch names as-is. If we've made a new commit on our branch named feature
, and they've made a different new commit on their branch named feature
, there would be a problem:
          H   <-- our "feature" (commit H is our new commit)
         /
...--F--G   ["feature" used to name this commit when we both started]
         \
          I   <-- their "feature" (commit I is their new commit)
A single name, feature
, can select only one commit. If it selects commit F
or G
in both repositories, that's fine. But we've now added our new commit H
, and they've added their new commit I
. (These single uppercase letters stand in for real hash IDs.) The one name feature
can select H
, from which we go back to G
, then F
, and so on; or it can select I
, from which we go back to G
, then F
, and so on. It cannot select both.
So, what we do when we run git fetch origin
is tell our Git: Don't take their branch names as-is. Change them. Turn their feature
into our origin/feature
, because we're calling this other Git origin
. (The name origin
is a remote and is what we used when we set up our clone originally.) We have our Git create or update our origin/feature
, leaving our branch name feature
alone, so that we get:
          H   <-- feature
         /
...--F--G
         \
          I   <-- origin/feature
By using two different names, we allow our Git to remember two different commit hash IDs. If they add more commits, or even remove commit I
from their repository, that's no problem:
          H   <-- feature
         /
...--F--G
         \
          I--J   <-- origin/feature
or even:
          H   <-- feature
         /
...--F--G   <-- origin/feature
         \
          I   [abandoned]
An abandoned commit like this still exists in our repository; it has just become hard to find. (Eventually, if it stays abandoned long enough, git gc removes it for real.)
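The diverging-feature situation described above can be reproduced locally. This is only a sketch: two ordinary directories named `theirs` and `ours` (hypothetical names) stand in for the two repositories, and the commits are empty ones whose messages match the letters in the drawings.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs checkout -q -b feature
git -C theirs commit -q --allow-empty -m F
git -C theirs commit -q --allow-empty -m G
g=$(git -C theirs rev-parse feature)          # commit G, where both sides start
git clone -q theirs ours                      # our clone is in sync at G
git -C ours commit -q --allow-empty -m H      # our new commit
git -C theirs commit -q --allow-empty -m I    # their new commit
git -C ours fetch -q origin                   # updates origin/feature, not feature
ours_tip=$(git -C ours rev-parse feature)
theirs_tip=$(git -C ours rev-parse origin/feature)
base=$(git -C ours merge-base feature origin/feature)
git -C ours log --oneline --graph feature origin/feature
```

After the fetch, `feature` still names H while `origin/feature` names I, and both lead back to G.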
The git push
command, however, doesn't work like this. When we run git push origin
, we have our Git send our commits to their Git, which stores them in their repository (technically in a sort of quarantine area initially). Then we ask their Git to set their branch name. So we start with:
          H--K   <-- feature
         /
...--F--G   <-- origin/feature
and send commits H
and K
, and then ask them to set their branch name feature
to point to commit K
. As long as their branch name feature
—which we see here reflected as our origin/feature
—still points to commit G
, it's "safe" for them to do this, to add on commits H
and K
. But if their feature
points to some commit I
or J
, it's not safe, and they will reject our request. (The fact that our commits H
and K
went into a quarantine location then makes it easy for them to eject them from their objects database: important at places like GitHub that receive a lot of data, then reject some of it, e.g., for having overly large files.)
Anyway, if all goes well and they accept our push, our Git will now update our origin/feature
, since we know they moved their feature
to point to K
:
...--G--H--K <-- feature, origin/feature
and now all is well: the two repositories are in sync.
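Here is a sketch of both outcomes of a push, using a local bare repository (`server.git`, a made-up name) in place of the remote. The clone `ours` pushes H and K on top of G and is accepted; a second clone, `colleague` (also hypothetical), then tries to push a commit I that was built on G and is rejected.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b feature
git -C seed commit -q --allow-empty -m G
git clone -q --bare seed server.git           # the "other Git" we push to
git clone -q server.git ours
git clone -q server.git colleague
git -C ours commit -q --allow-empty -m H
git -C ours commit -q --allow-empty -m K
git -C colleague commit -q --allow-empty -m I
git -C ours push -q origin feature            # server's feature is still G: accepted
# The server's feature now names K, so adding I would drop H and K: rejected.
if git -C colleague push -q origin feature 2>/dev/null; then
  outcome=accepted
else
  outcome=rejected
fi
echo "second push was $outcome"
```

The usual next step for the rejected side is to fetch, combine the work (merge or rebase), and push again.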
Sometimes GitHub adds no complexity
Let's suppose now that the GitHub repository to which you want to make a pull request is:
ssh://git@github.com/user/repo.git
(You can use https://github.com
with a Personal Access Token instead, if you prefer, but I'll use ssh for the examples here.)
Now, let's also suppose that you have push access to this repository, so that you can create a new branch name in this repository.
To make your pull request on GitHub, you will:
git clone ssh://git@github.com/user/repo.git
This will copy, to your own local machine (let's call this "laptop"), all the commits from repo.git
, and none of the branches: instead, your Git will add the name origin
, referring to ssh://git@github.com/user/repo.git
, and rename all their branch names to your origin/*
remote-tracking names.
Then, because you did not say -b main
or -b develop
or whatever, your Git will ask their Git which branch name they recommend. They'll say main
or whatever it is they say. Your Git will now create, in your clone, one branch named main
(or whatever), pointing to the same commit as your origin/main
(or origin/whatever).
Last, your Git will check out this one particular commit, so that you can work on it. I will assume that you know everything you need to know about working on a commit, locally, and about creating new branch names, locally, and so on.
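To check what a fresh clone actually contains, the sketch below clones a local bare repository (standing in for the GitHub one) that has two branches, `main` and `develop`. All names here are placeholders chosen for the demonstration.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
git -C seed commit -q --allow-empty -m "initial"
git -C seed branch develop
git clone -q --bare seed server.git
git clone -q server.git laptop
cd laptop
git remote get-url origin                     # the URL stored under the name origin
# Their branches show up only as remote-tracking names...
git for-each-ref --format='%(refname:short)' refs/remotes
# ...and exactly one local branch was created, the one they recommended.
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "local branches: $local_branches"
```

Running `git checkout develop` afterwards would create a local `develop` from `origin/develop`.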
Eventually, you will have one or more new commits in your repository. You now need to transfer these commits to the repository over on GitHub that you're calling origin
. To do so, you will need to use git push
.
As we saw above, your git push
will negotiate with their Git—their software talking to their repository, over on GitHub—to figure out which commits to send, and will then send them, along with any additional objects required. Then your Git will ask them to set a branch name in their repository. You can choose the branch name in their repository in several ways:
If you do nothing special, just run:
git push -u origin HEAD
or similar, your Git will ask their Git to set a branch of the same name that you have in your repository. That's easy and convenient: you just need to choose your (local) branch name carefully up front.
Or, you can run:
git push -u origin HEAD:newbranch
or similar. This will have your Git ask their Git to set the branch name newbranch
. You can use any valid branch name here, such as hi/there/new/branch
or whatever, but generally, you need to keep it simple.
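Both ways of choosing the name can be tried against a local bare repository. In this sketch `server.git` stands in for the GitHub repository and `my-fix` is a hypothetical local branch name; `newbranch` is the name from the text above.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed server.git
git clone -q server.git laptop
cd laptop
git checkout -q -b my-fix
git commit -q --allow-empty -m "my fix"
tip=$(git rev-parse HEAD)
git push -q -u origin HEAD                    # asks them to set "my-fix"
git push -q origin HEAD:newbranch             # asks them to set "newbranch"
# List the branch names that now exist on the receiving side.
git -C "$work/server.git" for-each-ref --format='%(refname:short)' refs/heads
tracking=$(git rev-parse --abbrev-ref 'my-fix@{upstream}')
echo "my-fix tracks $tracking"
```

The `-u` option additionally records `origin/my-fix` as the upstream of the local branch, so later `git push` and `git pull` need no arguments.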
If they accept this operation, you're nearly done: you now need to use GitHub itself—not Git, which has no idea what a GitHub "pull request" is—to create the pull request on GitHub. This usually involves using their web interface: you navigate to github.com/user/repo.git
and press various clicky buttons to create a pull request, choosing your branch (just created) as the source and some other branch as the "base branch". If all goes well, this creates the PR on GitHub and alerts the administrators of github.com/user/repo.git
that there is a new pull request.
(GitHub also have a gh
command-line tool that you can run, which calls the GitHub API to do all this from the command line. I have not yet used this myself.)
Sometimes GitHub adds complexity
Our premise above, required to make this all simple, was that you have direct push access to ssh://git@github.com/user/repo.git
(or the https
variant). What if you don't?
In this case, GitHub offer an easy path forward. You start with GitHub's FORK button (or the equivalent gh
command, which does this and then does a git clone
to your laptop, all at once). This GitHub side operation is, at its heart, a git clone
, but with one big difference: A GitHub fork clone copies all the branches too, and leaves a link back to the original.
That is, when we run:
git clone -b somebranch ssh://git@github.com/user/repo.git
we get a (local) Git repository on our laptops, in which all the commits have been copied, but no branch has been copied; instead, one new branch has been created based on our -b
argument, or lack thereof. "Their" repository, over on GitHub, has no idea that we made this clone. There is a weak link from our clone to their repository, in that our Git stored ssh://git@github.com/user/repo.git
under the name origin
, so that we can later use origin
to refer to it.
But the GitHub fork-clone makes a clone in which all branch names are copied, and there's a much stronger link between their GitHub repository, and our new GitHub fork, going both ways. They can see that we forked their GitHub repository, and our fork links to their GitHub repository. This link is invisible in the repository itself: GitHub stores this linkage information elsewhere.
Now that we have our fork, at ssh://git@github.com/us/repo.git
for instance, the way we work with it is to clone our fork to our laptop:
git clone -b somebranch ssh://git@github.com/us/repo.git
This stores ssh://git@github.com/us/repo.git
in our clone under the name origin
.
For various purposes, we'll eventually want to remember ssh://git@github.com/user/repo.git
in our laptop clone, under another name. By convention, this second name is upstream
. I think this is a poor name, but don't have a better suggestion, and in some ways it's better to follow the herd here, so I'll use upstream
too. We enter our clone:
cd repo
and use git remote add
to add this second remote name:
git remote add upstream ssh://git@github.com/user/repo.git
We can now run:
git fetch upstream
to get our Git to call up ssh://git@github.com/user/repo.git
—the URL stored under the name upstream
—and obtain any commits they have that we don't, and then create-or-update all of our upstream/*
remote-tracking names.
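The two-remote setup can be rehearsed without GitHub. In the sketch below, `theirs.git` plays the original repository and `fork.git` plays our fork; a bare clone is used as a rough stand-in for the FORK button, since it copies branch names too. These paths are assumptions for the demo, not real URLs.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed theirs.git           # stand-in for user/repo.git
git clone -q --bare theirs.git fork.git       # stand-in for us/repo.git
git clone -q fork.git laptop                  # our fork becomes origin
cd laptop
git remote add upstream "$work/theirs.git"
git fetch -q upstream                         # creates upstream/* names
git remote -v
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/remotes
origin_main=$(git rev-parse origin/main)
upstream_main=$(git rev-parse upstream/main)
```

Right after forking, `origin/main` and `upstream/main` name the same commit.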
If we do these two steps fast enough, there won't be any new commits: we'll have gotten all the commits when we ran git clone ssh://git@github.com/us/repo.git
, which has all the commits (and all the branches). So all this will really do is create upstream/*
, all of which will match origin/*
. Which might leave you wondering: why did we bother?
The answer is: over time, they will add new commits to their upstream
(github.com/user/repo.git) repository, which won't appear in our fork (github.com/us/repo.git). We will need to transfer these to our laptop, with git fetch upstream, and then send them back to GitHub, with git push origin.1
(That's all for later, unless repo.git
is really active and it took so long for us to fork-and-clone that we have to do it now. But keep it in mind.)
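Here is a sketch of that later catch-up step, again with local bare repositories standing in for GitHub (`theirs.git` for the original, `fork.git` for our fork). A third clone, `maintainer`, is a hypothetical person with push access who adds the new upstream commit.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed theirs.git
git clone -q --bare theirs.git fork.git
git clone -q fork.git laptop
git -C laptop remote add upstream "$work/theirs.git"
old=$(git -C fork.git rev-parse main)
# Time passes: the original repository gains a commit that our fork lacks.
git clone -q theirs.git maintainer
git -C maintainer commit -q --allow-empty -m "new upstream work"
git -C maintainer push -q origin main
# Bring the new commit to the laptop, then forward it to the fork.
git -C laptop fetch -q upstream
git -C laptop merge -q --ff-only upstream/main
git -C laptop push -q origin main
echo "fork main:     $(git -C fork.git rev-parse main)"
echo "upstream main: $(git -C theirs.git rev-parse main)"
```

The `--ff-only` merge works here because the laptop's `main` has no commits of its own; with local work on `main`, a real merge or a rebase would be needed instead.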
Now that we have our clone over on GitHub—in github.com/us/repo.git
—and our clone on our laptop, we proceed as usual: we make new commits, test them out, etc., making branches as we go, and eventually arrive at some new commits we'd like to put into a GitHub Pull Request. To do that, we:
- send our new commits to our fork, under some new branch name: this works just like the simpler case; then
- use GitHub's "pull request" clicky buttons, or the
gh
command line, to make a new pull request that goes from our GitHub fork, to their original on GitHub.
In short (if it's not too late for that), we made a GitHub fork just so that we would have a GitHub repository to which we can git push
commits and set up a branch name. We made this fork of their GitHub repository because we're not allowed to set branch names in their GitHub repository. The commits are shared; the branch names are not.
1This creates an obvious large inefficiency: wouldn't it be better to just have GitHub itself do this on its own? The answer to that is "yes, but". The "but" part has to do with when and how which branch names on our GitHub fork would get updated. If GitHub used remote-tracking names, this would be less of a problem, but they don't.
After they accept your PR
Once they accept your pull request, you'll probably want to update your laptop clone. One complication here occurs if you had to make a GitHub fork: now you need to git fetch upstream
, rather than just git fetch origin
, and then you will probably want to git push origin
to update your GitHub fork. See the section above.
There's more though. If and when they do accept your PR, they can use one of three green clicky buttons:
MERGE just does an ordinary Git merge. All is good.
REBASE AND MERGE has GitHub copy all your commits to new, different commits. This is a pain in the butt because now your commits, that exist on your laptop and maybe in your GitHub fork, are to be obsoleted in favor of their new and supposedly-improved commits. It's your choice as to whether to go along with all this, but if you want to play nice with them, you will be forced to do so.
There's no easy and convenient way to update your laptop and your GitHub fork. Instead, you have to use less-convenient methods (which we won't cover here). For most simple cases, this is just a matter of discarding your branch name, then starting over with theirs, though.
SQUASH AND MERGE turns all of your commits into one big commit that they own, in their repository. This is similar to the rebase-and-merge button, in that you now have to discard your commits in favor of their new single commit. It's worse in that unless your commit was one commit to start with, it's much harder to automate. Again, though, it's usually just a matter of throwing away your branch name. In fact, if you ever use git merge --squash
locally, that's the same procedure you'll need locally: squash merge means "discard the others", at a higher level than individual commits.2
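For the simple case, the "throw away your branch name" cleanup after a squash might look like the sketch below. GitHub's button is imitated with `git merge --squash` run by a hypothetical `maintainer` clone; `server.git`, `my-fix`, and the file names are likewise invented for the demonstration.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
echo "hello" > seed/README
git -C seed add README
git -C seed commit -q -m "initial"
git clone -q --bare seed server.git
git clone -q server.git laptop
# Our work: two commits on my-fix, pushed for the pull request.
git -C laptop checkout -q -b my-fix
echo "one" > laptop/a.txt
git -C laptop add a.txt
git -C laptop commit -q -m "fix part 1"
echo "two" > laptop/b.txt
git -C laptop add b.txt
git -C laptop commit -q -m "fix part 2"
git -C laptop push -q origin my-fix
# Imitation of SQUASH AND MERGE: one new commit on main with our changes.
git clone -q server.git maintainer
git -C maintainer merge -q --squash origin/my-fix >/dev/null
git -C maintainer commit -q -m "my-fix (squashed)"
git -C maintainer push -q origin main
# Back on the laptop: pick up their commit, then discard our branch.
git -C laptop fetch -q origin
git -C laptop checkout -q main
git -C laptop merge -q --ff-only origin/main
fix_tree=$(git -C laptop rev-parse 'my-fix^{tree}')
main_tree=$(git -C laptop rev-parse 'main^{tree}')
# Git does not see my-fix as merged: its commits are not in main's history.
if git -C laptop merge-base --is-ancestor my-fix main; then merged=yes; else merged=no; fi
git -C laptop branch -q -D my-fix
git -C laptop push -q origin --delete my-fix
remaining=$(git -C laptop for-each-ref --format='%(refname:short)' refs/heads)
echo "merged by ancestry: $merged; local branches left: $remaining"
```

The snapshots match (`fix_tree` equals `main_tree`) even though the commits differ, which is why the forced delete with `-D` loses nothing here.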
They might, of course, not accept your PR to start with, in which case you may have to do your own replacing of commits with new-and-improved commits. Whether you are doing this through your own GitHub fork, or through direct pushes to a GitHub repository where you have push privileges, you'll typically want to use git push --force
or git push --force-with-lease
3 to update your GitHub branch, after you replace, in your laptop, some commits with other new-and-improved commits.
This process is remarkably similar to the one you'll use if they used the rebase-and-merge or squash-and-merge sequence: both of them involve throwing out the old branch (and its commits) in favor of a new branch (and its commits). The difference between deleting the branch and then creating one of the same name, and force-pushing, is ... basically nonexistent.
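A minimal sketch of replacing a pushed commit, with a local bare repository (`server.git`, a placeholder) acting as the GitHub side: the ordinary push of the rewritten branch is refused, and `--force-with-lease` then succeeds because the remote branch still matches our `origin/my-fix`.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed checkout -q -b main
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed server.git
git clone -q server.git laptop
cd laptop
git checkout -q -b my-fix
git commit -q --allow-empty -m "draft"
git push -q -u origin my-fix
# Replace the pushed commit with a new-and-improved one.
git commit -q --amend --allow-empty -m "improved"
# A plain push is not a fast-forward any more, so it is rejected.
if git push -q origin my-fix 2>/dev/null; then plain=accepted; else plain=rejected; fi
# Force, but only if their my-fix is still where our origin/my-fix says it is.
git push -q --force-with-lease origin my-fix
echo "plain push: $plain"
echo "server tip: $(git -C "$work/server.git" log -1 --format=%s my-fix)"
```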
2It is in theory possible to set up a squash-merge sequence that doesn't involve throwing out the squashed commits. In practice, it's too hard; don't do it.
3The --force-with-lease
option is --force
with a safety inspection first. See other Stack Overflow questions and answers for more.