Seeking Clarification About Git Clone and Pull Requests

Question

I'm struggling to understand how to make a pull request, and I don't use GitHub often.

I was asked to git clone a repo onto my local machine and make some changes. From there, I'm supposed to make a pull request back, but I'm confused about how to do so. Is it possible to just pull-request the file that's on my local machine? Or am I required to make my local file its own Git repo and then pull-request?

Any help would be much appreciated.

score 1 · Answer 1 · answered Sep 02 '21 at 06:25

The basic way to contribute to a GitHub project is to fork the repo first (the Fork button in the top right corner). Then you clone your forked repository, make changes there, and push them. When you are done, you can open a pull request, which proposes merging the changes from your forked repository into the original repository.

Aiqs

I'm a little confused. From what I recall, to fork is to create another branch, correct? I was able to already git clone the repo and make my edits, so by forking, wouldn't I be making another branch? If that's the case, is there a way to move the edits on my local machine into that forked branch? – Ferr Tamer Sep 02 '21 at 06:36
If you already cloned and edited the repository, you should still fork the repository first (a fork is a copy of the whole repository under your account, not a branch). Then you can add a new remote to your local repo, pointing to your fork, with git remote add. After that you should be able to push your branch to your fork and proceed as described above – Aiqs Sep 02 '21 at 08:02
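
The suggestion in the comment above can be sketched as a script. This is a local simulation, not real GitHub: two bare repositories in a temporary directory stand in for the original project and for your fork, and the names original.git, fork.git, myfork, and my-change are made up for illustration. It assumes Git 2.28 or later (for `git init -b`). On GitHub you would pass your fork's URL to git remote add instead of a local path.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the original project (hypothetical local path).
git init -q -b main seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed original.git

# You already cloned the original and committed an edit, as in the question.
git clone -q original.git repo
cd repo
git switch -q -c my-change
echo "my edit" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# "Fork" the original. On GitHub this is the Fork button.
git clone -q --bare "$work/original.git" "$work/fork.git"

# Point a second remote at the fork and push the branch there.
git remote add myfork "$work/fork.git"
git push -q -u myfork my-change

git remote -v
```

The existing clone keeps origin pointing at the original project, so nothing has to be re-cloned or re-edited. Only the push destination changes.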

score 0 · Answer 2 · edited Sep 02 '21 at 07:00

The short of it: for a pull request, your changes need to be available somewhere the other repository can pull them from:

- Usually you fork the repository on GitHub, clone your fork to your own machine, and work on it: commit, push, the normal stuff when developing.
- If you can push to the same repository, you can instead create your own branch in it, check it out, work on it, and push to your branch: the usual dev process.

Then, once your code is on GitHub, you can go to that repository and branch of yours, click "create pull request", and fill in the required info.

Alain · Accepted Answer · 2021-09-22T03:47:52.487

"Pull requests let you tell others about changes you've pushed to a branch in a repository on GitHub." (GitHub Docs, "About pull requests")

Step 1: Git Clone the project to your local machine

For reference: https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository-from-github/cloning-a-repository

Step 2: Create a new branch, commit your code changes, and push the branch to the remote on GitHub

For reference: https://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository

Step 3: Go to the web page of your GitHub project > select the Pull requests tab > click the New pull request button > choose the branch you created as compare and the target branch you want to merge into as base. That's it!

For reference: https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests

Step 4 (if needed): If there is a merge conflict warning, fix the conflict in your local clone and push to the remote again, as in Step 2.

Step 5: Your project members will review your code change, and once they approve the pull request, your changes will be merged into the base branch.
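
Steps 1 and 2 above might look like this on the command line. This is only a sketch under stated assumptions: a bare repository in a temporary directory (remote.git) stands in for the GitHub project, the branch name fix-typo and the file README.txt are hypothetical, and Git 2.28 or later is assumed.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the project on GitHub (hypothetical local path).
git init -q -b main seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed remote.git

# Step 1: clone the project.
git clone -q remote.git project
cd project

# Step 2: create a branch, commit a change, and push the branch.
git switch -q -c fix-typo
echo "corrected text" > README.txt
git add README.txt
git commit -q -m "Fix typo in README"
git push -q -u origin fix-typo

# The branch now exists on the remote, ready to be picked as "compare" in Step 3.
git ls-remote --heads origin
```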

score 0 · Answer 4 · answered Sep 02 '21 at 15:50

It's not clear from your question how much you know about Git. GitHub adds an additional layer of complexity, but Git itself is already pretty complicated.

Let me start with a high-speed review of things you need to know about Git:

Git is a distributed version control system, in which multiple users can have multiple full copies of each repository. The overall unit here is "repository" and we generally use git clone to copy a repository. We then call the various copies "clones" of the repository, although each one is in fact an independent repository.
The unit of storage inside a repository is the commit. A clone made of some repository starts with all the same commits, and two clones that were in sync at some point, can be re-synchronized by connecting them to each other. One copy is designated as the sender and one as the receiver; the sender sends any commits that the receiver needs, but lacks, and then the sender has the receiver set some name(s) to remember those commits. This transfer is always one-way: the receiver never turns around and starts sending, for instance. (Hence, a full re-sync may require two separate operations.)
Each commit is numbered, with a very large (currently 160-bit), unique number, expressed in hexadecimal. These are called hash IDs and they appear random, but in fact they are cryptographic checksums of the contents of the internal objects. Commit objects have additional supporting objects, which are included automatically as needed in the synchronization transfer; like commits, they have hash IDs. (Tag objects are just outside this setup, but we'll simply ignore them for simplification. Note that tree and blob object hash IDs are unique to their object, but they can be re-used in many commits; it's only the commits themselves that are truly unique.)
Commits store two things. Each commit has a full snapshot of every source file, plus metadata, or information about the commit itself. The metadata include the raw hash ID of some earlier (parent) commit or commits as needed, so that the commits themselves form a backwards-looking chain. This is the history in a Git repository: there is no file history, there are merely commits that are the history.
A branch name (or indeed any other name, though branch names are slightly special) in any Git repository simply holds one hash ID. Branch names in particular are constrained to hold only commit hash IDs. So, a branch name identifies the last commit in the branch. From there, Git works backwards, one hop at a time, using the parent hash IDs saved in each commit, until it reaches the first commit. Adding a commit to a branch is therefore just a matter of making a new commit object whose parent hash ID is the hash ID of the commit that is, currently, the last one in the branch, then updating the branch name to store the hash ID of this new last-commit.
A repository is therefore best viewed as consisting of two databases:
- The object database holds all the objects. Transfers—git push or git fetch—between two repositories consist of picking out objects that are missing from one of the two repositories, and sending those over. In this way, the objects are shared. Because the hash ID of an object is a cryptographic checksum of the object's content, and the hash ID is therefore unique to that object, the two Gits can simply exchange hash IDs to figure out what to send.
- The names database holds the name-to-hash-ID table. This table is not shared; each repository has its own independent branch names, tag names, and so on.
The process of sending commits (and other objects) from one repository to another ends with the receiving repository updating some name(s) so as to remember the last commit of each branch. Since Git finds commits by using the names, if the receiving repository does not update any names, the receiver cannot find the commits.
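
The points above about commits, parents, and branch names can be checked in a scratch repository. A minimal sketch, assuming Git 2.28 or later; the repository name demo and the file file.txt are made up:

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main demo
cd demo

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)

# The branch name holds exactly one hash ID: that of the last commit.
git rev-parse main

# The last commit's metadata records the hash ID of its parent.
git cat-file -p main
git rev-parse main^

# The names database: every name and the hash ID it currently holds.
git for-each-ref --format='%(refname) %(objectname)'
```

Here git cat-file -p prints the raw commit object, including its tree line and parent line, which is the backwards-looking link described above.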

This last bit gets us to the complexity behind git fetch ("get commits from them") vs git push ("send commits to them"). As a general rule—there are some specific exceptions, especially with a GitHub "fork" operation—when we fetch from some other Git repository, we tell our Git not to take their branch names as-is. If we've made a new commit on our branch named feature, and they've made a different new commit on their branch named feature, there would be a problem:

          H   <-- our "feature"  (commit H is our new commit)
         /
...--F--G   ["feature" used to name this commit when we both started]
         \
          I   <-- their "feature" (commit I is their new commit)

A single name, feature, can select only one commit. If it selects commit F or G in both repositories, that's fine. But we've now added our new commit H, and they've added their new commit I. (These single uppercase letters stand in for real hash IDs.) The one name feature can select H, from which we go back to G, then F, and so on; or it can select I, from which we go back to G, then F, then so on. It cannot select both.

So, what we do when we run git fetch origin is tell our Git: Don't take their branch names as-is. Change them. Turn their feature into our origin/feature, because we're calling this other Git origin. (The name origin is a remote and is what we used when we set up our clone originally.) We have our Git create or update our origin/feature, leaving our branch name feature alone, so that we get:

          H   <-- feature
         /
...--F--G
         \
          I   <-- origin/feature

By using two different names, we allow our Git to remember two different commit hash IDs. If they add more commits, or even remove commit I from their repository, that's no problem:

          H   <-- feature
         /
...--F--G
         \
          I--J   <-- origin/feature

or even:

          H   <-- feature
         /
...--F--G   <-- origin/feature
         \
          I   [abandoned]

An abandoned commit like this still exists in our repository, it's just become hard to find. (Eventually, if it stays abandoned long enough, git gc removes it for real.)
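
The diverging feature / origin/feature picture can be reproduced locally. In this sketch, which assumes Git 2.28 or later, a bare repository theirs.git plays the other Git, and the clones ours and colleague are hypothetical names; empty commits with the messages G, H, and I stand in for the lettered commits in the drawings.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Shared starting point: branch "feature" at commit G.
git init -q -b feature seed
git -C seed commit -q --allow-empty -m "G"
git clone -q --bare seed theirs.git

git clone -q theirs.git ours
git clone -q theirs.git colleague

# We add commit H on our feature.
git -C ours commit -q --allow-empty -m "H"

# They add commit I on their feature and publish it.
git -C colleague commit -q --allow-empty -m "I"
git -C colleague push -q origin feature

# Fetching updates origin/feature only; our own feature is left alone.
cd ours
git fetch -q origin
git log --oneline --graph feature origin/feature
```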

The git push command, however, doesn't work like this. When we run git push origin, we have our Git send our commits to their Git, which stores them in their repository (technically in a sort of quarantine area initially). Then we ask their Git to set their branch name. So we start with:

          H--K   <-- feature
         /
...--F--G   <-- origin/feature

and send commits H and K, and then ask them to set their branch name feature to point to commit K. As long as their branch name feature—which we see here reflected as our origin/feature—still points to commit G, it's "safe" for them to do this, to add on commits H and K. But if their feature points to some commit I or J, it's not safe, and they will reject our request. (The fact that our commits H and K went into a quarantine location then makes it easy for them to eject them from their objects database: important at places like GitHub that receive a lot of data, then reject some of it, e.g., for having overly large files.)

Anyway, if all goes well and they accept our push, our Git will now update our origin/feature, since we know they moved their feature to point to K:

...--G--H--K   <-- feature, origin/feature

and now all is well: the two repositories are in sync.
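
The unsafe case, where their feature has moved on to commit I before we push H and K, can be simulated the same way. Again this is a sketch with made-up local repositories (theirs.git, ours, colleague) and assumes Git 2.28 or later:

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b feature seed
git -C seed commit -q --allow-empty -m "G"
git clone -q --bare seed theirs.git
git clone -q theirs.git ours
git clone -q theirs.git colleague

# Their feature moves to commit I before we push.
git -C colleague commit -q --allow-empty -m "I"
git -C colleague push -q origin feature

# We add H and K on top of G and ask them to set feature to K.
cd ours
git commit -q --allow-empty -m "H"
git commit -q --allow-empty -m "K"
if git push -q origin feature 2>/dev/null; then
  result=accepted
else
  # Their feature is at I, so moving it to K would drop commit I.
  result=rejected
fi
echo "push was $result"
```

The usual way out is to git fetch, combine the work (merge or rebase), and push again.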

Sometimes GitHub adds no complexity

Let's suppose now that the GitHub repository to which you want to make a pull request is:

ssh://git@github.com/user/repo.git

(You can use https://github.com with a personal access token instead, if you prefer, but I'll use ssh for the examples here.)

Now, let's also suppose that you have push access to this repository, so that you can create a new branch name in this repository.

To make your pull request on GitHub, you will first run:

git clone ssh://git@github.com/user/repo.git

This will copy, to your own local machine (let's call this "laptop"), all the commits from repo.git, and none of the branches: instead, your Git will add the name origin, referring to ssh://git@github.com/user/repo.git, and rename all their branch names to your origin/* remote-tracking names.

Then, because you did not say -b main or -b develop or whatever, your Git will ask their Git which branch name they recommend. They'll say main or whatever it is they say. Your Git will now create, in your clone, one branch named main (or whatever), pointing to the same commit as your origin/main (or origin/whatever).

Last, your Git will check out this one particular commit, so that you can work on it. I will assume that you know everything you need to know about working on a commit, locally, and about creating new branch names, locally, and so on.
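
What git clone leaves behind can be inspected in a local stand-in. In this sketch, repo.git is a bare repository playing the GitHub repository, with two branches (main and develop, both hypothetical), and laptop is the clone; Git 2.28 or later is assumed.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the GitHub repository, with branches main and develop.
git init -q -b main seed
git -C seed commit -q --allow-empty -m "initial"
git -C seed branch develop
git clone -q --bare seed repo.git

git clone -q repo.git laptop
cd laptop

git remote -v     # origin is the URL (here a path) we cloned from
git branch -r     # their branches, renamed to origin/* remote-tracking names
git branch        # the one local branch our Git created
```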

Eventually, you will have one or more new commits in your repository. You now need to transfer these commits to the repository over on GitHub that you're calling origin. To do so, you will need to use git push.

As we saw above, your git push will negotiate with their Git—their software talking to their repository, over on GitHub—to figure out which commits to send, and will then send them, along with any additional objects required. Then your Git will ask them to set a branch name in their repository. You can choose the branch name in their repository in several ways:

If you do nothing special, just run:
```
git push -u origin HEAD
```
or similar, your Git will ask their Git to set a branch of the same name that you have in your repository. That's easy and convenient: you just need to choose your (local) branch name carefully up front.
Or, you can run:
```
git push -u origin HEAD:newbranch
```
or similar. This will have your Git ask their Git to set the branch name newbranch. You can use any valid branch name here, such as hi/there/new/branch or whatever, but generally, you need to keep it simple.

If they accept this operation, you're nearly done: you now need to use GitHub itself—not Git, which has no idea what a GitHub "pull request" is—to create the pull request on GitHub. This usually involves using their web interface: you navigate to github.com/user/repo.git and press various clicky buttons to create a pull request, choosing your branch (just created) as the source and some other branch as the "base branch". If all goes well, this creates the PR on GitHub and alerts the administrators of github.com/user/repo.git that there is a new pull request.

(GitHub also has a command-line tool, gh (the GitHub CLI), that can do all this from the command line by talking to the GitHub API. I have not yet used this myself.)

Sometimes GitHub adds complexity

Our premise above, required to make this all simple, was that you have direct push access to ssh://git@github.com/user/repo.git (or the https variant). What if you don't?

In this case, GitHub offer an easy path forward. You start with GitHub's FORK button (or the equivalent gh command, which does this and then does a git clone to your laptop, all at once). This GitHub side operation is, at its heart, a git clone, but with one big difference: A GitHub fork clone copies all the branches too, and leaves a link back to the original.

That is, when we run:

git clone -b somebranch ssh://git@github.com/user/repo.git

we get a (local) Git repository on our laptops, in which all the commits have been copied, but no branch has been copied; instead, one new branch has been created based on our -b argument, or lack thereof. "Their" repository, over on GitHub, has no idea that we made this clone. There is a weak link from our clone to their repository, in that our Git stored ssh://git@github.com/user/repo.git under the name origin, so that we can later use origin to refer to it.

But the GitHub fork-clone makes a clone in which all branch names are copied, and there's a much stronger link between their GitHub repository, and our new GitHub fork, going both ways. They can see that we forked their GitHub repository, and our fork links to their GitHub repository. This link is invisible in the repository itself: GitHub stores this linkage information elsewhere.

Now that we have our fork, at ssh://git@github.com/us/repo.git for instance, the way we work with it is to clone our fork to our laptop:

git clone -b somebranch ssh://git@github.com/us/repo.git

This stores ssh://git@github.com/us/repo.git in our clone under the name origin.

For various purposes, we'll eventually want to remember ssh://git@github.com/user/repo.git in our laptop clone, under another name. By convention, this second name is upstream. I think this is a poor name, but don't have a better suggestion, and in some ways it's better to follow the herd here, so I'll use upstream too. We enter our clone:

cd repo

and use git remote add to add this second remote name:

git remote add upstream ssh://git@github.com/user/repo.git

We can now run:

git fetch upstream

to get our Git to call up ssh://git@github.com/user/repo.git—the URL stored under the name upstream—and obtain any commits they have that we don't, and then create-or-update all of our upstream/* remote-tracking names.

If we do these two steps fast enough, there won't be any new commits: we'll have gotten all the commits when we ran git clone ssh://git@github.com/us/repo.git, which has all the commits (and all the branches). So all this will really do is create upstream/*, all of which will match origin/*. Which might leave you wondering: why did we bother?

The answer is: over time, they will add new commits to their upstream (github.com/user/repo.git) repository, which won't appear in our fork (github.com/us/repo.git). We will need to transfer these to our laptop, with git fetch upstream, and then send them back to GitHub, with git push origin.¹

(That's all for later, unless repo.git is really active and it took so long for us to fork-and-clone that we have to do it now. But keep it in mind.)
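
The fetch-from-upstream, push-to-origin round trip can be sketched with local stand-ins. Here upstream.git plays their repository, fork.git plays our GitHub fork, laptop is our clone, and maintainer is a hypothetical clone used only to add a commit on their side. Git 2.28 or later is assumed, and the fast-forward merge is one of several ways to bring main up to date.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b main seed
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed upstream.git         # their repository
git clone -q --bare upstream.git fork.git     # our fork (the Fork button on GitHub)

git clone -q fork.git laptop
git -C laptop remote add upstream "$work/upstream.git"

# Later, they add a commit to their repository; our fork does not see it.
git clone -q upstream.git maintainer
git -C maintainer commit -q --allow-empty -m "upstream work"
git -C maintainer push -q origin main

# Bring the new commit to the laptop, then pass it on to the fork.
cd laptop
git fetch -q upstream
git merge -q --ff-only upstream/main
git push -q origin main
```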

Now that we have our clone over on GitHub—in github.com/us/repo.git—and our clone on our laptop, we proceed as usual: we make new commits, test them out, etc., making branches as we go, and eventually arrive at some new commits we'd like to put into a GitHub Pull Request. To do that, we:

- send our new commits to our fork, under some new branch name: this works just like the simpler case; then
- use GitHub's "pull request" clicky buttons, or the gh command line, to make a new pull request that goes from our GitHub fork, to their original on GitHub.

In short (if it's not too late), we made a GitHub fork just so that we would have a GitHub repository to which we can git push commits and set up a branch name. We made this fork of their GitHub repository because we're not allowed to set branch names in their GitHub repository. The commits are shared; the branch names are not.

¹This creates an obvious large inefficiency: wouldn't it be better to just have GitHub itself do this on its own? The answer to that is yes but. The but part has to do with when and how which branch names on our GitHub fork would get updated. If GitHub used remote-tracking names, this would be less of a problem, but they don't.

After they accept your PR

Once they accept your pull request, you'll probably want to update your laptop clone. One complication here occurs if you had to make a GitHub fork: now you need to git fetch upstream, rather than just git fetch origin, and then you will probably want to git push origin to update your GitHub fork. See the section above.

There's more though. If and when they do accept your PR, they can use one of three green clicky buttons:

MERGE just does an ordinary Git merge. All is good.
REBASE AND MERGE has GitHub copy all your commits to new, different commits. This is a pain in the butt because now your commits, that exist on your laptop and maybe in your GitHub fork, are to be obsoleted in favor of their new and supposedly-improved commits. It's your choice as to whether to go along with all this, but if you want to play nice with them, you will be forced to do so.

There's no easy and convenient way to update your laptop and your GitHub fork. Instead, you have to use less-convenient methods (which we won't cover here). For most simple cases, this is just a matter of discarding your branch name, then starting over with theirs, though.
SQUASH AND MERGE turns all of your commits into one big commit that they own, in their repository. This is similar to the rebase-and-merge button, in that you now have to discard your commits in favor of their new single commit. It's worse in that unless your commit was one commit to start with, it's much harder to automate. Again, though, it's usually just a matter of throwing away your branch name. In fact, if you ever use git merge --squash locally, that's the same procedure you'll need locally: squash merge means "discard the others", at a higher level than individual commits.²
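
The "throw away your branch name" step after a squash merge can be tried locally with git merge --squash, which the text above says follows the same procedure. A minimal sketch, assuming Git 2.28 or later; the repository demo, the branch feature, and the files a.txt and b.txt are made up:

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git commit -q --allow-empty -m "initial"

# Two commits on a feature branch.
git switch -q -c feature
echo a > a.txt
git add a.txt
git commit -q -m "add a"
echo b > b.txt
git add b.txt
git commit -q -m "add b"

# Squash-merge: one new commit on main carrying both changes.
git switch -q main
git merge -q --squash feature
git commit -q -m "Add a and b (squashed)"

# The feature commits are not in main's history, so a safe delete is refused.
if git branch -d feature 2>/dev/null; then safe_delete=ok; else safe_delete=refused; fi
echo "git branch -d: $safe_delete"

# The branch is discarded with -D instead.
git branch -D feature
```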

They might, of course, not accept your PR to start with, in which case you may have to do your own replacing of commits with new-and-improved commits. Whether you are doing this through your own GitHub fork, or through direct pushes to a GitHub repository where you have push privileges, you'll typically want to use git push --force or git push --force-with-lease³ to update your GitHub branch, after you replace, in your laptop, some commits with other new-and-improved commits.

This process is remarkably similar to the one you'll use if they used the rebase-and-merge or squash-and-merge sequence: both of them involve throwing out the old branch (and its commits) in favor of a new branch (and its commits). The difference between deleting the branch and then creating one of the same name, and force-pushing, is ... basically nonexistent.
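
The force-push described above, after replacing a pushed commit with a new-and-improved one, can be sketched like this. A bare repository remote.git stands in for the GitHub branch's home, and the branch feature and file change.txt are hypothetical; Git 2.28 or later is assumed.

```shell
set -eu
# Throwaway identity so the commits below work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b main seed
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed remote.git
git clone -q remote.git laptop
cd laptop

git switch -q -c feature
echo draft > change.txt
git add change.txt
git commit -q -m "First attempt"
git push -q -u origin feature

# Replace the pushed commit with a new-and-improved one.
echo improved > change.txt
git commit -q -a --amend -m "Improved attempt"

# An ordinary push is refused: the remote branch would lose "First attempt".
if git push -q origin feature 2>/dev/null; then plain=accepted; else plain=rejected; fi
echo "plain push was $plain"

# --force-with-lease overwrites only if origin/feature still matches the remote.
git push -q --force-with-lease origin feature
```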

²It is in theory possible to set up a squash-merge sequence that doesn't involve throwing out the squashed commits. In practice, it's too hard; don't do it.

³The --force-with-lease option is --force with a safety inspection first. See other StackOverflow questions and answers for more.
