Beginner Git problem regarding pull requests

Question

Every pull request I open also shows the commits from all of my previous pull requests. This boggles my mind, because I want each PR to show only its own commits. See the image.

In this image, "testbranch2", "testbranch3" and "update1" are all different branches. I don't want their commits to be visible in the "testbranch3" pull request...

This is how I make commits and push them:

git add .
git commit -m "update2"
git push --set-upstream origin "branchName"

# then I create a new branch before I start working
git checkout -b "branchName2"

What am I doing wrong?

If testbranch3 contains testbranch2 and the branch you're targeting with your pull request doesn't contain the commits from testbranch2, then you will see the commits from testbranch2. I'm not sure if that's what you're trying to ask about or not though, you need to explain the situation better. — mason, Sep 25 '22 at 21:10
Sorry if I can't articulate this in an appropriate manner. When I create a new branch, why are all the previous commits (which are not related to that branch and are part of the previous branch) visible as commits inside that branch? — krackgen, Sep 25 '22 at 21:14
Because your new branch is based on the old branch. Therefore it includes the commits of the old branch. That's just how it works. This normally isn't a problem - what specific problem do you think this poses for you? Remember to use @mason in your response to notify me of your reply - the owner of the post is the only person notified by default. — mason, Sep 25 '22 at 21:33
@mason Do I run "git reset --soft HEAD~1" so that commits from the previous branch won't be visible? Whoever checks my code has a problem seeing all of the previous commits from the previous branches. I just want every branch to have only the commits related to that branch and not those of the previous branches. — krackgen, Sep 26 '22 at 10:25

score 1 · Accepted Answer · answered Sep 26 '22 at 13:16

What am I doing wrong?

Believing in branches.

Seriously, the entire problem has to do with what you think Git does, vs what Git actually does. In Git, branch names don't really matter. That is, they matter to you, the human, but not to Git. What matters to Git is the commits. You need to view Git as being about commits—not branches, not files, but commits.

Okay, but: what does that mean, in practical terms? It's Monday morning (at least it is here right now) so what will you do differently that will make Git work for you? The answer starts with running git log --graph --decorate or git log --all --decorate --oneline --graph. See also Pretty Git branch graphs (for many, many more options for viewing the graph). This kind of graph, which shows not only the individual commits themselves, but also the relationships between commits, is how you will know what will show up in Pull Requests later.

Keep in mind that this merely helps you visualize what you already have. You will also have to do something else different, so that you build something else different, so that what you see changes. But if you can't see it, if you can't preview what you have before you send it off to GitHub to make a Pull Request, you're just fumbling around in the dark. Once you can see what you're doing, then it's time to change what you're doing.
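As a minimal sketch of that kind of preview, the commands below build a throwaway repository and draw its graph. The temporary directory, the "Demo" identity, the branch name feature, and the commit messages are all made up for illustration; only the git log options come from the text above.

```shell
# Throwaway repository; every name below is made up for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "first commit on main"
git checkout -q -b feature                 # new name, same commit
git commit -q --allow-empty -m "work on feature"

# One line per commit, with the branch names attached to the commits they select.
graph=$(git log --all --decorate --oneline --graph)
echo "$graph"
```

The line labelled feature sits on top of the line labelled main, which is a hint that anything based on feature will carry the main commit along with it.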

Unfortunately, branch is kind of a bad word in Git. Not bad in the sense of profanity, but in the sense that it has so many meanings that it doesn't mean anything. Branches do exist, but when someone says "branch", what do they mean? (See also What exactly do we mean by "branch"?) To get there properly, we have to start with the individual commits.

The commit

A Git commit:

- Is numbered. It has a unique number, very large and random-looking, such as dda7228a83e2e9ff584bf6adbf55910565b41e14. This big ugly number, which is too hard for humans to deal with (I used cut-and-paste to get this one in), is how Git actually finds a commit. Git literally needs the number in order to find the commit.
- Is read-only. This is required for the rather magical numbering system. The numbering system is magical in that every Git system in the universe agrees that that commit gets that number, even without ever talking to another piece of Git software. When you make a new commit, it gets a new, unique number, never used before, never to be used again. This is mathematically impossible and someday Git will break, but the sheer size of the number space puts the day off long enough that we can hope never to see it (though see also How does the newly found SHA-1 collision affect Git?); but to make it work until then, commits have to be read-only. So they are.
- Stores two things:
  - Directly, each commit stores some metadata, or information about the commit itself. This includes things like the name and email address of the author of the commit, and some date-and-time stamps.
  - Indirectly, a full snapshot of every file. This snapshot is in a special, compressed format, read-only like all parts of any commit; but importantly for Git itself, each file's content is de-duplicated. So when the full snapshot in the next commit is mostly a duplicate of the full snapshot in this commit, the next commit doesn't take much space because all the duplicated files take no space. Likewise, this commit didn't take much space because it mostly contains duplicates of files from earlier (or later) commits.
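If you want to look at these pieces yourself, git cat-file -p prints a commit's raw contents. This is only a sketch in a scratch repository; the directory, the "Demo" identity, and the message are invented for the demo.

```shell
# Scratch repository with a made-up identity, used only for this demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "explain why, not what"

# Print the raw commit: a "tree" line (the snapshot, stored indirectly),
# then the author and committer metadata, then the log message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# The commit's own number (hash ID).
commit_id=$(git rev-parse HEAD)
echo "$commit_id"
```

The tree line is itself a hash ID: the commit does not contain the files, it names the snapshot that does.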

You don't normally have to care very much about these details, as long as you remember that each commit holds metadata and a full snapshot. But there are a few parts of the metadata that you do need to care about. You provide the log message, for instance, and it should be something that you and others can read later and figure out why you made this commit. (You don't need to worry about changes in the commit, as Git can compare adjacent snapshots to find the changes. What you need to explain is the reason for the changes.)

Now, for Git's own operation, there's one other very important thing in each commit's metadata. Every commit is numbered, and each commit contains, in its metadata, the commit numbers—the hash IDs—of some set of earlier commits. Most commits contain just one earlier-commit hash ID. We call such a commit an ordinary commit, and the stored hash ID is the parent of the commit.

This gives ordinary commits a simple parent/child relationship.
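A small sketch of that relationship, in a scratch repository with two made-up commits named G and H after the letters used in the diagrams of this answer: the parent hash stored in the newer commit is exactly the hash ID of the older one.

```shell
# Scratch repository; identity and commit messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

# %P prints the parent hash ID(s) stored in a commit's metadata.
parent_in_metadata=$(git log -1 --format=%P HEAD)
# HEAD~1 asks Git to step back one parent from the current commit.
hash_of_G=$(git rev-parse HEAD~1)
echo "H's stored parent: $parent_in_metadata"
echo "G's hash ID:       $hash_of_G"

# The very first commit stores no parent at all.
root_parent=$(git log -1 --format=%P HEAD~1)
```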

Understanding how commits form branches

Suppose we've made a new commit, and it has some big ugly-looking hash ID that we'll just call H for Hash, for short. Let's draw it, floating in space, with an arrow sticking out of it:

<-H

The arrow sticking out of H represents the hash ID stored in H's metadata. This stored parent hash ID effectively points to H's parent commit. Let's call that commit G, and draw it in:

<-G <-H

Assuming G is also an ordinary commit, it has an arrow sticking out of it like this, representing the hash ID stored in G that points to G's parent F:

... <-F <-G <-H

This repeats all the way down the line to the very first commit ever. What's special about that first commit is that, being first, it doesn't point backwards. That lets Git stop going backwards.

There are two things to note about this:

First, Git needs the hash ID to find the commit, but commits have parent/child relationships where the children point backwards to the parents, so Git can find the hash IDs on its own.
Second, this works for every commit except the very last one. There's no commit "one past the last" to find the last commit.

To fix this problem, we could just memorize hash IDs, but that's an unpleasant prospect. Who wants to remember dda7228a83... or whatever? But, hey, wait a minute, we have a computer. Computers are good at this sort of memorization. Why don't we pick a name, like main or master, and use that name to store a hash ID?

And that's exactly what we do:

...--F--G--H   <-- main

We call that a branch. Or, we call the name main a branch. Or, we call something else a branch. We call a lot of things "a branch", to the point where someone says "a branch" or "the branch" and we have no idea what they actually mean without a lot more context or information.
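To see that a branch name really is just a stored hash ID, you can ask Git what the name resolves to. A hedged sketch in a scratch repository (directory, identity, and message are assumptions for the demo):

```shell
# Scratch repository; all names here are made up for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "H"

tip=$(git rev-parse HEAD)                  # hash ID of the commit we just made
stored=$(git rev-parse refs/heads/main)    # hash ID stored under the name "main"

# List every branch name with the one hash ID it stores.
listing=$(git for-each-ref --format='%(refname:short) %(objectname)' refs/heads)
echo "$listing"
```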

Branch names find commits

Now, suppose we have an existing chain of commits—what we might call "a branch"—ending at commit H like this:

...--G--H   <-- main

Let's create a new branch name, like develop. We must pick a commit. Which one should we pick? How about H, the latest?

...--G--H   <-- main, develop

Here's a trick question: Which branch are these commits on?

Let me insert a picture of some branches to help hide the answer, while you think about it.

[Image: a camel thorn tree (Kameldornbaum) at Sossusvlei]

If branches were real things, this would not be a trick question, but branches in Git aren't very real after all, and it is. The trick answer is that all the commits are on both branches. Before we made develop, all the commits were on only one branch. If we make a third branch name now, all the commits are on three branches.

A branch name, in Git, merely locates the last commit on the branch. In fact, the definition of the branch name is that the commit it points to is the last commit on the branch! So if we decide to make a name pointing to commit G:

...--G   <-- foobranch
      \
       H   <-- main, develop

then all of a sudden, every commit up to and including G is on all three branches and commits up through and including H are on two branches. Commit G is the last commit on branch foobranch.
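You can ask Git this "which branches is this commit on?" question directly with git branch --contains. The sketch below rebuilds the picture above in a scratch repository; the directory and identity are made up, and the commits are empty ones labelled G and H.

```shell
# Scratch repository; identity and messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "G"
git branch foobranch                # foobranch points to G
git commit -q --allow-empty -m "H"  # main moves on to H
git branch develop                  # develop points to H, like main

# Which branch names can reach each commit?
on_G=$(git branch --contains foobranch --format='%(refname:short)' | sort | xargs)
on_H=$(git branch --contains main --format='%(refname:short)' | sort | xargs)
echo "G is on: $on_G"
echo "H is on: $on_H"
```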

Remember that all of these commits are entirely read only. Nothing about the commits themselves changes as we create and destroy branch names. The name simply serves to point to one particular commit. By pointing to that commit, it lets Git (and us) find that particular commit quickly—and Git can then work backwards, but not forwards, from that point.

That's it: that's the main purpose of a branch name. It finds one commit, and the one commit it finds is defined as the tip commit of the branch. But to make this work well, there's one other special feature.

Branch names move

Let's draw our repository like this again:

...--G--H   <-- main

and add a new branch name:

...--G--H   <-- main, develop

and then make a new commit. The new commit is going to get some random-looking hash ID; we'll just call it I. New commit I is going to point backwards to existing commit H, for a reason I'll explain in just a moment, and we're going to have:

...--G--H
         \
          I

One of our two branch names should update to point to I. But which one?

This is what being on a branch is about. If git status says on branch main, we'll get:

...--G--H   <-- develop
         \
          I   <-- main

That's probably not what we want. We probably want to be "on" branch develop so that we get:

...--G--H   <-- main
         \
          I   <-- develop

To help show which branch name we're "on"—as in, git status will say that branch's name—let's attach the special name HEAD to exactly one branch name, like this:

...--G--H   <-- main (HEAD), develop

Whoops! We're "on" the wrong branch. We run:

git switch develop      # or git checkout develop

and we get:

...--G--H   <-- main, develop (HEAD)

Nothing else changes except the attachment of HEAD because both names select commit H right now. But then we make our changes and git add and git commit, and we get:

...--G--H   <-- main
         \
          I   <-- develop (HEAD)

which is what we want. Note how HEAD is still "attached to" the name develop.
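Here is the same sequence as a sketch you can run in a scratch repository (the directory and identity are assumptions; H and I are empty commits standing in for real work). Only the name that HEAD is attached to moves when the new commit is made.

```shell
# Scratch repository; identity and messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "H"
git branch develop                  # main and develop both select H
git checkout -q develop             # or: git switch develop
main_before=$(git rev-parse main)

git commit -q --allow-empty -m "I"  # the new commit points back to H

current=$(git symbolic-ref --short HEAD)   # the name HEAD is attached to
main_after=$(git rev-parse main)
parent_of_I=$(git rev-parse develop~1)
echo "on branch: $current"
```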

If we now git switch to main, Git will do more this time. In particular Git will remove the commit-I versions of files, and switch back to the commit-H versions of files.

Git can do this safely because the committed files are stored (indirectly) in the commit, safely read-only and saved for as long as the commit exists (typically "forever" although there are ways to "lose" commits). If Git can't do it safely—which happens sometimes if you start work and then realize you were on the wrong branch—the git switch command will tell you that, and refuse to switch branches. (We won't worry about how to fix up that problem here, but it is possible to fix up.)

(We also haven't covered the complexities of how Git actually makes the snapshots here. It's surprising, at least to those who are used to other version control systems.)

Back to your original problem

git add .
git commit -m "update2"
git push --set-upstream origin "branchName"

This part is OK, although "update2" is a terrible commit message as it says nothing about why you changed whatever it is you changed. We'll skip over all the complexities of git push as well for now, for space reasons.

We should, however, draw a picture of the commit(s) you created:

...--G--H   <-- main
         \
          I   <-- branchName (HEAD)

You are currently "on" commit I via branch name branchName, to which HEAD is attached.

# then I create a new branch before I start working
git checkout -b "branchName2"

Aside: consider moving from git checkout to git switch. The two commands do exactly the same things for simple cases (except that you need -c instead of -b here). For complicated cases, git switch is safer because it doesn't bundle all of the git restore code into the same command.¹

The -b option tells git checkout (or -c tells git switch) to create a new branch name at this time. The new branch name will point to the current commit, which is your new commit I. Last, the command switches to the new branch name, so the result looks like this:

...--G--H   <-- main
         \
          I   <-- branchName, branchName2 (HEAD)

If you now make another new commit, the result looks like this:

...--G--H   <-- main
         \
          I   <-- branchName
           \
            J   <-- branchName2 (HEAD)

This is not what you want. You want a result that looks more like this:

          J   <-- branchName2 (HEAD)
         /
...--G--H   <-- main
         \
          I   <-- branchName

That is, you will want commit J to be independent of commit I, so that it does not include any of the work you did in I and so that it connects back to commit H as its parent, rather than to commit I as its parent.

That's because later, when you raise the Pull Request, GitHub (not Git itself, but GitHub) will use commit H as the "base" commit. GitHub (not Git) has the notion of a "base branch" for Pull Requests. Note that Pull Requests are GitHub-specific items: they're not part of Git. They use Git's commits, and they use branch names, but they assign more meaning to those branch names than Git ever did.
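You can preview locally which commits a Pull Request against main is likely to list, by asking for the commits reachable from your branch but not from main. This is a sketch under the assumption that main is the base branch; the scratch directory and identity are made up, and the range main..branchName2 is only an approximation of what GitHub displays.

```shell
# Scratch repository reproducing the stacked layout; names are for the demo only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b branchName
git commit -q --allow-empty -m "I"
git checkout -q -b branchName2      # created from I, not from H
git commit -q --allow-empty -m "J"

# Commits on branchName2 that main does not have: both J and I show up.
pr_commits=$(git log --format=%s main..branchName2)
echo "$pr_commits"
```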

Because that's what you're going to want, your command sequence should look more like:

git switch main
git switch -c branchName2

(or git checkout main and then git checkout -b branchName2). The main here might be master or develop or some other name, depending on what you and your team have decided to use, but the general idea is that you'll first switch back to the commit that you want to use as the parent of the new commit you plan to make in a moment. To do that, you'll use any branch name that finds that commit.
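A sketch of the corrected sequence in a scratch repository (directory and identity are assumptions, main is assumed to be the base, and git switch needs Git 2.23 or later): after going back to main first, each branch carries only its own commit.

```shell
# Scratch repository; identity and messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "H"
git switch -q -c branchName
git commit -q --allow-empty -m "I"

git switch -q main                  # go back to the intended parent first
git switch -q -c branchName2        # the new name now points to H, not I
git commit -q --allow-empty -m "J"

# Each branch now has exactly one commit that main lacks.
pr1=$(git log --format=%s main..branchName)
pr2=$(git log --format=%s main..branchName2)
echo "branchName:  $pr1"
echo "branchName2: $pr2"
```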

Note that you can "plan in advance", if you prefer. Suppose you have not yet made branchName at all, and just have:

...--G--H   <-- main (HEAD)

You can now run:

git branch branchName
git branch branchName2

and get:

...--G--H   <-- main (HEAD), branchName, branchName2

That is, at this point all three names exist, and all of them select commit H, your starting point commit. You can now git switch to any name and start making new commits.
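As a sketch of this plan-in-advance variant (scratch directory and identity are made up; I and J are empty commits standing in for real work), both names are created at H before any work starts, so the two new commits end up as siblings:

```shell
# Scratch repository; identity and messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "H"

git branch branchName               # create both names now, at H,
git branch branchName2              # without switching to either

git checkout -q branchName          # or: git switch branchName
git commit -q --allow-empty -m "I"
git checkout -q branchName2         # or: git switch branchName2
git commit -q --allow-empty -m "J"

base=$(git rev-parse main)
parent_of_I=$(git rev-parse branchName~1)
parent_of_J=$(git rev-parse branchName2~1)
```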

(I usually create the branch name while switching to it myself, though: the old git checkout -b or the newfangled git switch -c.)

¹The git checkout command, which is much older, does both switch and restore. The git switch command is "safe" in that it won't wreck uncommitted work; the git restore command is "unsafe" in that, if you give it arguments that mean "wreck my uncommitted work", it assumes you mean exactly that. Newbies to Git don't know which arguments to git checkout run it in "safe" (switch) mode and which ones run it in "unsafe" (restore) mode. So if you use the two separate commands, you only need to take extra care when running git restore, not on every branch switch.

I just asked a simple question and this man made me a professional at using Git. Thanks a lot! — krackgen, Sep 27 '22 at 19:46