Branches don't branch off branches. Well, unless they do. The problem here is the word branch, which is too wobbly (ill-defined) to let us even think about how this all works. See What exactly do we mean by "branch"?
What's going on here is this:
Git is all about commits.
Each commit has a unique number. This number looks random, but isn't. To guarantee its uniqueness, it's a really big number, expressed in hexadecimal, such as e1cfff676549cdcd702cbac105468723ef2722f4.
Each commit records both a snapshot of every file¹ and some metadata, such as the name and email address of whoever made the commit. In the metadata, each commit stores the number (or numbers) of the immediate previous commit, which Git calls the parent (or parents) of that commit.
This means that commits, by themselves, form backwards-looking chains. If we use uppercase letters to stand in for commit hash IDs, we can draw a simple chain like this:
... <-F <-G <-H
Here H is the hash ID of the last commit in the chain. Inside commit H, we have a full snapshot of all files, plus the metadata saying who made the commit, when, and why, plus the raw hash ID of earlier commit G.
Git can look up any commit (or any internal object, for that matter) by its hash ID, so as long as we know H's hash ID, we can get H. Using H, Git can find G's hash ID, so Git can get G. Using G, Git can find F's hash ID, and so on.
All we need, then, is the hash ID of the last commit in the chain ... and that's just what a branch name does for us, and for Git: it holds the hash ID of the last commit.
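Here is a minimal sketch of that backwards walk, using a toy model rather than Git's real storage. The single-letter IDs, the dictionary layout, and the branch name main are all made up for illustration, and F is treated as the first commit only to keep the example short.

```python
# Toy model: each commit records its parent IDs; a branch name records one tip ID.
# The letters stand in for real hash IDs, as in the drawings above.
commits = {
    "F": {"parents": []},      # pretend F is the root, for this sketch only
    "G": {"parents": ["F"]},
    "H": {"parents": ["G"]},
}
branches = {"main": "H"}       # hypothetical branch name holding the last commit's ID


def history(tip):
    """Follow first-parent links backwards from tip until there is no parent."""
    found = []
    current = tip
    while current is not None:
        found.append(current)
        parents = commits[current]["parents"]
        current = parents[0] if parents else None
    return found


chain = history(branches["main"])
print(chain)
```

The only thing the walk needs to get started is the one ID stored under the name; everything else comes out of the commits themselves.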
By definition, the hash ID in some name is the last commit in the chain, even if the chain keeps going on:
...--F--G <-- branch1
         \
          H <-- branch2
           \
            I--J <-- branch3
Commits up through G are on all three branches. Commit H is on two branches, and commits I-J are only on one branch, namely branch3. H is the last commit on branch2.
It looks like we made branch3 by branching off branch2 (and then making two commits). But we can delete any branch name at any time.² If we delete the name branch2 now, we get:
...--F--G <-- branch1
         \
          H--I--J <-- branch3
and now it looks like we made branch3 by branching off branch1.
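To make "which branches is this commit on?" concrete, here is a small sketch on the same toy graph. It is an illustration only: the parents and names dictionaries are invented stand-ins for Git's object database and its branch names, and F is again treated as a root.

```python
# Parent links for the drawing above (letters stand in for hash IDs).
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"]}
names = {"branch1": "G", "branch2": "H", "branch3": "J"}


def reachable(tip):
    """Every commit found by starting at tip and following parent links."""
    stack, seen = [tip], set()
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def branches_containing(commit):
    """A commit is 'on' a branch if it is reachable from that branch's tip."""
    return sorted(n for n, tip in names.items() if commit in reachable(tip))


snapshot = dict(parents)
before = branches_containing("H")
del names["branch2"]               # deleting a name touches no commit
after = branches_containing("H")
print(before, after, parents == snapshot)
```

Commit H goes from being on two branches to being on one, yet the commit graph itself is identical before and after.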
None of the commits changed in this process. No commit can ever change, because those random-looking hash IDs are actually cryptographic checksums of the contents of each commit.³ But we can add and delete names any time we like. The only constraint here is that each branch name must identify one specific commit, by its hash ID; that one commit, which must actually exist, is automatically the last commit in that branch.
¹ More precisely, each commit has a tree object that records the files that were, at the time you (or whoever) made the commit, in that Git's index or staging area, in the form they had there at that time. These files are frozen forever, but are compressed and de-duplicated, so that whenever multiple commits share some particular file, there's really only one copy of that file in the repository, shared across all those commits.
² Note that if you delete the only name by which you and Git can find some commit, you may be in trouble later if you ever want that commit. So in general we don't delete a name until we're sure that its commits are findable some other way, or unwanted.
³ This is true for all of Git's internal objects. Git also checks that the hash ID key, which it used to retrieve the object from its key-value database holding all the Git objects, matches the checksum of the retrieved data. This provides a consistency check on the data: if something has gone wrong with the computer, and the data are corrupt, Git will notice.
This cryptographic checksum is also how every Git manages to agree that any particular commit gets its unique hash ID, and it means two Gits can exchange objects just by comparing hash IDs. Because hash IDs lead to more hash IDs, this allows everything to be known (though not checked immediately) just by knowing the last thing. See Merkle Trees.
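As a rough illustration of "the hash ID is a checksum of the content", the sketch below computes an object ID the way a SHA-1 repository does: hash the object type, the body length, a NUL byte, and then the body. The helper name git_object_id and the two commit bodies are made up for this example (the bodies are not in real commit format), and repositories created with SHA-256 object names would use a different hash function.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <length>' + NUL + body, as in a SHA-1 Git repository."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty blob has a well-known ID, which makes a handy sanity check.
empty_blob = git_object_id("blob", b"")
print(empty_blob)

# Two made-up bodies that differ only in the parent they name.
body_a = b"illustrative commit text, parent G\n"
body_b = b"illustrative commit text, parent H\n"
id_a = git_object_id("commit", body_a)
id_b = git_object_id("commit", body_b)
print(id_a != id_b)
```

Because the parent's ID is part of the hashed content, changing anything earlier in a chain forces a new ID for every later commit, which is why an existing commit can never be edited in place.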
Consequences
All of the above has some really important consequences:
Adding a new commit to the current branch just makes the branch name move. That is, we start with:
...--G--H <-- branch (HEAD)
The special name HEAD gets attached to one particular branch; that's the name we're using. That's how Git knows both the name, in this case branch, and the commit: HEAD gets the name and the name gets the commit. Then when we make our new commit I, Git makes the new commit point back to the commit that was the tip just a moment ago, and updates the name to which HEAD is attached so that it now holds the new commit's hash ID:
...--G--H--I <-- branch (HEAD)
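The same step as a toy sketch: HEAD stores a branch name, the branch name stores a commit ID, and committing rewrites only the branch name. The variable and function names here are invented for the illustration and are not how Git stores any of this on disk.

```python
parents = {"G": [], "H": ["G"]}    # G treated as a root only to keep the sketch short
branch_tips = {"branch": "H"}
head = "branch"                    # HEAD holds a branch name, not a commit ID


def commit(new_id):
    """Record new_id with the current tip as its parent, then advance the attached name."""
    name = head
    parents[new_id] = [branch_tips[name]]   # new commit points back at the old tip
    branch_tips[name] = new_id              # the name now identifies the new commit


commit("I")
print(head, branch_tips[head], parents["I"])
```

Note that head itself never changes here: it still says branch, and it is branch that now says I.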
We can rebase a branch by copying all of its commits to new and improved commits.
That is, we make use of the fact that we find commits by starting from a name that identifies the last commit and working backwards. Suppose we have:
...--F--G--H <-- main or master or whatever
         \
          I--J--K <-- feature (HEAD)
There is nothing wrong with the three commits that are only on feature, but we want them to extend the mainline (master or whatever) branch. So we run:
git rebase master
This works by copying existing commits I-J-K to new-and-improved commits. The new commits have totally different hash IDs, and probably different snapshots, but they do the same things that I-J-K did, and we now want to use them in place of I-J-K. Let's draw the new commits:
             I'-J'-K' <-- some-temporary-name
            /
...--F--G--H <-- master
         \
          I--J--K <-- feature
If we could just get Git to rip the name feature off commit K, and make it point to K' instead, then (because nobody ever looks at the raw hash IDs) everyone will suddenly think that the commits somehow changed. Moving the name like this is the last thing git rebase does. The commits didn't change, though: the originals are still in there. And not everyone sees the new commits: only our own Git repository has the name moved. If someone else (some other Git repository) has a name that remembers the old hash IDs, they'll keep remembering the old commits.
So that's why it's tricky to rebase commits that your Git has given to some other Git. All Gits work by commit hash IDs, and only use the names to find the last one. Each Git has its own set of names, so you now have to get all the other Git repositories to change their names around too.
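A toy sketch of why this is awkward. It assumes commit IDs are checksums over the parent ID plus a description of the change (a big simplification of what Git actually hashes), that feature forks off at G, and that a second repository, called theirs here, already received the original I-J-K. All names are hypothetical.

```python
import hashlib

store = {}  # commit ID -> (parent ID or None, description of the change)


def make_commit(parent, change):
    """The ID depends on the parent, so the same change on a new base gets a new ID."""
    cid = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:8]
    store[cid] = (parent, change)
    return cid


def ancestors(tip):
    """Commits from tip back to the root, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = store[tip][0]
    return out


g = make_commit(None, "g")
h = make_commit(g, "h")
i = make_commit(g, "i")
j = make_commit(i, "j")
k = make_commit(j, "k")

ours = {"master": h, "feature": k}
theirs = dict(ours)                # another repository with its own copy of the names


def rebase(names, branch, onto):
    """Copy the commits that are only on branch onto the tip of onto, then move the name."""
    upstream = set(ancestors(names[onto]))
    to_copy = [c for c in ancestors(names[branch]) if c not in upstream]
    new_tip = names[onto]
    for old in reversed(to_copy):              # oldest first
        new_tip = make_commit(new_tip, store[old][1])
    names[branch] = new_tip


rebase(ours, "feature", "master")
print(ours["feature"] != k, k in store, theirs["feature"] == k)
```

Our name feature now finds the copies, the originals are still in the store, and the other repository's feature is exactly where it was.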
Merging works by commit hash IDs. We might have this:
          I--J <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L <-- branch2
to start out with, and run git merge branch2 to make a new merge commit, but in fact, we're really starting out with commit J as our current commit (the tip of branch1) and telling Git to merge commit L. The eventual merge commit looks like this:
          I--J
         /    \
...--G--H      M <-- branch1 (HEAD)
         \    /
          K--L <-- branch2
Note how the name branch1 moved, but all we really did was add a new commit, M, that has two parents instead of just the usual one. Commit J, which is the one we were using a moment ago, is the first parent of new merge M; commit L, which is the one we merged, is the second.
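In the toy model, the graph side of a merge is tiny. This sketch records only the parent links and the name movement; it deliberately ignores how Git combines the file contents, and it ignores fast-forward merges. The function and variable names are invented for the example.

```python
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
names = {"branch1": "J", "branch2": "L"}
head = "branch1"


def merge(other_name, new_id):
    """Create new_id with two parents: the current commit first, the merged commit second."""
    first = names[head]            # the commit we are on (J)
    second = names[other_name]     # the name is only used to look up a commit (L)
    parents[new_id] = [first, second]
    names[head] = new_id           # only the current branch name moves


merge("branch2", "M")
print(parents["M"], names)
```

The name branch2 is consulted once, to find L, and is otherwise left alone.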
Actually using all of this in practice
Let's suppose we have a simple repository with just one master branch name:
...--G--H <-- master (HEAD)
(It's the only branch name, so HEAD must be attached to it. We don't normally have to care because this repository is someone else's, probably on a server like GitHub, where it's a so-called bare repository and its HEAD is pretty much irrelevant.)
We make a clone of this simple repository, and in the clone, we also make a master name, also pointing to H. This clone has a remote-tracking name copied from the original (GitHub) repository but modified, to read origin/master, so in our clone we have:
...--G--H <-- master (HEAD), origin/master
Now we make a new branch name:
...--G--H <-- master (HEAD), origin/master, br
and attach HEAD to it:
...--G--H <-- master, origin/master, br (HEAD)
We haven't changed commits (we're still using commit H), but now new commits will update the name br, rather than our master. Now we make a few new commits:
...--G--H <-- master, origin/master
         \
          I--J <-- br (HEAD)
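The clone-then-branch sequence above, as a toy sketch. The clone function here is a made-up stand-in that only models names: it copies each of the other repository's branch names into an origin/ name and creates one local master. Real cloning also copies the commits themselves, which this sketch takes for granted.

```python
parents = {"G": [], "H": ["G"]}    # G treated as a root for the sketch
github = {"master": "H"}           # the other repository's branch names


def clone(remote_branches, remote="origin"):
    """Turn each of their branch names into a remote-tracking name, plus a local master."""
    names = {f"{remote}/{b}": tip for b, tip in remote_branches.items()}
    names["master"] = remote_branches["master"]
    return names, "master"         # the second value plays the role of HEAD


names, head = clone(github)

names["br"] = names[head]          # new branch name, same commit H
head = "br"                        # attach HEAD to it

for new_id in ("I", "J"):          # two new commits move only br
    parents[new_id] = [names[head]]
    names[head] = new_id

print(names, github)
```

After the two commits, master and origin/master in the clone still identify H, and the other repository has not changed at all.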
Our feature works, so we use git push to send the new commits to GitHub and raise a Pull Request, which asks someone else (someone who controls the GitHub repository) to combine our commits with their work.
Note that they now have:
...--G--H <-- master (HEAD)
         \
          I--J <-- (some pull request number)
If we're all sharing the main GitHub repository, their GitHub repository will also probably have a br branch name, but if we're using GitHub's fork system, they won't have a br branch name at all: there will be two GitHub Git repositories, one being your fork and one being their main repository, and your fork will have a br branch, but their main repository won't. This can get fairly confusing as we now have three or more repositories involved, each of which has its own branch names!
There are a bunch of problems that come up now, because they—whoever "they" are—are the ones in control at this point. All you have done, and all you can do, is send your commits to your own GitHub repository—which might be shared or might be one of these more complex fork things—and ask them to look at your Pull Request. The Pull Request is a GitHub thing, not a Git thing: the Git thing is the commits, which form into chains ended by some name. Is a pull request, which is the end of some chain, a branch? It's not a branch name, but it works like a branch. Should we call it "a branch"? That goes back to What exactly do we mean by "branch"?
Anyway, having made the PR, whoever is in control of the PR can now do any of these things:
Reject your PR entirely. This isn't really all that interesting here since we're looking at merges rather than rewrites, but it's something to consider.
Use the web interface to click a button that says merge.
Use the web interface to click a button that says rebase and merge.
Use the web interface to click a button that says squash and merge.
These last three options all do different things. These ripple back into what you can do next.
If they merge
If they use the merge button, things are easiest for you, because this literally keeps your actual commits—with their hash IDs—and just incorporates those into their repository, using a merge commit. They just add a new merge commit, to get:
...--G--H------M <-- master (HEAD)
         \    /
          I--J <-- (some pull request number)
in their repository. You can now have your Git fetch this new commit M from them, into your local repository. If you're using a GitHub fork, after getting M locally, on your laptop, you can send M back to your own GitHub fork. You now have:
          __-- <-- master
         .
...--G--H------M <-- origin/master
         \    /
          I--J <-- br (HEAD)
in your repository. Note that your master has not moved and still points to commit H; to solve this annoyance, you can just delete your name master entirely, if you like. You can go move your own master to point to M, like theirs, but that's kind of a pain; if it doesn't bother you, you can just use your origin/master name to keep track of their master. (I tend to move my master around myself but I sometimes wonder why I bother.)
If they rebase-and-merge
When they use the rebase and merge button instead of the merge button, what they get, in their repository, is a set of copies of your commits. Their master then moves forward to point to the last copied commit, like this:
...--G--H--I'-J' <-- master (HEAD)
         \
          I--J <-- (some pull request number)
When you grab their new commits, your Git now has:
          __-- <-- master
         .
...--G--H--I'-J' <-- origin/master
         \
          I--J <-- br (HEAD)
As before, your own master is just in the way here, cluttering things up and making it harder to draw the graph. Your I-J are now redundant and perhaps even in the way. The fact that they made copies of your commits can become a headache for you. Your Git doesn't know that their copies are the new-and-improved ones, and has no idea that you should make your name br refer to commit J' instead of commit J.⁴
⁴ If they broke something, maybe it shouldn't. Maybe you should keep your J and figure out how to fix theirs. But that's not something your Git can figure out on its own.
If they squash-and-merge
If they use GitHub's squash and merge button, what they get, in their repository, is a single commit that holds a snapshot that matches the snapshot of your final commit. We can draw that like this:
...--G--H--IJ <-- master (HEAD)
         \
          I--J <-- (some pull request number)
Note that their commit IJ has a completely different hash ID, unique to it, just like the rebase case. But unlike the rebase case there's no one-to-one mapping from their squashed commit back to each of your individual commits.⁵ Still, the way you must deal with it is the same as the way you must deal with the rebase-and-merge case.
⁵ If your "chain" in your PR consisted of a single commit, the "squashed" chain also consists of a single commit, and therefore there is a one-to-one correspondence. So here, rebase-and-merge and squash-and-merge collapse into a single case.
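One way to see the practical difference between the three buttons is to ask, for each outcome, whether your original tip commit J is reachable from their new master. The sketch below does that on the toy graphs drawn above; the three function names are invented labels for the buttons, not anything GitHub exposes. In a real repository, git merge-base --is-ancestor asks the same kind of question about two commits.

```python
BASE = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]}   # G treated as a root


def is_ancestor(parents, commit, tip):
    """True if commit can be reached from tip by following parent links."""
    stack, seen = [tip], set()
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge_button():
    graph = dict(BASE)
    graph["M"] = ["H", "J"]        # merge commit keeps your J as its second parent
    return graph, "M"


def rebase_and_merge_button():
    graph = dict(BASE)
    graph["I'"] = ["H"]            # copies of your commits, with new IDs
    graph["J'"] = ["I'"]
    return graph, "J'"


def squash_and_merge_button():
    graph = dict(BASE)
    graph["IJ"] = ["H"]            # one new commit standing in for both of yours
    return graph, "IJ"


results = {}
for button in (merge_button, rebase_and_merge_button, squash_and_merge_button):
    graph, master = button()
    results[button.__name__] = is_ancestor(graph, "J", master)
print(results)
```

Only the plain merge leaves your own commits in their history; in the other two cases your br still points at commits that their master knows nothing about, which is why those cases need more care afterwards.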
The bottom line
Ultimately, when you boil it all down a little too far, what you end up with is "it's complicated". You have to keep going a bit further into the details before you can decide how to deal with this.
I also mentioned in a comment that each hosting service (GitHub, Bitbucket, GitLab, etc.) has its own peculiarities. The text above describes GitHub's three options. Others will have other options. Each makes use of Git's basic abilities, but in different ways. You really do have to learn how the Git commit graph works, and how merges work, and get into all these kinds of nitty details. It can't be simplified further without losing something.