How to stop senseless merge conflicts in Git when branching off of an unmerged branch?

Question

I branch off of master to create branch B. I do my work on branch B and then create a PR targeting master. Next I create a new branch that depends on my changes in branch B, so I branch off of branch B and get branch C, which also includes everything in branch B.

The problem arises when I merge branch B into master: when I then create a PR for branch C targeting master, there always seem to be merge conflicts. Why is this, and is there some way around it?

Note that all commits are in sequential order, meaning all commits on branch B were committed before all commits in branch C.

Merge conflicts can happen for many reasons, one of which is that there exists even one source file in which you modified, on branch C, a single line that was also modified on the master branch. — Tim Biegeleisen, Oct 05 '20 at 17:24
I would suggest that branching off a branch is bad practice and increases the likelihood of merge conflicts. Don't do it. — matt, Oct 05 '20 at 17:25
Branch C off `master`, since that's where you are intending to PR-merge it later. — matt, Oct 05 '20 at 18:16
@matt and pull branch B into branch C? How does this differ from branching directly off of branch B? — Derek M, Oct 05 '20 at 18:26
Note: I added a [tag:github] tag as Pull Requests are a feature of various servers like GitHub and Bitbucket. If I picked the wrong one, you should probably fix it; but while each server has its own peculiarities, they all share some common behavior with their pull request mechanisms. — torek, Oct 06 '20 at 04:44
@torek, I disagree.... pull requests are not a feature of services like GitHub. A pull request is simply a request from one developer to another to pull and possibly merge his branch. It can be done by email, by phone or yelling across the room. Services like GitHub may make the process easier but they did not invent pull requests and are not necessary to do them. The inventor of git hates GitHub. He certainly does not hate pull requests. — JoelFan, Oct 06 '20 at 04:47
@JoelFan: that's true, *but*, when you're using GitHub and its PR features, you end up with having to deal with a lot of fallout. So it's important to know which service and what features people are using with it. — torek, Oct 06 '20 at 04:48

score 0 · Answer 1 · answered Oct 05 '20 at 18:30

Funnily enough, this is a question I had today too. I thought of checking out branch C and then merging master into it, so that any 'low level' changes which might affect both branches can easily be added to both branches if they need to be.

I don't want to suggest that this is good practice. In the end, I realised that (in my scenario) branching off my B branch was going to get pretty messy, so I switched to a model where both branches branched off master.

score 0 · Accepted Answer · answered Oct 06 '20 at 04:47

Branches don't branch off branches. Well, unless they do. The problem here is the word branch, which is too wobbly (ill-defined) to let us even think about how this all works. See What exactly do we mean by "branch"?

What's going on here is this:

Git is all about commits.
Each commit has a unique number. This number looks random, but isn't. To guarantee its uniqueness, it's a really big number, expressed in hexadecimal, such as e1cfff676549cdcd702cbac105468723ef2722f4.
Each commit records both a snapshot of every file¹ and some metadata, such as the name and email address of whoever made the commit. In the metadata, each commit stores the number (or numbers) of the immediate previous commit, which Git calls the parent (or parents) of that commit.

This means that commits, by themselves, form backwards-looking chains. If we use uppercase letters to stand in for commit hash IDs, we can draw a simple chain like this:

... <-F <-G <-H

Here H is the hash ID of the last commit in the chain. Inside commit H, we have a full snapshot of all files, plus the metadata saying who made the commit, when, and why, plus the raw hash ID of earlier commit G.

Git can look up any commit (or any internal object, for that matter) by its hash ID, so this means that as long as we know H's hash ID, we can get H. Using H, Git can find G's hash ID, so Git can get G. Using G, Git can find F's hash ID, and so on.

All we need, then, is the hash ID of the last commit in the chain ... and that's just what a branch name does for us, and for Git: it holds the hash ID of the last commit.
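
The idea that a name only has to remember the last commit can be sketched with a toy Python model. It is not Git's real storage format. The `Commit` class, the `objects` and `branches` dicts, and the `log` helper are all hypothetical, and letters stand in for hash IDs as in the drawings.

```python
from typing import NamedTuple

class Commit(NamedTuple):
    message: str
    parents: tuple  # IDs of the immediate previous commit(s)

# Toy object database: ID -> commit. Letters stand in for real hash IDs.
objects = {
    "F": Commit("add F", ()),
    "G": Commit("add G", ("F",)),
    "H": Commit("add H", ("G",)),
}

# A branch name holds nothing but the ID of the last commit in its chain.
branches = {"master": "H"}

def log(name):
    """Start at the commit the name points to and follow parent links backwards."""
    result = []
    commit_id = branches[name]
    while commit_id is not None:
        result.append(commit_id)
        parents = objects[commit_id].parents
        commit_id = parents[0] if parents else None  # first parent only
    return result

print(log("master"))  # ['H', 'G', 'F']
```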

By definition, the hash ID in some name is the last commit in the chain, even if the chain keeps going on:

...--F--G   <-- branch1
         \
          H   <-- branch2
           \
            I--J   <-- branch3

Commits up through G are on all three branches. Commit H is on two branches, and commits I-J are only on one branch, namely branch3. H is the last commit on branch2.

It looks like we made branch3 by branching off branch2 (and then making two commits). But we can delete any branch name at any time.² If we delete the name branch2 now, we get:

...--F--G   <-- branch1
         \
          H--I--J   <-- branch3

and now it looks like we made branch3 by branching off branch1.
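
Which branches a commit is "on" depends only on which names can reach it, much as `git branch --contains` reports. A toy sketch follows. The graph mirrors the drawing, F is treated as a root for simplicity, and the helper names are hypothetical. It shows that deleting a name changes the answer without touching any commit.

```python
# Toy graph from the drawing: commit ID -> parent IDs (F treated as a root).
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"]}
branches = {"branch1": "G", "branch2": "H", "branch3": "J"}

def reachable(start):
    """Every commit reachable from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def branches_containing(commit):
    """Names whose tip commit can reach the given commit."""
    return sorted(name for name, tip in branches.items() if commit in reachable(tip))

print(branches_containing("G"))  # ['branch1', 'branch2', 'branch3']
print(branches_containing("H"))  # ['branch2', 'branch3']
del branches["branch2"]          # delete the name only
print(branches_containing("H"))  # ['branch3']
print(sorted(parents))           # ['F', 'G', 'H', 'I', 'J']: no commit was removed
```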

None of the commits changed, in this process. No commit can ever change, because those random-looking hash IDs are actually cryptographic checksums of the contents of each commit.³ But we can add and delete names any time we like. The only constraint here is that each branch name must identify one specific commit, by its hash ID; that one commit, which must actually exist, is automatically the last commit in that branch.

¹More precisely, each commit has a tree object that records the files that were, at the time you (or whoever) made the commit, in that Git's index or staging area, in the form they had there at that time. These files are frozen forever, but are compressed and de-duplicated, so that whenever multiple commits share some particular file, there's really only one copy of that file in the repository, shared across all those commits.

²Note that if you delete the only name by which you and Git can find some commit, you may be in trouble later if you ever want that commit. So in general we don't delete a name until we're sure that its commits are findable some other way, or unwanted.

³This is true for all of Git's internal objects. Git also checks that the hash ID key, which it used to retrieve the object from its key-value database holding all the Git objects, matches the checksum of the retrieved data. This provides a consistency check on the data: if something has gone wrong with the computer, and the data are corrupt, Git will notice.

This cryptographic checksum is also how every Git manages to agree that any particular commit gets its unique hash ID, and it means two Gits can exchange objects just by comparing hash IDs. Because hash IDs lead to more hash IDs, this allows everything to be known (though not checked immediately) just by knowing the last thing. See Merkle Trees.
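
The Merkle-chain property can be sketched in Python with `hashlib.sha1`. Real Git hashes a specific object format (a header plus tree, parent, author and committer lines). This toy `commit_id` just hashes the snapshot text together with the parent's ID, which is enough to show the effect: changing one commit changes the ID of every commit that descends from it.

```python
import hashlib
from typing import Optional

def commit_id(snapshot: str, parent_id: Optional[str]) -> str:
    """Toy content address: SHA-1 over the parent's ID plus the snapshot.
    (Not Git's real object format; it only mimics the idea.)"""
    data = f"parent {parent_id}\n{snapshot}".encode()
    return hashlib.sha1(data).hexdigest()

def build_chain(snapshots):
    """Return the IDs of a linear chain, oldest first."""
    ids, parent = [], None
    for snap in snapshots:
        parent = commit_id(snap, parent)
        ids.append(parent)
    return ids

original = build_chain(["F files", "G files", "H files"])
edited = build_chain(["F files", "G files (changed)", "H files"])

print(original[0] == edited[0])  # True: F is untouched
print(original[2] == edited[2])  # False: H's snapshot is the same, but its parent's ID differs
```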

Consequences

All of the above has some really important consequences:

Adding a new commit to the current branch just makes the branch name move. That is, we start with:
```
 ...--G--H   <-- branch (HEAD)
```
The special name HEAD gets attached to one particular branch; that's the name we're using. That's how Git knows both the name, in this case branch, and the commit: HEAD gets the name and the name gets the commit. Then when we make our new commit I, Git updates the name to which HEAD is attached, and makes the new commit point back to the commit that was the tip just a moment ago:
```
 ...--G--H--I   <-- branch (HEAD)
```
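
The HEAD → name → commit indirection can be sketched as a toy Python class. `Repo`, its fields, and the letter IDs it hands out are hypothetical stand-ins. Making a commit writes a new commit whose parent is the old tip, and then moves only the name HEAD is attached to.

```python
class Repo:
    """Toy repository: commits, branch names, and a HEAD that names a branch."""

    def __init__(self):
        self.parents = {"G": [], "H": ["G"]}
        self.branches = {"branch": "H"}
        self.head = "branch"              # HEAD is attached to the name 'branch'
        self._ids = iter("IJKLMNOP")      # hypothetical stand-ins for new hash IDs

    def commit(self):
        tip = self.branches[self.head]    # HEAD -> name -> commit
        new_id = next(self._ids)
        self.parents[new_id] = [tip]      # new commit points back to the old tip
        self.branches[self.head] = new_id # only the attached name moves
        return new_id

repo = Repo()
repo.commit()
print(repo.branches)      # {'branch': 'I'}
print(repo.parents["I"])  # ['H']
```
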
We can rebase a branch by copying all of its commits to new and improved commits.

That is, we make use of the fact that we find commits by starting from a name that identifies the last commit and working backwards. Suppose we have:
```
 ...--F--G--H   <-- main or master or whatever
       \
        I--J--K   <-- feature (HEAD)
```
There is nothing wrong with the three commits that are only on feature, but we want them to extend the mainline (master or whatever) branch. So we run:
```
 git rebase master
```
This works by copying existing commits I-J-K to new-and-improved commits. The new commits have totally different hash IDs, and probably different snapshots, but they do the same things that I-J-K did, and we now want to use them in place of I-J-K. Let's draw the new commits:
```
              I'-J'-K'  <-- some-temporary-name
             /
 ...--F--G--H   <-- master
       \
        I--J--K   <-- feature
```
If we could just get Git to rip the name feature off commit K and make it point to K' instead, then, because nobody ever looks at the raw hash IDs, everyone will suddenly think that the commits somehow changed. They didn't: the originals are still in there. And not everyone sees the new commits, either: only our own Git repository has the name moved. If someone else (some other Git repository) has a name that remembers the old hash IDs, they'll keep remembering the old commits.

So that's why it's tricky to rebase commits that your Git has given to some other Git. All Gits work by commit hash IDs, and only use the names to find the last one. Each Git has its own set of names, so you now have to get all the other Git repositories to change their names around too.
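
The copying step can be sketched in Python on a toy graph. The `rebase` function here is hypothetical and much simplified. Real `git rebase` also drops merge commits, skips patches already upstream, and stops on conflicts. What the sketch does show is that the copies get new IDs, only the name moves, and the originals still exist.

```python
# Toy graph: commit ID -> (parent IDs, the change it introduces).
commits = {
    "F": ([], "base"),
    "G": (["F"], "g"), "H": (["G"], "h"),
    "I": (["F"], "i"), "J": (["I"], "j"), "K": (["J"], "k"),
}
branches = {"master": "H", "feature": "K"}

def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c][0])
    return seen

def rebase(branch, onto):
    """Copy the commits on branch that are not on onto, oldest first, atop onto's tip."""
    upstream = ancestors(branches[onto])
    to_copy, c = [], branches[branch]
    while c not in upstream:          # walk back (first parent) to shared history
        to_copy.append(c)
        c = commits[c][0][0]          # toy: assumes the chain reaches upstream
    new_tip = branches[onto]
    for old in reversed(to_copy):
        new_id = old + "'"            # a copy gets a new ID: I -> I'
        commits[new_id] = ([new_tip], commits[old][1])
        new_tip = new_id
    branches[branch] = new_tip        # only now does the name move
    return new_tip

rebase("feature", "master")
print(branches["feature"])  # K'
print(commits["I'"][0])     # ['H']
print("K" in commits)       # True: the originals are still in there
```
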
Merging works by commit hash IDs. We might have this:
```
           I--J   <-- branch1 (HEAD)
          /
 ...--G--H
          \
           K--L   <-- branch2
```
to start out with. We run git merge branch2 to make a new merge commit, but what we are really doing is starting out with commit J as our current commit (the tip of branch1) and telling Git to merge commit L. The eventual merge commit looks like this:
```
           I--J
          /    \
 ...--G--H      M   <-- branch1 (HEAD)
          \    /
           K--L   <-- branch2
```
Note how the name branch1 moved, but all we really did was add a new commit, M, that has two parents instead of just the usual one. Commit J, which is the one we were using a moment ago, is the first parent of new merge M; commit L, which is the one we merged, is the second.
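
A toy Python sketch of what the merge records follows. The helpers are hypothetical, and the new ID "M" is a stand-in. `merge_base` is a simplified analogue of `git merge-base`: it finds the shared commit that Git would compare both tips against. The new commit gets two parents, ours first, and only the name HEAD is attached to moves.

```python
# Toy graph: commit ID -> parent IDs (letters stand in for hash IDs).
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # commits only on branch1
    "K": ["H"], "L": ["K"],   # commits only on branch2
}
branches = {"branch1": "J", "branch2": "L"}
head = "branch1"              # HEAD is attached to branch1

def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    """Shared commits that are not ancestors of some other shared commit."""
    common = ancestors(a) & ancestors(b)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(d) for d in common))

def merge(other, new_id="M"):
    """Record a two-parent commit and move only the name HEAD is attached to."""
    ours, theirs = branches[head], branches[other]
    base = merge_base(ours, theirs)   # Git compares base->ours with base->theirs
    parents[new_id] = [ours, theirs]  # first parent: our tip; second: the merged tip
    branches[head] = new_id
    return base

print(merge("branch2"))  # ['H']
print(parents["M"])      # ['J', 'L']
print(branches)          # {'branch1': 'M', 'branch2': 'L'}
```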

Actually using all of this in practice

Let's suppose we have a simple repository with just one master branch name:

...--G--H   <-- master (HEAD)

(It's the only branch name, so HEAD must be attached to it. We don't normally have to care because this repository is someone else's, probably on a server like GitHub, where it's a so-called bare repository and its HEAD is pretty much irrelevant.)

We make a clone of this simple repository, and in the clone, we also make a master name, also pointing to H. The clone also gets a remote-tracking name, origin/master: it is our copy of the original (GitHub) repository's master, renamed so that it cannot collide with our own branch names. So in our clone we have:

...--G--H   <-- master (HEAD), origin/master

Now we make a new branch name:

...--G--H   <-- master (HEAD), origin/master, br

and attach HEAD to it:

...--G--H   <-- master, origin/master, br (HEAD)

We haven't changed commits—we're still using commit H—but now new commits will update the name br, rather than our master. Now we make a few new commits:

...--G--H   <-- master, origin/master
         \
          I--J   <-- br (HEAD)

Our feature works, so we use git push to send the new commits to GitHub and raise a Pull Request, which asks someone else—someone who controls the GitHub repository—to combine our commits with their work.

Note that they now have:

...--G--H   <-- master (HEAD)
         \
          I--J   <-- (some pull request number)

If we're all sharing the main GitHub repository, their GitHub repository will also probably have a br branch name, but if we're using GitHub's fork system, they won't have a br branch name at all: there will be two GitHub Git repositories, one being your fork and one being their main repository, and your fork will have a br branch, but their main repository won't. This can get fairly confusing as we now have three or more repositories involved, each of which has its own branch names!

There are a bunch of problems that come up now, because they—whoever "they" are—are the ones in control at this point. All you have done, and all you can do, is send your commits to your own GitHub repository—which might be shared or might be one of these more complex fork things—and ask them to look at your Pull Request. The Pull Request is a GitHub thing, not a Git thing: the Git thing is the commits, which form into chains ended by some name. Is a pull request, which is the end of some chain, a branch? It's not a branch name, but it works like a branch. Should we call it "a branch"? That goes back to What exactly do we mean by "branch"?

Anyway, having made the PR, whoever is in control of the PR can now do any of these things:

Reject your PR entirely. This isn't really all that interesting here since we're looking at merges rather than rewrites, but it's something to consider.
Use the web interface to click a button that says merge.
Use the web interface to click a button that says rebase and merge.
Use the web interface to click a button that says squash and merge.

These last three options all do different things, and the differences ripple back into what you can do next.

If they merge

If they use the merge button, things are easiest for you, because this keeps your actual commits, with their original hash IDs, and incorporates them into their repository by adding just one new merge commit, to get:

...--G--H------M   <-- master (HEAD)
         \    /
          I--J   <-- (some pull request number)

in their repository. You can now have your Git fetch this new commit M from them, into your local repository. If you're using a GitHub fork, after getting M locally, on your laptop, you can send M back to your own GitHub fork. You now have:

          __--  <-- master
         .
...--G--H------M   <-- origin/master
         \    /
          I--J   <-- br (HEAD)

in your repository. Note that your master has not moved and still points to commit H. You can move your own master to point to M, like theirs, but that's kind of a pain. If it doesn't bother you, you can just use your origin/master name to keep track of their master, and even delete your name master entirely. (I tend to move my master around myself, but I sometimes wonder why I bother.)
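
This is why the merge button is kind to a branch C built on top of br. In a toy model, where C1 and C2 are hypothetical commits standing in for branch C's own work, your J is now an ancestor of origin/master. That is what `git merge-base --is-ancestor` checks. So the only commits a PR from C would bring in, the set `git log origin/master..C` lists, are C's own.

```python
# Toy graph after a real merge: commit ID -> parent IDs.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],     # your br (branch B)
    "M": ["H", "J"],            # their merge commit: first parent H, second J
    "C1": ["J"], "C2": ["C1"],  # hypothetical branch C, built on your J
}
names = {"origin/master": "M", "br": "J", "C": "C2"}

def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def is_ancestor(a, b):
    """Is commit a reachable from commit b?"""
    return a in ancestors(b)

def only_on(branch, upstream):
    """Commits reachable from branch but not from upstream."""
    return sorted(ancestors(names[branch]) - ancestors(names[upstream]))

print(is_ancestor(names["br"], names["origin/master"]))  # True
print(only_on("C", "origin/master"))                     # ['C1', 'C2']
```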

If they rebase-and-merge

When they use the rebase and merge button instead of the merge button, what they get, in their repository, is a set of copies of your commits. Their master then moves forward to point to the last copied commit, like this:

...--G--H--I'-J'  <-- master (HEAD)
         \
          I--J   <-- (some pull request number)

When you grab their new commits, your Git now has:

          __--  <-- master
         .
...--G--H--I'-J'  <-- origin/master
         \
          I--J   <-- br (HEAD)

As before, your own master is just in the way here, cluttering things up and making it harder to draw the graph. Your I-J are now redundant and perhaps even in the way. The fact that they made copies of your commits can become a headache for you. Your Git doesn't know that their copies are the new-and-improved ones, and has no idea that you should make your name br refer to commit J' instead of commit J.⁴

⁴If they broke something, maybe it shouldn't. Maybe you should keep your J and figure out how to fix theirs. But that's not something your Git can figure out on its own.
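
Here is the same toy model after a rebase-and-merge. C1 and C2 are again hypothetical stand-ins for branch C. Nothing links J' back to your J, so a PR from C still carries I and J, and the merge base with master falls back to H. That is a likely source of the conflicts in the question. Where C re-edits lines that I-J already edited, Git sees both sides changing those lines relative to H. One common way out is `git rebase --onto origin/master br C`, which copies only C's own commits onto the new master. The `rebase_onto` function is a simplified sketch of that command. It follows first parents only and does no conflict handling.

```python
# Toy graph after a rebase-and-merge: commit ID -> parent IDs.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],     # your br (branch B), as sent in the PR
    "I'": ["H"], "J'": ["I'"],  # their rebase-and-merge copies
    "C1": ["J"], "C2": ["C1"],  # hypothetical branch C, built on your J
}
names = {"origin/master": "J'", "br": "J", "C": "C2"}

def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(a, b):
    """Shared commits that are not ancestors of some other shared commit."""
    common = ancestors(a) & ancestors(b)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(d) for d in common))

# Before fixing: a PR from C brings in I and J again, compared against base H.
print(sorted(ancestors(names["C"]) - ancestors(names["origin/master"])))  # ['C1', 'C2', 'I', 'J']
print(merge_base(names["C"], names["origin/master"]))                     # ['H']

def rebase_onto(newbase, upstream, branch):
    """Copy commits reachable from branch but not from upstream onto newbase."""
    exclude = ancestors(names[upstream])
    to_copy, c = [], names[branch]
    while c not in exclude:
        to_copy.append(c)
        c = parents[c][0]           # first parent only
    tip = names[newbase]
    for old in reversed(to_copy):   # oldest first
        new = old + "'"             # a copy gets a new ID
        parents[new] = [tip]
        tip = new
    names[branch] = tip             # finally move the name

rebase_onto("origin/master", "br", "C")
print(names["C"])                                                          # C2'
print(sorted(ancestors(names["C"]) - ancestors(names["origin/master"])))  # ["C1'", "C2'"]
```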

If they squash-and-merge

If they use GitHub's squash and merge button, what they get, in their repository, is a single commit that holds a snapshot that matches the snapshot of your final commit. We can draw that like this:

...--G--H--IJ   <-- master (HEAD)
         \
          I--J   <-- (some pull request number)

Note that their commit IJ has a completely different hash ID, unique to it, just like the rebase case. But unlike the rebase case there's no one-to-one mapping from their squashed commit back to each of your individual commits.⁵ Still, the way you must deal with it is the same as the way you must deal with the rebase-and-merge case.

⁵If your "chain" in your PR consisted of a single commit, the "squashed" chain also consists of a single commit, and therefore there is a one-to-one correspondence. So here, rebase-and-merge and squash-and-merge collapse into a single case.
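
Why does a duplicated-but-renamed commit like IJ lead to conflicts at all? A toy three-way merge shows the mechanism. It works on whole files, whereas real Git merges line by line, and the file names and contents are hypothetical. With base H, master's IJ and branch C both changed app.txt relative to H, so the merge conflicts. Had the base been J, as after a real merge, only C changed it, and the merge would be clean.

```python
def three_way(base, ours, theirs):
    """Toy whole-file three-way merge (real Git merges line by line)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            merged[path] = o        # both sides agree, or neither changed it
        elif o == b:
            merged[path] = t        # only theirs changed it
        elif t == b:
            merged[path] = o        # only ours changed it
        else:
            conflicts.append(path)  # both changed it, differently
    return merged, conflicts

H = {"app.txt": "v1"}
J = {"app.txt": "v2", "lib.txt": "new"}   # tip of branch B (I and J together)
IJ = dict(J)                              # squash commit: J's snapshot, new ID
C2 = {"app.txt": "v3", "lib.txt": "new"}  # branch C edits app.txt again

print(three_way(H, IJ, C2))  # base H: app.txt conflicts
print(three_way(J, IJ, C2))  # base J (a real merge): clean
```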

The bottom line

Ultimately, when you boil it all down a little too far, what you end up with is "it's complicated". You have to keep going a bit further into the details before you can decide how to deal with this.

I also mentioned in a comment that each hosting service (GitHub, Bitbucket, GitLab, etc.) has its own peculiarities. The text above describes GitHub's three options; others will have other options. Each makes use of Git's basic abilities, but in different ways. You really do have to learn how the Git commit graph works and how merges work, and get into all these kinds of nitty-gritty details. It can't be simplified further without losing something.

score 0 · Answer 3 · answered Oct 06 '20 at 04:50

What you should do is start with master. Make a branch A from master. Merge A into master. Make a branch B from master (this will include the content of A, since it was already merged into master). Merge B into master. No repeated merge conflicts.

Only create new branches from master, never branch from another branch.