
I need to do some special merging that I've never done before so I need guidance.

I have a master branch from where I diverged (created) a new branch called "v1.5.4". The reason for this is that we want developers to continue on master for our final version "1.6.0".

Others continue on "v1.5.4".

Now my boss wants me to create a new branch for "v1.5.5", which I created from "v1.5.4". Now all three are active branches where commits are being made.

What is the simplest way to merge these two branches back into master? Not all of them are spawned from master (like "v1.5.5"). So how do I do this?

My guess would be to do the following while I am on the master branch:

  1. git merge v1.5.4
  2. git merge v1.5.5

But I am not sure about the second one, since v1.5.5 was not created from master. Is there a better way? The goal is to make sure that all commits from v1.5.4 and v1.5.5 are put into master.

One issue I am worried about is this: When commits are made to v1.5.4, these eventually need to be merged into v1.5.5. So is there an order to follow when my boss asks me to deliver 1.5.5?

Ray

1 Answer


You may have some issues here. Here's what to know:

  • Branches don't record how they were created. In fact, a branch name records just one commit hash ID.

  • Git works backwards.

  • Merge uses this backwards stuff to find a merge base.

Here's what you need to know before you can understand the above:

  • Each commit is numbered. The numbers, however, are not simple counting numbers: we don't find commit #1 followed by commit #2, then #3, and so on. Instead, each commit gets a big, ugly, random-looking (but not actually random at all) and unique hash ID, which is expressed as a hexadecimal number.

  • Nobody can remember these numbers. Only a computer can deal with them. So we don't bother remembering the numbers; we have the computer do it.

  • Each commit remembers the commit-number of its previous commit. This forms the commits into backwards-looking chains. Given a commit number, Git can easily extract the commit—both its snapshot of all files, and its metadata. The metadata includes stuff like who made the commit and when, but also that all-important parent commit hash ID.

  • This means we need a quick way to find the last commit in any chain. From there, Git can work backwards.

That—holding the last commit hash ID number—is the purpose of a branch name. We can draw a commit chain, using uppercase letters to stand in for the real (random-looking, big and ugly) hash IDs, like this:

... <-F <-G <-H   <--branch

The branch name branch holds just the hash ID of the last commit, which we'll call H for Hash, here. Commit H itself holds the hash ID of the previous commit G. So using the name branch, Git can find (look at, check out, whatever) commit H. Then using what's in commit H, Git can find commit G. That lets Git find commit F, and so on, backwards, down the line.
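To see this backwards chain for yourself, here is a minimal sketch in a throwaway repository. The temporary directory, the placeholder identity, and the empty commits named F, G and H are all assumptions made for illustration; real hash IDs will differ on every run.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch    # call the first branch "branch", as in the drawing
git config user.name "Example"             # placeholder identity for this scratch repo
git config user.email "example@example.com"

for msg in F G H; do
    git commit -q --allow-empty -m "$msg"  # empty commits are enough to build a chain
done

tip=$(git rev-parse branch)                # hash ID stored in the name: commit H
parent=$(git rev-parse branch^)            # hash ID stored inside H: commit G
chain=$(git log --format=%s branch | tr '\n' ' ')   # follow parent links backwards

echo "branch holds $tip"
echo "its parent is $parent"
echo "walking back: $chain"
```

The name only ever gives Git the tip; everything else comes from following the parent recorded in each commit.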

When we run git checkout or git switch and provide Git with a branch name, we're telling Git two things:

  1. Extract the given commit's snapshot, using the hash ID as found in the branch name, so that I can see / work on / work with it.
  2. Make that branch name the current branch name and that commit the current commit.

We do our work as usual and eventually run git commit, which tells Git:

  • Package up a new snapshot (from Git's index aka the staging area, not from our working tree, but never mind that here).
  • Add appropriate metadata: me as the committer, now as the date-and-time, and so on.
  • Use the current commit (hash ID) as the parent of the new commit.
  • Actually store all of that—snapshot and metadata—as the new commit, producing a new hash ID.
  • Write the new commit's hash ID into the current branch name.

This last step is how a branch grows, one commit at a time, as someone makes commits:

...--G--H   <-- somebranch (HEAD)

becomes:

...--G--H--I   <-- somebranch (HEAD)

with the branch name now pointing to commit I instead of commit H. The (HEAD) drawn in here means that the special name HEAD is attached to (or points to, if you prefer) the name somebranch, which is how Git knows that this is the current branch name. The name itself holds the hash ID of the current commit, so that updating the name automatically updates the current commit ID.
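The same thing can be watched from the command line. This is only a sketch under assumed conditions (a scratch repository, a placeholder identity, empty commits named after the letters in the drawing): it records what the name holds before and after one new commit.

```shell
set -e
cd "$(mktemp -d)"                              # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/somebranch    # attach HEAD to the name "somebranch"
git config user.name "Example"                 # placeholder identity
git config user.email "example@example.com"

git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

current=$(git symbolic-ref --short HEAD)       # the name HEAD is attached to
before=$(git rev-parse somebranch)             # hash ID of H

git commit -q --allow-empty -m I               # writes the new hash ID into somebranch

after=$(git rev-parse somebranch)              # hash ID of I
parent_of_new=$(git rev-parse somebranch^)     # parent recorded in I: the old tip, H

echo "HEAD is attached to: $current"
echo "tip before: $before"
echo "tip after:  $after"
```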

With all that in mind, let's look at merging

When Git goes to do a true merge—we'll come back to this concept in a later section—we have a picture that looks like this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

That is, the current commit on the current branch branch1 is commit J; meanwhile, the name branch2 selects commit L.

Commit J has as its parent I, which has as its parent H. So following the string backwards, we go J, I, H, G, and so on.

Meanwhile, commit L has as its parent K, which has as its parent H. Following this string backwards, we go L, K, H, G, and so on.

Commits up through and including H are therefore on both branches. Some commits may be only on one branch; many commits are generally on many branches, and it's pretty common to have one commit—the very first one ever made—on all branches. This sharing is how Git is able to do git merge at all.

This drawing, by the way, is just one possible representation of the commit graph. If you use git log --all --decorate --oneline --graph (Git Log with A DOG), Git will try to draw a graph, but Git likes to draw them vertically, rather than horizontally like I did here. See the question for lots of other ways to view commit graphs.

To perform a merge, Git will locate the best shared ancestor of the two tip commits. The two tip commits are of course the commit you're on right now—commit J—and the one you specify with your git merge command. So, whether you run git merge branch2 or git merge hash-of-L, Git will find commit L, and then do the chase-things-backwards trick to find commit H.

Commit H is "better than" commit G, and in this drawing it's obvious why, but in real graphs, it's often not obvious. Git uses an algorithm (Lowest Common Ancestor of a Directed Acyclic Graph) to find the merge base. There can, in some cases, be more than one; this case is complicated, and I leave it to another section below.

The goal of merging is to combine work. The way Git does this is:

  • Compare the merge base snapshot to the --ours commit (commit J in this example). Whatever is different in these two snapshots, that's what we did.
  • Independently, compare the merge base snapshot to the --theirs commit (commit L in this example). Whatever is different in these two snapshots, that's what they did.
  • Combine our work—our changes—with their changes. Whenever they don't interfere with each other, take both changes to any given file. If they do interfere with each other, declare a merge conflict and get a human to clean up the mess (I'll omit the details here).

This combining is normally done purely on a textual basis, line by line, since that is all Git itself can do. That works surprisingly well, but clearly cannot be perfect. You should inspect and test the result, because Git does not understand the changes it is combining like this. In any case, Git then applies the combined changes to the merge base version of the file. That way, we keep our changes and add theirs. If Git itself does not detect any conflicts, Git will normally go on to make a merge commit.
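A small sketch of the non-conflicting case may help. It assumes a scratch repository with two hypothetical files, a.txt and b.txt, where "we" change one file and "they" change the other; the commit messages reuse the letters from the drawing, with the intermediate commits left out.

```shell
set -e
cd "$(mktemp -d)"                           # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"

printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m H                          # this commit will be the merge base

git checkout -q -b branch2
printf 'two, changed by them\n' > b.txt
git commit -q -am L                         # "their" change

git checkout -q branch1
printf 'one, changed by us\n' > a.txt
git commit -q -am J                         # "our" change

base=$(git merge-base branch1 branch2)
git --no-pager diff --stat "$base" branch1  # what we did since the merge base
git --no-pager diff --stat "$base" branch2  # what they did since the merge base

git merge -q -m M branch2                   # the changes do not overlap, so Git keeps both
cat a.txt b.txt
```

If both sides had edited the same lines of the same file, the git merge step would stop with a conflict instead of committing.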

A merge commit is just like any ordinary commit—it has a snapshot, like any ordinary commit, and metadata, like any ordinary commit—with one exception: its metadata records two (or more, but let's ignore this case entirely) parent commit hash IDs, instead of the usual one. The first parent—important in things like git log --first-parent—is the usual first parent. The second parent is the hash ID of the commit we said to merge: in this case, commit L. So the result of this merge is:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

Note how new commit M, our merge, is only on branch1, and is now the tip commit of branch1, from which we can work backwards. But now, when Git does its working-backwards, commit M connects back to both commit J and commit L. So commits K and L are now also on branch1. The name branch2 is not changed: it still points to commit L.
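The two parents of M can be inspected directly. The following is a sketch that rebuilds the drawing in a scratch repository, assuming empty commits whose messages are the letters used above and a placeholder identity.

```shell
set -e
cd "$(mktemp -d)"                            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
commit() { git commit -q --allow-empty -m "$1"; }    # helper: empty commit with a one-letter message
subject() { git log -1 --format=%s "$1"; }           # helper: print a commit's message

commit G; commit H
git branch branch2                           # both names select H for now
commit I; commit J                           # branch1: ...--H--I--J
git checkout -q branch2
commit K; commit L                           # branch2: ...--H--K--L
git checkout -q branch1

base=$(subject "$(git merge-base branch1 branch2)")  # best shared commit
l_before=$(git rev-parse branch2)

git merge -q -m M branch2                    # true merge: makes commit M

first=$(subject HEAD^1)                      # first parent of M
second=$(subject HEAD^2)                     # second parent of M
l_after=$(git rev-parse branch2)             # branch2 itself has not moved

echo "merge base: $base, parents of M: $first and $second"
```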

Suppose work continues on branch2:

          I--J
         /    \
...--G--H      M   <-- branch1
         \    /
          K--L---N--O   <-- branch2 (HEAD)

We can now git checkout branch1 or git switch branch1:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L---N--O   <-- branch2

and run git merge branch2 again. Git will start at M and work backwards, listing out commit hash IDs for J and L. Git will also start at branch2 / O and work backwards, listing out commit hash IDs O and N and L—and, aha, commit L is on both branches! (So are K and H, but since we can only go backwards, we can't go from L to M: commit M is not on branch2 at this point. And obviously L is better than K or H.)

So now, Git will diff L-vs-M to see what we changed—that's the stuff we kept from I-J—and then diff L-vs-O to see what they changed, which are the new commits they added. Git will then combine these two sets of changes, apply them to commit L, and make a new merge commit:

          I--J
         /    \
...--G--H      M------P   <-- branch1 (HEAD)
         \    /      /
          K--L---N--O   <-- branch2

If work continues on branch2 at this point, the next merge into branch1 will use commit O as the merge base.
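To check how the merge base moves forward after each merge, here is a sketch along the same lines. The scratch repository, the empty commits and the placeholder identity are assumptions; commit Q is a hypothetical extra commit standing in for "work continues on branch2".

```shell
set -e
cd "$(mktemp -d)"                            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
commit() { git commit -q --allow-empty -m "$1"; }
subject() { git log -1 --format=%s "$1"; }

commit G; commit H
git branch branch2
commit I; commit J
git checkout -q branch2; commit K; commit L
git checkout -q branch1
git merge -q -m M branch2                    # first merge, base was H

git checkout -q branch2; commit N; commit O  # work continues on branch2
git checkout -q branch1
base1=$(subject "$(git merge-base branch1 branch2)")   # base for the second merge
git merge -q -m P branch2                    # second merge

git checkout -q branch2; commit Q            # hypothetical further work
git checkout -q branch1
base2=$(subject "$(git merge-base branch1 branch2)")   # base for a third merge

echo "second merge used $base1; a third merge would use $base2"
```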

What you have

I have a master branch from where I diverged (created) a new branch called "v1.5.4". The reason for this is that we want developers to continue on master for our final version "1.6.0".

The names aren't all that important. What matters are the commits. Let's draw some commits—but you should draw the ones you really do have (or a simplified version of those), rather than using the ones I am drawing now:

...--G--H   <-- master
         \
          I--J   <-- v1.5.4

Or, if another new commit got made on master:

...--G--H--K   <-- master
         \
          I--J   <-- v1.5.4

Note: you'll run out of letters fast; consider using just round os or bullets for any commit you don't need to refer to by name/number/whatever:

...--o--o--o   <-- master
         \
          o--o   <-- v1.5.4

Now my boss wants me to create a new branch for "v1.5.5", which I created from "v1.5.4". Now all three are active branches where commits are being made.

What matters is which commits are "on" (reachable by working backwards from) the name. If you made the name while v1.5.4 was pointing to commit J, and then made new commits on the new name, you might have:

...--G--H--K   <-- master
         \
          I--J--N   <-- v1.5.4
              \
               L--M   <-- v1.5.5

now.

If you git checkout master and git merge v1.5.4, what Git sees is "merge commit N". Commits L and M will not participate in this, but commits I and J (which are on both v1.5.4 and v1.5.5) will. Let's make one more commit on v1.5.4 afterwards so that we have this:

...--G--H--K------O   <-- master
         \       /
          I--J--N--P   <-- v1.5.4
              \
               L--M   <-- v1.5.5

If you now git merge v1.5.5, Git will find the merge base by:

  • walking backwards from O (through N and J and I, and also K);
  • walking backwards from M (through L and J and—whoops, we're done!)

and so as you can see J will be the merge base. Git will diff J vs O to see what we have on master, and J vs M to see what they have on v1.5.5. Your job is just going to be to resolve any conflicts that come up. At this point, there may not be any: it depends on whether changes in K or N interfere with changes in L and/or M.

Over time, however, the graphs will get very tangle-y. It will be hard to see, visually, where a merge base is. You can run git merge-base --all and give it the two branch names, or commit hash IDs, to have Git find the merge base(s). If you get just one merge base, the merge is fundamentally simpler.
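The whole scenario can be rehearsed in a scratch repository before touching the real one. This sketch assumes the simplified graph drawn above, with empty commits and a placeholder identity; your real branches will have different commits and may produce conflicts that this toy setup cannot.

```shell
set -e
cd "$(mktemp -d)"                            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
commit() { git commit -q --allow-empty -m "$1"; }
subject() { git log -1 --format=%s "$1"; }

commit G; commit H
git checkout -q -b v1.5.4; commit I; commit J
git checkout -q -b v1.5.5; commit L; commit M    # v1.5.5 starts at J
git checkout -q v1.5.4;    commit N
git checkout -q master;    commit K

git merge -q -m O v1.5.4                     # first merge: brings in I, J, N
git checkout -q v1.5.4; commit P             # work continues on v1.5.4
git checkout -q master

base=$(subject "$(git merge-base --all master v1.5.5)")   # single merge base here
git merge -q -m "merge v1.5.5" v1.5.5        # second merge: brings in L, M

# Is every commit of v1.5.5 now reachable from master?
if git merge-base --is-ancestor v1.5.5 master; then contained=yes; else contained=no; fi
pending=$(git branch --no-merged master | tr -d ' ')        # branches with unmerged commits

git merge -q -m "merge v1.5.4 again" v1.5.4  # picks up P
pending_after=$(git branch --no-merged master | tr -d ' ')

echo "merge base for v1.5.5: $base; still pending before the last merge: $pending"
```

It does not matter that v1.5.5 was created from v1.5.4 rather than from master: each git merge only looks at the commits reachable from the two tips.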

Try to avoid criss-cross merges

You can get two merge bases when there are two "best shared commits". This most often occurs by doing criss-cross merges. Suppose that you have this:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- branch2

and you check out one branch and run git merge to get:

          I--J---M   <-- branch1 (HEAD)
         /      /
...--G--H      /
         \    /
          K--L   <-- branch2

Meanwhile, someone else checked out branch2 and ran git merge branch1 while branch1 selects commit J to get:

          I--J   <-- branch1
         /    \
...--G--H      \
         \      \
          K--L---N   <-- branch2 (HEAD)

(They get a different commit hash ID, because every commit gets a unique hash ID.) Now you and they combine repositories, which is just a matter of adding commits M and N to each other's repositories and moving branch names forward:

          I--J---M   <-- branch1
         /    \ /
...--G--H      X
         \    / \
          K--L---N   <-- branch2

This is the criss-cross merge. If we later add some more commits to either or both branches:

          I--J---M--O--P   <-- branch1 (HEAD)
         /    \ /
...--G--H      X
         \    / \
          K--L---N   <-- branch2

and run git merge on the other branch, Git has to work backwards as usual:

  • P, then O, then M, then J-and-L, then I-and-K, and so on...
  • N, then L-and-J, then K-and-I, and so on...

and this time both J and L are equally good. So both commits are merge bases for this merge.

The way Git handles this by default (-s recursive, or -s ort, which became the default in Git 2.34 and takes the same approach) is to merge the two merge bases, then use the result as the merge base for your merge. Using -s resolve, you can direct Git to pick one of the two at apparent-random: you don't get to control which one Git uses. When and why the results can differ is beyond the scope of this answer, but they can. If they do, usually -s recursive produces a better result, but sometimes its result is very confusing (e.g., when the merge bases themselves have merge conflicts).

Note that git merge-base --all branch1 branch2 will print both merge base commit hash IDs, but git merge-base branch1 branch2 will do the resolve thing of picking one at apparent-random and just print that one. So always remember --all when inspecting a merge to see what the heck made Git explode.
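A criss-cross can be reproduced in a single scratch repository, which is a convenient way to see both merge bases. This is a sketch under assumed conditions: empty commits, a placeholder identity, and both "you" and "someone else" simulated by merging saved hash IDs rather than by two separate clones.

```shell
set -e
cd "$(mktemp -d)"                            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
commit() { git commit -q --allow-empty -m "$1"; }

commit G; commit H
git branch branch2
commit I; commit J
j=$(git rev-parse branch1)                   # remember commit J
git checkout -q branch2; commit K; commit L
l=$(git rev-parse branch2)                   # remember commit L

git checkout -q branch1
git merge -q -m M "$l"                       # you: merge L into branch1
git checkout -q branch2
git merge -q -m N "$j"                       # someone else: merge J (not M) into branch2

bases=$(git merge-base --all branch1 branch2 | sort)       # every merge base
count=$(git merge-base --all branch1 branch2 | wc -l | tr -d ' ')
expected=$(printf '%s\n%s\n' "$j" "$l" | sort)
one=$(git merge-base branch1 branch2)        # without --all: just one of them

echo "number of merge bases: $count"
echo "$bases"
```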

Fast-forward "merge"

In all of the above cases, Git had to make a "real merge". That is, Git found a merge base, and from the merge base, both branches had new commits. So Git had to run two git diff commands (or the internal equivalents), and then combine changes if both sides touched the same files.

Sometimes, though, we have a simple graph like this:

...--G--H   <-- main (HEAD)
         \
          I--J   <-- feature

When we run git merge feature, Git finds the merge base as usual, by working backwards. Working backwards from H starts at H, of course, and working backwards from J, we go J, I, H—oh hey here we are! The merge base is commit H!

But if we ran a diff now, we'd ask to diff H against H, to see what we changed. That's kind of dumb, isn't it? If we compare the snapshot in H to the snapshot in H, well, nothing changed. So ... why bother?

And, if we compare H vs J, well, we'll get whatever changed. If we put those changes into the snapshot from H, we'll get the snapshot in J. So, again: why even bother? What if we just use commit J itself?

That's what Git will do in this case, if you let it. Instead of making a new but kind of useless merge commit:

...--G--H------K   <-- main (HEAD)
         \    /
          I--J   <-- feature

Git will, in this case, do a fast-forward. In effect, it just does a git checkout or git switch to commit J, but drags the branch name main forward too:

...--G--H
         \
          I--J   <-- feature, main (HEAD)

after which we can just draw a straight-line of commits:

...--G--H--I--J   <-- feature, main (HEAD)

If this is what you want, use git merge, or even git merge --ff-only, to do the merge. Git will use the fast-forward not-really-a-merge method if possible, and if you said --ff-only, will error out if it's not possible.

If this isn't what you want—if you want a distinct commit, e.g., for tagging purposes—use git merge --no-ff to force Git to make an actual merge commit such as the commit K I drew above. Note that git merge will sometimes do this on its own: there's one special case where it assumes you want a distinct commit for tagging purposes, and that occurs when you give git merge an annotated tag name that, as the git merge documentation puts it:

is not stored in its natural place in the refs/tags/ hierarchy

Git assumes that doing a fast-forward could mess with your tag's distinctiveness, so it internally turns on the --no-ff mode. (I don't understand this use case, but that's what this odd phrasing in the documentation means.)
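Both behaviours can be compared side by side. The following sketch assumes a scratch repository with empty commits and a placeholder identity; the branch named topic and its commit T are hypothetical additions used only to demonstrate --no-ff.

```shell
set -e
cd "$(mktemp -d)"                            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
commit() { git commit -q --allow-empty -m "$1"; }

commit G; commit H
git checkout -q -b feature; commit I; commit J
git checkout -q main                         # main still selects H

git merge -q --ff-only feature               # no new commit: main slides forward to J
ff_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)

git checkout -q -b topic; commit T           # hypothetical second branch, one commit ahead
git checkout -q main
git merge -q --no-ff -m "merge topic" topic  # force a real merge commit

# rev-list --parents prints the commit followed by its parents: 3 words means 2 parents
words=$(git rev-list --parents -n 1 main | wc -w | tr -d ' ')

echo "after --ff-only, main and feature match: $ff_tip"
echo "after --no-ff, main's tip lists $words hash IDs (itself plus two parents)"
```

Had main gained its own commits after H, git merge --ff-only feature would have refused and exited with an error instead of merging.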

torek