Git merging strategy

Question

I have a Git branch called release-1.0.0 to which I keep committing code. Now there is a major future release, release-2.0.0, with significant design and architecture changes. This new branch was created from release-1.0.0. It will have to take some changes from release-1.0.0, but cannot incorporate certain others because of the design differences.

What is the right strategy for moving the changes made in release-1.0.0 to release-2.0.0? Is merge the right thing to do? Or should we manually copy-paste the code into release-2.0.0? Or should we even have to create a separate repository for this :-O

Eventually, release-1.0.0 and release-2.0.0 will both be merged into master once they are complete. Kindly share your thoughts. I'm not sure whether this is the right kind of question to ask here, but I have seen similar questions asked.

Seems like you want to checkout `release-2.0.0` and perform `git cherry-pick` on select commits — Omer Tuchfeld, Jul 31 '20 at 10:35
BTW it seems weird that `release_2.0.0` does not incorporate all changes from `release_1.0.0` — jannis, Jul 31 '20 at 10:36
If there is a different architecture and design style used, there can be scenarios where we can't use them as is right? — Apps, Jul 31 '20 at 11:31
how can we answer the question "what is the right strategy" when we don't know the structure of the two branches? — Daemon Painter, Jul 31 '20 at 11:35
@Daemon Painter, could you please let me know what you mean by the structure of the branches? Release 2 branch was originally created from release 1 itself — Apps, Jul 31 '20 at 12:58
Check out release 2, pull the code from release 1, and then resolve the merge conflicts? I had a similar case where there were design changes in the release branch and minor changes in the patch branch that had to be merged into the release branch. Since the changes in the patch branch were in files deleted in the release branch, I ignored the patch-branch changes while merging and had to make a separate commit (with the patch changes) in the release branch — Gautham M, Jul 31 '20 at 13:16

score 3 · Answer 1 · answered Jul 31 '20 at 22:13

If there were a single right answer to this kind of question, everyone would use it and it would be well-known. There isn't. But there are some general things we can say:

Or should we even have to create a separate repository for this :-O

A separate repository is nothing more than a branch, one whose contents nobody else can see unless they have access to that separate repository. (Well, technically, it's a whole set of branches in Git, since branch names are local to the repository.) Creating branches, in Git, is very low cost, so if that helps you, that's a fine thing to do, whether you put it in a separate repository or not.

What we can say for sure is this:

Git is, at heart, really all about commits.
Each commit is numbered. The numbers are not simple sequential numbers—they don't count up, 1, 2, 3, and so on—and are instead random-looking hash IDs, but they're still uniquely numbered.
The computation of the hash ID is crucial to making Git work: the secret here is that every Git everywhere will compute the same hash ID for the same commit content. So this means that two Gits, when they talk to each other, need only compare hash IDs to see if they have the same commits. (You don't need to care about this for your immediate problem, it's just a useful thing to know.)
The contents of a commit come in two parts:
- Each commit has a full snapshot of every file. These files are in a special, read-only, Git-only, compressed and de-duplicated form, that in general only Git can read. (The de-duplication means that since most commits mostly re-use the files from some earlier commit, a new commit hardly takes any extra space. Even though each commit has a full copy of each file, these commits actually share the single copy.)
- Along with the snapshot, each commit has some metadata, or information about the commit itself. The metadata include the name and email address of the person who made the commit, some date-and-time-stamps, and their log message for why they made that commit. There's one part of this metadata that's exclusively for Git itself, that Git maintains for itself: each commit records the hash ID—the commit number—of its parent (or, for a merge, parents, plural).

This last part is how and why branch names like master store only one thing: the hash ID of the last commit. It's the commits themselves that are, and store, the history of the project.

Note that commits do not store changes. They store snapshots. But because each commit remembers its immediate previous commit—its parent—Git can take any commit and walk back one step and look at its parent. In the parent, most of the files are probably the same, and literally shared via the de-duplication. Git can therefore skip right over those files and only bother comparing the files that are different between the two commits. By comparing the differing files, Git can compute, when you ask it to, what changed in those files, and hence show you what changed in that commit.
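To make the snapshot-plus-metadata picture concrete, here is a minimal sketch that builds a throwaway repository and looks inside a commit. The directory, file name, and identity are all made up for illustration; only the Git commands themselves matter.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway directory (hypothetical)
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity for the example
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt && git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"

# The raw commit object: a tree (the snapshot), the parent's hash ID,
# author and committer lines, then the log message.
git cat-file -p HEAD

# The parent recorded inside the commit is the hash ID of the previous commit.
parent_in_commit=$(git cat-file -p HEAD | sed -n 's/^parent //p')
previous_commit=$(git rev-parse HEAD~1)

# "What changed" is computed on demand by comparing the two snapshots.
changed_files=$(git diff --name-only HEAD~1 HEAD)
git show --stat HEAD
```

Nothing in the `git cat-file -p` output is a diff: the change list shown by `git show` is derived by comparing the commit's snapshot with its parent's.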

Cherry-pick vs merge

To take a change from a single commit, you can use git cherry-pick. Internally, this actually uses Git's merge machinery, but a simplified description makes it all make sense:

Git compares the commit against its parent to see what changed in the commit to be cherry-picked.
Then, Git applies the same changes to the current commit.

If the application goes smoothly, you've just made the same change, and Git will make a commit on its own. The committer of the new commit is of course you, just now, but the author and the log message get copied from the original commit. The diff from the new commit to its parent will be the same as the diff from the cherry-picked commit to its parent. But the new commit isn't quite the same as the original,¹ so it has a different hash ID (commit number).
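As a rough illustration of this, the sketch below cherry-picks one commit in a scratch repository and compares the copy with the original. The branch names `main` and `feature`, the file names, and the identity are assumptions made up for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b main

echo "base" > base.txt
git add base.txt && git commit -q -m "base"

# The commit we will copy lives on a side branch.
git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "add fix"
original=$(git rev-parse HEAD)

# Meanwhile main moves on, so the copy will get a different parent.
git checkout -q main
echo "other" > other.txt
git add other.txt && git commit -q -m "unrelated work on main"

git cherry-pick "$original" >/dev/null
copy=$(git rev-parse HEAD)

# Same change and same message, but a different commit.
original_patch=$(git diff "${original}~1" "$original")
copy_patch=$(git diff "${copy}~1" "$copy")
copy_message=$(git log -1 --format=%s "$copy")
echo "original: $original"
echo "copy:     $copy"
```

The two hash IDs printed at the end differ even though the two patches are the same.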

This is very different from merging. When you use git merge, you tell Git: find the best shared ancestor of two particular commits, compare that shared ancestor commit to each of the two branch tips, and combine the two resulting sets of changes. As an illustration, consider the following relatively simple branch history:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Here, we're "on" branch branch1, as indicated by the attached special name HEAD. We run git merge branch2, to tell Git that the two commits are commit J (our current commit) and commit L. Git finds the best shared commit H on its own. Git calls this the merge base. Git then compares H-vs-J to see what we changed, which picks up changes made in both commits I and J, and compares H-vs-L to see what they changed, which picks up changes made in commits K and L.

The merge process combines the two sets of changes, applying the combined changes to the snapshot from commit H, i.e., the merge base. The resulting combined changes, if all goes well, apply correctly and Git produces the new merge commit M on its own:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2
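A minimal sketch that reproduces this graph in a scratch repository follows. The `make_commit` helper is hypothetical: it just makes one commit that adds one file named after the commit letter, so the two sides never conflict.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b branch1

# Hypothetical helper: one commit that adds one file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit G; make_commit H
H=$(git rev-parse HEAD)
git branch branch2                        # both names point at H for now
make_commit I; make_commit J              # branch1: ...G--H--I--J
J=$(git rev-parse HEAD)
git checkout -q branch2
make_commit K; make_commit L              # branch2: ...G--H--K--L
git checkout -q branch1

merge_base=$(git merge-base branch1 branch2)   # the commit Git picks as base
git merge -q --no-edit branch2                 # creates merge commit M
git log --graph --oneline --all
```

Here `git merge-base` reports commit H, and the new commit has two parents, which you can inspect with `git rev-parse HEAD^1` and `git rev-parse HEAD^2`.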

Because we are on branch1, Git writes the new merge commit's hash ID into the name branch1, automatically updating that name so that the last commit on branch branch1 is now M. Because M has two parents, instead of just one, this ties everything together. If we make more commits on branch2, then go back to branch1, like this:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L----N--O   <-- branch2

and ask Git to merge again, this time the best shared commit is not H but rather is L (commit L is on both branches). So this time Git will compare L-vs-M to see what we changed (that's the changes we carried in because of H-vs-J, after all) and then compare L-vs-O to see what they changed on branch2. Git will combine those changes, apply those to the snapshot in L, and produce a new merge:

          I--J
         /    \
...--G--H      M-------P   <-- branch1 (HEAD)
         \    /       /
          K--L----N--O   <-- branch2

and now commit P will have picked up the changes from N-O, and a future merge will use commit O as the new merge base.
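The moving merge base can be checked with a sketch like the one below, again in a scratch repository with a hypothetical `make_commit` helper that adds one file per commit.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b branch1

# Hypothetical helper: one commit that adds one file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit G; make_commit H
git branch branch2
make_commit I; make_commit J
git checkout -q branch2
make_commit K; make_commit L
L=$(git rev-parse HEAD)
git checkout -q branch1
git merge -q --no-edit branch2            # first merge: commit M

git checkout -q branch2
make_commit N; make_commit O              # more work on branch2
git checkout -q branch1

second_base=$(git merge-base branch1 branch2)   # now L, no longer H
git merge -q --no-edit branch2            # second merge: commit P
git log --graph --oneline --all
```

After the second merge, running `git merge-base branch1 branch2` again reports commit O.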

If we go back and compare this to cherry-picking, we see how they are quite different:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

Say we now run git cherry-pick on commit L, by giving its hash ID or using the name branch2. Git will compare commit K's snapshot to commit L's snapshot, apply those changes to commit J, and make a new commit we'll call L'—indicating that it's a copy of L—that will have commit J as its (single) parent:

          I--J--L'  <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

We did not get any of the changes from commit K.

If we run git merge branch2 at this point, Git will still find H as the merge base, and will compare H vs L' to see what we changed, and H vs L to see what they changed, as before. This time, when Git goes to combine these changes, we'll already have the K-vs-L changes, but Git is usually smart enough to just say "oh, I see we and they both did the same thing, so I'll just take one copy of the changes".
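A small sketch of this cherry-pick-then-merge sequence is shown below. It is deliberately simple: each commit adds a single file via a hypothetical `make_commit` helper, so the duplicated change is the easiest possible case for Git. Real histories can be messier.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b branch1

# Hypothetical helper: one commit that adds one file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit G; make_commit H
H=$(git rev-parse HEAD)
git branch branch2
make_commit I; make_commit J
git checkout -q branch2
make_commit K; make_commit L
git checkout -q branch1

git cherry-pick branch2 >/dev/null        # copies only L, producing L'
had_k_before_merge=$(git ls-tree --name-only HEAD K.txt)   # empty: no K yet

base=$(git merge-base branch1 branch2)    # still H: L' is not L
git merge -q --no-edit branch2            # brings in K; L's change is already here
git log --graph --oneline --all
```

The merge completes without a conflict here because both sides added the same file with the same content.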

¹The differences include the fact that the committer timestamp is "right now", whereas the committer timestamp of the commit you're copying is presumably some time in the past. But there's this as well: the parent of the new commit is the commit that used to be the last commit on the branch you're cherry-picking into. The parent of the commit you're cherry-picking is different. So even if you manage to cherry-pick during the exact same second you make the original commit, the new commit will be at least slightly different, and that produces a totally different hash ID.

Major structural changes make things hard for Git

there is a major future release which has significant changes in the design and architecture ...

To see how this becomes a problem, sit down and draw yourself a simplified graph:

          D--X--E--F   <-- redesigned
         /
...--B--C
         \
          G--H--Y--I   <-- somebranch

Suppose commits X and Y make fairly radical changes. Then commit C, which is on both branches, is exactly the same on both branches because it's really just the one commit C. You obviously don't want commits X or Y copied to the other branch—these are the major redesigns—so you definitely don't want to merge commits F and I in any way.

You can git cherry-pick commit G or H to redesigned pretty easily, because those commits are applied to commit C itself, or to something derived directly from C. You can git cherry-pick commit D to somebranch, because that commit is applied to C itself. But if you try to cherry-pick E, F, or I, well, those are after the major redesign commits. They're not likely to apply as easily.
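To see the difference in practice, here is a contrived sketch. The file `app.txt`, its contents, and the `try_pick` helper are all invented for the example; the two "redesign" commits simply rewrite the same file in incompatible ways. It cherry-picks one commit from before the redesign and one from after it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b redesigned

# Commit C: the shared starting point.
printf 'old1\nold2\nold3\n' > app.txt
git add app.txt && git commit -q -m "C"
C=$(git rev-parse HEAD)

# Commit X: the redesign on this branch rewrites app.txt entirely.
printf 'new-a\nnew-b\nnew-c\n' > app.txt
git commit -q -am "X: redesign"

# somebranch: G adds an unrelated file, Y is a different redesign,
# and I edits the file as rewritten by Y.
git checkout -q -b somebranch "$C"
echo "helper" > helper.txt
git add helper.txt && git commit -q -m "G"
G=$(git rev-parse HEAD)
printf 'alt-1\nalt-2\nalt-3\n' > app.txt
git commit -q -am "Y: other redesign"
printf 'alt-1\nalt-2 fixed\nalt-3\n' > app.txt
git commit -q -am "I"
I=$(git rev-parse HEAD)

git checkout -q redesigned
# Hypothetical helper: attempt a cherry-pick, back out if it conflicts.
try_pick() {
  if git cherry-pick "$1" >/dev/null 2>&1; then echo "applied"
  else git cherry-pick --abort; echo "conflict"; fi
}
early=$(try_pick "$G")    # made before the redesign
late=$(try_pick "$I")     # made on top of the other redesign
echo "G: $early, I: $late"
```

In this setup G applies cleanly while I stops with a conflict, because I's change is expressed against text that does not exist on redesigned.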

Work that Git can't do becomes work that you must do

If what's in commits E, F, and/or I never has to move across to the "other branch", that's fine. But if there's something you did in E or F that's important to I, well, now you have a problem.

There's no royal road here, but note this. Suppose you have a fix for a problem that occurs in a commit that comes before any of the major-change commits:

          D--X--E--F   <-- redesigned
         /
...--B--C
         \
          G--H--Y--I   <-- somebranch

Suppose there is a flaw in commits B, C, D, G, and/or H. Suppose further that we can fix the flaw by making a branch at the point where the flaw appears. For simplicity, let's make a fix123 branch pointing to commit C now, using git checkout -b fix123 hash-of-C:

          D--X--E--F   <-- redesigned
         /
...--B--C   <-- fix123 (HEAD)
         \
          G--H--Y--I   <-- somebranch

Now let's fix a bug that appears in commit C, which is shared with both branches, by making new commit J:

          D--X--E--F   <-- redesigned
         /
...--B--C--J   <-- fix123 (HEAD)
         \
          G--H--Y--I   <-- somebranch

This gives us the ability to run git checkout redesigned; git merge fix123 and git checkout somebranch; git merge fix123 to incorporate the fix into both branches. Having done so, we end up with this:

          D--X--E--F--K   <-- redesigned
         /           /
...--B--C-----------J   <-- fix123
         \           \
          G--H--Y--I--L   <-- somebranch

where K and L are merge commits. This lets us see that the fix has been applied to both branches. The problematic commits X and Y are still only on branches redesigned and somebranch. Shared commit C, however, is followed by shared commit J.
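The whole fix-branch workflow can be sketched as follows, in a scratch repository where every commit just adds one file through a hypothetical `make_commit` helper. In a real project the two merges may well need conflict resolution because of X and Y.

```shell
set -eu
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo User"         # made-up identity
git config user.email "demo@example.com"
git checkout -q -b redesigned

# Hypothetical helper: one commit that adds one file.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit B; make_commit C
C=$(git rev-parse HEAD)
make_commit D; make_commit X; make_commit E; make_commit F

git checkout -q -b somebranch "$C"
make_commit G; make_commit H; make_commit Y; make_commit I

# Make the fix where the flaw lives: on top of shared commit C.
git checkout -q -b fix123 "$C"
make_commit J

# Merge the one fix commit into both lines of development.
git checkout -q redesigned
git merge -q --no-edit fix123             # merge commit K
git checkout -q somebranch
git merge -q --no-edit fix123             # merge commit L

git branch --contains fix123              # lists all three branches
git log --graph --oneline --all
```

Because J itself is now reachable from both branches, `git branch --contains` can answer "which branches have this fix?" directly, which separate cherry-picked copies cannot do.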

Should G, and perhaps a fix to it, need to go into redesigned, we can make a new branch pointing directly at commit G, make the fix there, and then merge that into redesigned. The resulting graph is too tangled for me to attempt to draw here, but everything will be recorded in Git, ready to be extracted later.

Each of these merges may present some difficulty (because of the structural rewrite commits), and it's often tempting to just use separate fixes in each branch-tip. There's nothing inherently wrong with that, either, especially if you don't know in advance that the fix might be needed on both branches.
