
I recently had a nasty merging problem and was wondering if anyone could help me understand it, so it doesn't happen again.

In the beginning

In my Git repo, master branch is the production environment and I want to use dev branch as the staging environment.

My plan was to maintain a config file in different states on master and dev:

  • On master the config contains connection settings for live third-party APIs etc.
  • On dev the config contains settings for sandboxed versions of all the APIs etc.

I created dev by branching from master (1), committing changes to the config (2) and then merging into master using 'ours' strategy (4), so that the config would remain in different states on each branch but they would be at the same point in their histories (so future changes on dev can be merged to master without altering the config):

1  git checkout -b dev
2  git commit -am "Dev config settings"
3  git checkout master
4  git merge -s ours dev

The problem started

I was OK until I got to the point where I needed to edit the config file again. I'd made several changes on dev, something like this:

commit_1
commit_2
commit_3
commit_to_config_file
commit_5
commit_6

...and searched around to find it's possible to merge different parts of the history using tilde followed by a number.

However, I realise now I'd misunderstood this info. I thought the tilde numbering worked like this:

commit_1 # dev~6
commit_2 # dev~5
commit_3 # dev~4
commit_to_config_file # dev~3
commit_5 # dev~2
commit_6 # dev~1 = most recent commit

So I did this on master:

git merge dev~4
git merge -s ours dev~3
git merge dev

...and hit a merge conflict.

I didn't realise my numbering mistake at this point, so resolved the conflict in my mergetool. Afterwards I found there were errors on master and thought the easiest way to deal with them would be to commit corrections directly. Everything now worked fine on production.


Sealing my doom

Having made a commit on master, I now needed to get the histories back in sync between the two branches. I thought I should skip the commit I'd made on master by merging on dev using the ours strategy again:

git checkout dev
git merge master -s ours

Days later I committed more changes on the dev branch and merged into master:

git checkout master
git merge dev

Shortly afterwards I found the production environment was (disastrously) using the sandbox config settings! I had overwritten the master branch history with the dev history. It now said I'd committed the sandbox settings to the config file about 14 days ago on master - which I'd missed because I didn't check that far back in the history before pushing my changes to origin.


Questions at last

1) Why did merging master into dev and dev back into master cause the history to be overwritten?

2) What should I have done instead to preserve the different histories?

3) I now think the tilde numbering on my above example should actually have looked like this:

commit_1 # dev~5
commit_2 # dev~4
commit_3 # dev~3
commit_to_config_file # dev~2
commit_5 # dev~1
commit_6 # dev = most recent commit

... is that correct?

4) Am I using a good method for maintaining different config files in both branches? (I've just seen this method using .gitattributes which looks much better).

Karl Bishop

2 Answers


The config file on master was overwritten the first time you merged dev into master, because that merge was a fast-forward (under the recursive merge strategy).

For this workflow, when you created dev from master and changed the config file on dev, assume the commit history is as below:

...---A---B  master
           \
            C  dev

After merging dev into master, the commit history will be:

...---A---B---C  master,dev

So the config file on master automatically changes to the same version as on dev (the ours merge strategy does not actually take effect).

BTW: since you need to keep the config file different on each branch, you can also ignore the config file on all branches.

Marina Liu
  • Thanks for answer. I still don't understand, sorry... ours merge worked OK when I merged dev into master in the beginning. I merged dev into master several times before I started having problems and the config file was correct in both branches (they maintained different versions). It was just the final merges of master into dev and then dev into master that overwrote all the master branch history. So I think the problem is something to do with merging branches in both directions, instead of always one way...? Maybe I should have used fast-forward on dev after new commit on master? – Karl Bishop Feb 13 '18 at 22:22

First, let's cover this part:

Am I using a good method for maintaining different config files in both branches? (I've just seen this method using .gitattributes which looks much better).

I would not recommend either of these methods, at least in general.

Instead, if you have a configuration file that controls things, don't check it in at all (at least not in this repository—you might check it in to some other repository, and make the file you use here a symbolic link to the "real" configuration file stored in the other repository, for instance). Have as a source-controlled file an example or sample or starter configuration. Have your system copy this file to the real configuration file (that is ignored via .gitignore) if necessary.
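As a rough sketch of that arrangement, assuming hypothetical file names (config.sample.ini for the tracked starter file, config.ini for the real, ignored one) and a made-up setting, the "copy if necessary" step could look like this:

```shell
set -eu
work=$(mktemp -d)   # scratch directory standing in for the project root
cd "$work"

# Tracked starter configuration (hypothetical name and contents).
printf 'api_url = https://sandbox.example.invalid\n' > config.sample.ini

# Keep the real configuration out of version control.
printf 'config.ini\n' > .gitignore

# Deployment or first-run step: create the real file only if it is missing,
# so an existing, locally edited config.ini is never overwritten.
if [ ! -f config.ini ]; then
    cp config.sample.ini config.ini
fi
```

Each environment (production, staging) then edits its own config.ini, and no merge ever touches it.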

In some cases, you might split configurations into "system configurations" (which might be tracked) and "user configurations" (which generally would not be tracked and might be in a different directory entirely). Compare this with, for instance, .gitattributes, where you set things like how the files under source control should be treated, vs $HOME/.gitconfig, where you set things like how commits should record your name and email address. The former really is a property of the source, and the latter is not.

The drawback to using a merge driver in .gitattributes is that such merge drivers are only run in the case of a "true merge", which ... well, see the long section below.

The long part

... it's possible to merge different parts of the history using tilde followed by a number.

This is true (at least in various senses) but may be misleading. In fact, you can run git merge with any commit hash ID. When you run git merge branchX, Git will first turn the name branchX into a commit hash ID. That commit hash ID is the one that the name branchX points to:

             o   <-- branchW
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY

Here each of the round o nodes represents a commit—an object with a big ugly hash ID—and the branch names simply act as moveable pointers to the commit. To grow a branch like branchW, we run git checkout branchW, which "attaches our HEAD" to the branch:

             o   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY

and fills in Git's index with the tip commit contents, and likewise fills in our work-tree where we do our work. (The index is where you build the next commit, so it starts out matching the work-tree and the current commit.) We then modify files in the work-tree, where we do our work; then we copy the changed files back into the index, so that the next commit will snapshot the updated versions, rather than re-snapshotting the old versions; and then we run git commit.

The git commit command makes a new commit * whose parent is the current tip of the branch:

             o--*
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY

and then writes that new commit's hash ID into the branch name to which HEAD is attached, so that now branchW points to * instead of its parent:

             o--*   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY

If you run git merge and give it a branch name, git merge locates the tip commit to which the branch name points. If you run git merge and give it anything that identifies some other commit, git merge locates the other commit.
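To see the name-to-hash step in isolation, here is a small sketch in a throwaway repository (the commit messages and the user identity are placeholders). It shows that a branch name and a relative expression can resolve to the same commit hash, which is all git merge needs:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "first"
git branch branchX                         # branchX now points at "first"
git commit -q --allow-empty -m "second"    # master moves on, branchX does not

tip_by_name=$(git rev-parse branchX)       # hash the name points to
tip_by_expr=$(git rev-parse master~1)      # same commit, reached another way
echo "$tip_by_name"
echo "$tip_by_expr"
```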

What it does after locating the other commit gets a bit complicated. Let's dive into this part instead for now:

Why did merging master into dev and dev back into master cause the history to be overwritten?

It didn't! You are thinking of history and contents as if they are the same thing, but they are quite different.

In Git, the commits are the history. The graph I drew above shows eight points of history—eight commits—once we've added a new commit that's become the new tip of branchW. (The ... section represents more history, of course, but that's not history we care about at the moment.)

Each commit stores a (single) snapshot, which is the source as of that point in history. As I mentioned above, the content that goes into this snapshot is whatever is in the index, which Git also calls the staging area and sometimes the cache. It has multiple roles, but the main one is that it's the source for all the files that go into each new commit you make, as you make new commits.

Every time you add a commit, you add more history. Each commit has a backwards link, connecting the commit to its parent. Merge commits (here we use the word merge as an adjective, or sometimes as a noun: a merge means a merge commit) have two (or more, but don't worry about this here) of these backwards links. The first one tells you about the normal first parent, as always; the second one tells you, and Git, which commits were brought in by the act of merging and hence no longer need to be considered. This last bit is going to be the key to the problem.

It's important to remember here that every commit is a pure snapshot. There is no notion, at this level, of a commit as a change. It's just a snapshot at this level! But most commits have one parent, and if you compare the snapshot in the parent to the snapshot in the child, you get a change.

If you compare two commits, you will see what happened to existing files, whether any files were deleted entirely, and whether any files were created. In other words, Git can, by using history, turn a snapshot into a change-set. But you don't have to compare parent to child. You can, instead, compare some great-great-great-grand-parent to the child, to get a longer-term view. This is where git merge comes in. I suspect you actually have a fairly good handle on merge-as-a-verb, but we'll come back to that in a bit.
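A quick way to convince yourself of this, sketched in a scratch repository with a placeholder file name, is to diff the same child commit against its parent and then against its grandparent:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "A"
echo two >> file.txt
git commit -q -am "B"
echo three >> file.txt
git commit -q -am "C"

# Same snapshot (C), two different change-sets depending on what we compare to.
git --no-pager diff HEAD~1 HEAD    # parent vs child: one added line
git --no-pager diff HEAD~2 HEAD    # grandparent vs child: two added lines

added_vs_parent=$(git diff --numstat HEAD~1 HEAD | cut -f1)
added_vs_grandparent=$(git diff --numstat HEAD~2 HEAD | cut -f1)
```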

Naming specific commits

As you eventually suspected, the ~ notation counts backwards from zero, not from one. Let's draw a slightly different graph, and give the commits single letter names so that we can talk about them:

...--B--C--D   <-- master
      \
       E--F--G--H--I--J  <-- dev

The name dev identifies commit J. The name (or in fact, almost anything that identifies a commit) can have a ~ or ^ character appended, followed by a number. This is all documented in the gitrevisions manual page, but in short, tilde followed by a number means "count back that many first-parent links". For non-merge commits, there's only one parent, so it's obvious which link is the first parent too. Hence dev~0 counts back no steps and names commit J; dev~1 counts back one step and names commit I; and so on.

Note that if we count back six steps, we name commit B, which is also on master. This is a strange feature of Git: commits can be on more than one branch at a time. (Many if not most other version control systems don't behave this way: a commit, once made, is on the branch you made it on, no more and no less.) For this reason, it's sometimes better to think of commits as being "contained within" branches: commit B is contained within both master and dev.
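The numbering is easy to check before merging anything. The sketch below rebuilds the six-commit list from the question as empty commits in a throwaway repository (the identity settings are placeholders) and prints which commit each expression names:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Recreate the question's history, oldest first.
for name in commit_1 commit_2 commit_3 commit_to_config_file commit_5 commit_6; do
    git commit -q --allow-empty -m "$name"
done
git branch dev

# Print the subject of the commit each expression resolves to.
for rev in dev dev~0 dev~1 dev~2 dev~3 dev~5; do
    printf '%s -> %s\n' "$rev" "$(git log -1 --format=%s "$rev")"
done
```

Running git log -1 on an expression like this is a cheap sanity check to do right before a git merge that uses one.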

Merging

Let's look now at what git merge does—but watch out, it's a bit complicated. There are an unfortunately large number of cases here, but let's look at the merge-as-a-verb, to merge, case that results in a merge. We already mentioned the adjective / noun form of merge. At the end of merging, git merge often makes a merge commit, and this commit, by definition, has two parents: the first parent is the commit that was current (was HEAD) when you ran git merge, and the second one is the one you named as an argument to git merge. But how does this commit come about? Here, we experience the verb form, to merge:

git checkout master; git merge dev

One of the commits is the current commit D (aka HEAD, aka master). The other is the commit J (aka dev).

When Git goes to merge these two commits, it must identify a third commit, which we call the merge base. This is where a history like that of master and dev comes in, because the merge base is, loosely speaking, the commit where the branches "come together". There may be more than one such commit; in that case, Git takes the one "nearest" the end points. So if we have:

...--B--C--D   <-- master (HEAD)
      \
       E--F--G--H--I--J  <-- dev

then B, and everything before B, is a possible candidate, but since B is "closest" to the two branch tip commits D and J, Git will always pick B.
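You can ask Git for the merge base directly with git merge-base. Here is a sketch that builds a shortened version of the graph above out of empty commits (only B, C, D on master and E, F on dev; the identity settings are placeholders):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "B"
b=$(git rev-parse HEAD)          # remember B's hash
git branch dev                   # dev forks off at B

git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"

git checkout -q dev
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

base=$(git merge-base master dev)
git log -1 --format=%s "$base"   # subject of the merge base commit
```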

Git now has the necessary three inputs, and can merge D and J using B as the merge base. For the normal case, Git will now run two git diff comparisons, much as if you ran:

git diff --find-renames B D > /tmp/b-vs-d.patch  # what we did
git diff --find-renames B J > /tmp/b-vs-j.patch  # what they did

Git then combines the two sets of changes. Git declares a conflict if any change we made on "our side" (b-vs-d) touches the same line(s) of the same file(s) as any change they made on their side (b-vs-j), except for the case where we made the exact same change that they made. If we both made the same change, Git just takes one copy of that change.

If all goes well—usually it does—there are no conflicts and Git builds up a work-tree and index that consists of "everything from B, changed according to everything we changed, and changed according to everything they changed". Git is now able to make the new merge commit, so let's draw that in:

...--B--C--D------------K   <-- master (HEAD)
      \                /
       E--F--G--H--I--J  <-- dev

Commit K has two parents, D and J, and that's a normal merge.

But you didn't do that; let's un-draw it, and go back to the original. Instead, you ran:

git merge dev~4

so let's draw the effect of that, knowing that dev~4 moves back across four first-parent links, from J to F:

...--B--C--D--K   <-- master (HEAD)
      \      /
       E----F--G--H--I--J  <-- dev

Git combined the B-to-D changes on master with the B-to-F changes on dev, and made new merge commit K.

Then you ran:

git merge -s ours dev~3

This -s ours modifies the verb form of to merge, without changing the noun form. The verb-form change includes the fact that Git doesn't bother computing the merge base at all (it doesn't have to). If Git did compute the merge base, though, what would it be? To find out, start at K and work backwards along all possible paths, and start at G (dev~3) and work backwards too.

From K we move back to both D and F. From G we move back to F. Commit F is on both branches and is closest to both branch tips, so F is the merge base. The -s ours merge then ignores F entirely, takes the tree (the snapshot) from K, and makes a new merge commit with two parents as usual:

...--B--C--D--K--L   <-- master (HEAD)
      \      /  /
       E----F--G--H--I--J  <-- dev
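The "takes the tree from the current commit" behaviour can be observed by comparing tree hashes before and after the merge. This is a sketch with hypothetical file names and contents (config.ini, app.txt), not the original repository:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo live > config.ini
git add config.ini
git commit -q -m "live config"

git checkout -q -b dev
echo sandbox > config.ini
git commit -q -am "dev config"

git checkout -q master
echo v1 > app.txt
git add app.txt
git commit -q -m "master work"

tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "merge dev, keep our tree" dev
tree_after=$(git rev-parse 'HEAD^{tree}')

# New merge commit, identical snapshot: nothing from dev was taken.
echo "$tree_before"
echo "$tree_after"
```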

Now you ran:

git merge dev

Once again, we start by finding the merge base (of L and J). Work backwards as needed: the parents of L are K and G, and the parent of J is I whose parent is H whose parent is G.

This means the merge base is G, and Git will now compute two diffs:

git diff --find-renames G L > /tmp/g-vs-l.patch
git diff --find-renames G J > /tmp/g-vs-j.patch

Git now starts with the tree/snapshot from G, applies our changes from g-vs-l, applies their changes from g-vs-j, and hits a merge conflict. At this point Git just stops with the partial merge recorded (along with the conflicts) in the index and work-tree.

[I] resolved the conflict in my mergetool.

This finishes the merge-as-a-verb process, resolving the index and work-tree conflicted files and adding all the final versions to the index. You can now run git merge --continue or git commit to finish the merge to get:

...--B--C--D--K--L--------M   <-- master (HEAD)
      \      /  /        /
       E----F--G--H--I--J  <-- dev

The tree (snapshot) for M, the last merge on master, is the one you constructed when you resolved the conflicts.

Let's see what happens now

Having made a commit on master, I now needed to get the histories back in sync between the two branches. I thought I should skip the commit I'd made on master by merging on dev using the ours strategy again:

git checkout dev
git merge master -s ours

Let's see what this does. The git checkout dev step replaces the index and work-tree contents with the tip commit of dev, and attaches HEAD to dev, giving:

...--B--C--D--K--L--------M   <-- master
      \      /  /        /
       E----F--G--H--I--J  <-- dev (HEAD)

The only thing we see in the diagram is the movement of HEAD, but the index and work-tree are of course changed as well.

The git merge step is now a bit special. As before, you are using -s ours to invoke the "ours" strategy. This completely ignores the merge base; it just makes a new commit, re-using the current index contents, with two parents, so let's draw that:

...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N  <-- dev (HEAD)

The tree for N matches the tree for J; the first parent of N (N~1) is J, and the second parent of N is M.

Days later I committed more changes on the dev branch ...

OK, let's draw some more dev commits:

...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N--O--P  <-- dev (HEAD)

and merged into master:

git checkout master
git merge dev

The first step moves HEAD to master, and changes out our index and work-tree contents to match M.

The second step finds the merge base of the current commit M and the named commit P. This is the first commit we can find that is on both branches. This time, start at P and work backwards. Its parent is O; O's parent is N; and N has two parents, including M. Commit M is therefore on both branches, so the merge base is M itself.

At this point, since we are doing a normal (not -s ours) merge, Git does something peculiar: it doesn't bother with either kind of merge at all. It skips the merge-as-a-verb steps, and the merge-as-a-noun steps. Instead, it just immediately makes the index and work-tree match the other commit and makes the current branch name point to the other commit:

...--B--C--D--K--L--------M
      \      /  /        / \
       E----F--G--H--I--J---N--O--P  <-- master (HEAD), dev

Both branch names now point to the same commit! Commit P is now the tip commit of both branches; both branches contain all the same commits.

There is a way we could have forced Git to make a merge commit anyway, using git merge --no-ff. Let's see what happens if we had used that instead. Git would find the merge base M, compare M vs M (no changes), compare M vs P (their changes), and build a new commit using their changes:

...--B--C--D--K--L--------M---------Q   <-- master (HEAD)
      \      /  /        / \       /
       E----F--G--H--I--J---N--O--P  <-- dev

The tree for Q would match the tree for P. In other words, because the merge base is M itself, we have set ourselves up so that any merge of dev amounts to taking the final commit of dev as our snapshot, whether we do that directly (as a fast-forward non-merge setting master to point to commit P) or by forcing a real merge commit Q that just re-uses the tree from P.
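The whole sequence can be replayed in miniature. The sketch below is a simplified stand-in for the history above, with hypothetical file names and contents and fewer commits, but the same shape: an ours merge into master, a fix committed on master, an ours merge back into dev, more dev work, then a plain merge into master:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo live > config.ini
echo v1 > app.txt
git add .
git commit -q -m "live config"

git checkout -q -b dev
echo sandbox > config.ini
git commit -q -am "dev: sandbox config"

git checkout -q master
git merge -q -s ours -m "merge dev, keep live config" dev
echo fix >> app.txt
git commit -q -am "master: correction committed directly"
m=$(git rev-parse HEAD)                    # plays the role of commit M

git checkout -q dev
git merge -q -s ours -m "sync with master" master   # plays the role of N
echo v2 >> app.txt
git commit -q -am "dev: more work"         # plays the role of P

base=$(git merge-base master dev)          # M itself
git checkout -q master
git merge -q dev                           # fast-forward: master now equals dev
cat config.ini                             # the sandbox settings, on master
```

Note that the correction made on master is also gone from app.txt, because the second ours merge discarded it on dev.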

Conclusion

We now have fairly definite answers to some of the questions, and guidelines for the others:

1) Why did merging master into dev and dev back into master cause the history to be overwritten?

It didn't: it just added new history. The problem is that the new history set you up with a hazard. This is always a potential problem; all merges need some kind of testing and/or inspection because Git is just following a bunch of simple textual rules for combining change-sets.

2) What should I have done instead to preserve the different histories?

Nothing, really, except perhaps use the --no-ff flag to force the merge operation to make a merge commit. Fundamentally, the problem is that you're treating the configuration as if it's a file that corresponds to each snapshot and needs to be merged like any other file in any snapshot. Sometimes that's true, and sometimes it is not. As long as the file is in the tree like this, you must inspect the result of each merge.
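One possible way to build that inspection into the merge, offered as a sketch rather than a recommendation, is to combine --no-ff with --no-commit so Git stops before committing, look at what is staged, and put the config file back by hand. The file names and contents are hypothetical:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo live > config.ini
echo v1 > app.txt
git add .
git commit -q -m "live config"

git checkout -q -b dev
echo sandbox > config.ini
echo v2 > app.txt
git commit -q -am "dev: sandbox config and new work"

git checkout -q master
# Perform the merge but stop before committing; --no-ff rules out a fast-forward.
git merge -q --no-ff --no-commit dev
git --no-pager diff --cached --stat HEAD   # inspect what the merge would change
git checkout HEAD -- config.ini            # restore master's version of the config
git commit -q -m "Merge dev, keeping master's config.ini"
```

This still depends on someone remembering to check, which is why keeping the file out of the tree is the more robust option.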

3) I now think the tilde numbering on my above example should (have been different by one)

Correct.

4) Am I using a good method for maintaining different config files in both branches?

This part is difficult to say. But, if you use a merge=ours merge driver, note that it will fire only when the hash ID of the file in all three input commits differs. That is, the contents in the merge base must differ from the contents in the HEAD commit, and both must differ from the contents in the other commit. Of course, if the merge base is the same commit as either input commit, all the files in the merge base necessarily match all the files in at least one of the tip commits. If the two tips have the same file contents, you're OK anyway. So this really goes wrong when:

  • the merge base isn't the same as the HEAD commit, but
  • the file in the merge base is the same as the file in the HEAD commit, in which case
  • Git can (and does) just take the file from the other commit without running your merge driver. If that's different from the file in the HEAD commit, you have picked up their changes!
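The failing case in the list above can be reproduced as follows. This sketch assumes the commonly described setup for such a driver (a .gitattributes line "config.ini merge=ours" plus a driver named ours whose command is true); the file names and contents are hypothetical:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config merge.ours.driver true          # "keep our version" merge driver

echo 'config.ini merge=ours' > .gitattributes
echo live > config.ini
git add .
git commit -q -m "base: live config"

git checkout -q -b dev
echo sandbox > config.ini
git commit -q -am "dev: sandbox config"

git checkout -q master
echo x > other.txt                         # master moves on, config.ini untouched
git add other.txt
git commit -q -m "master: unrelated work"

git merge -q -m "merge dev" dev
cat config.ini                             # sandbox: the driver never ran
```

Because config.ini on master is identical to config.ini in the merge base, Git takes dev's version directly, with no file-level merge and therefore no driver.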

(To see how this works, try running:

git rev-parse HEAD:Makefile

if you have a file named Makefile in your commit, for instance. Every file in every commit has a hash ID. The hash ID depends on the file's contents; if the file is the same in two different commits, those two different commits use the same blob hash in their trees, so that they share the single copy of the underlying file.)
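Here is a small sketch of that sharing in a scratch repository (notes.txt and the identity settings are placeholders): the unchanged Makefile has the same blob hash in both commits, while the edited file does not.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo 'all:' > Makefile
echo one > notes.txt
git add .
git commit -q -m "first"

echo two >> notes.txt                      # change notes.txt only
git commit -q -am "second"

makefile_old=$(git rev-parse HEAD~1:Makefile)
makefile_new=$(git rev-parse HEAD:Makefile)
notes_old=$(git rev-parse HEAD~1:notes.txt)
notes_new=$(git rev-parse HEAD:notes.txt)
echo "$makefile_old $makefile_new"
echo "$notes_old $notes_new"
```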

torek
  • Wow, thank you for this phenomenal answer! I'm just about understanding how I ended up in my sticky situation now (few more re-reads should do it); and certainly have a much better appreciation for the way Git works. Seems I also had a confused notion of what .gitignore does - I thought if I added the config file to .gitignore it would essentially be the same as deleting it from Github (which would break the site because it uses the online repo for continuous deployment)... However, I just tested on the dev site and the file is happily intact so looks like this could be the perfect solution – Karl Bishop Feb 14 '18 at 23:08