First, let's cover this part:
Am I using a good method for maintaining different config files in both branches? (I've just seen this method using .gitattributes which looks much better).
I would not recommend either of these methods, at least in general.
Instead, if you have a configuration file that controls things, don't check it in at all (at least not in this repository; you might check it in to some other repository, and make the file you use here a symbolic link to the "real" configuration file stored in the other repository, for instance). Keep an example, sample, or starter configuration as a source-controlled file. Have your system copy this file to the real configuration file (which is ignored via .gitignore) if necessary.
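Here is a minimal sketch of that layout in a scratch repository. The file names config.sample.ini and config.ini are made up for illustration; use whatever your system actually reads.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q .

# Tracked starter file and ignored real config (both names are hypothetical).
printf 'debug=false\n' > config.sample.ini
printf 'config.ini\n' > .gitignore
git add config.sample.ini .gitignore

# Bootstrap step: create the real config from the sample only if it is missing.
[ -f config.ini ] || cp config.sample.ini config.ini

# The real config now exists, but Git neither tracks it nor lists it as untracked.
git status --porcelain
```

Because config.ini never enters any commit, no merge can ever touch it.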
In some cases, you might split configurations into "system configurations" (which might be tracked) and "user configurations" (which generally would not be tracked and might be in a different directory entirely). Compare this with, for instance, .gitattributes, where you set things like how the files under source control should be treated, vs $HOME/.gitconfig, where you set things like how commits should record your name and email address. The former really is a property of the source, and the latter is not.
The drawback to using a merge driver in .gitattributes is that such merge drivers are only run in the case of a "true merge", which ... well, see the long section below.
The long part
... it's possible to merge different parts of the history using tilde followed by a number.
This is true (at least in various senses) but may be misleading. In fact, you can run git merge with any commit hash ID. When you run git merge branchX, Git will first turn the name branchX into a commit hash ID. That commit hash ID is the one that the name branchX points to:
             o   <-- branchW
            /
...--o--o--o--o   <-- branchX
      \
       o--o   <-- branchY
Here each of the round o nodes represents a commit (an object with a big ugly hash ID), and the branch names simply act as movable pointers to the commits. To grow a branch like branchW, we run git checkout branchW, which "attaches our HEAD" to the branch:
             o   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
      \
       o--o   <-- branchY
and fills in Git's index with the tip commit contents, and likewise fills in our work-tree. (The index is where you build the next commit, so it starts out matching the work-tree and the current commit.) We then modify files in the work-tree, where we do our work; then we copy the changed files back into the index (with git add), so that the next commit will snapshot the updated versions rather than re-snapshotting the old versions; and then we run git commit.
The git commit command makes a new commit * whose parent is the current tip of the branch:
             o--*
            /
...--o--o--o--o   <-- branchX
      \
       o--o   <-- branchY
and then writes that new commit's hash ID into the branch name to which HEAD is attached, so that now branchW points to * instead of its parent:
             o--*   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
      \
       o--o   <-- branchY
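You can watch the branch name move in a scratch repository. This is only a sketch: the file name, the commit messages, and the identity settings are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m first

git checkout -q -b branchW              # attach HEAD to branchW
old_tip=$(git rev-parse branchW)

echo two > file.txt                     # modify the work-tree
git add file.txt                        # copy the changed file into the index
git commit -q -m second                 # snapshot the index, move branchW

new_tip=$(git rev-parse branchW)
git symbolic-ref --short HEAD           # the branch HEAD is attached to
git rev-parse "$new_tip^"               # parent of the new commit: the old tip
```

The name branchW now resolves to the new commit, and that commit's parent is the hash ID the name held before.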
If you run git merge and give it a branch name, git merge locates the tip commit to which the branch name points. If you run git merge and give it anything else that identifies a commit, git merge locates that commit.
What it does after locating the other commit gets a bit complicated. Let's dive into this part instead for now:
Why did merging master into dev and dev back into master cause the history to be overwritten?
It didn't! You are thinking of history and contents as if they are the same thing, but they are quite different.
In Git, the commits are the history. The graph I drew above shows eight points of history (eight commits) once we've added a new commit that's become the new tip of branchW. (The ... section represents more history, of course, but that's not history we care about at the moment.)
Each commit stores a (single) snapshot, which is the source as of that point in history. As I mentioned above, the content that goes into this snapshot is whatever is in the index, which Git also calls the staging area and sometimes the cache. It has multiple roles, but the main one is that it's the source for all the files that go into each new commit you make, as you make new commits.
Every time you add a commit, you add more history. Each commit has a backwards link, connecting the commit to its parent. Merge commits have two of these backwards links (or more, but don't worry about that here). Here we use the word merge as an adjective, or sometimes as a noun: a merge means a merge commit. The first link tells you about the normal first parent, as always; the second one tells you, and Git, which commits were brought in by the act of merging and hence no longer need to be considered. This last bit is going to be the key to the problem.
It's important to remember here that every commit is a pure snapshot. There is no notion, at this level, of a commit as a change. It's just a snapshot at this level! But most commits have one parent, and if you compare the snapshot in the parent to the snapshot in the child, you get a change.
If you compare two commits, you will see what happened to existing files, whether any files were deleted entirely, and whether any files were created. In other words, Git can, by using history, turn a snapshot into a change-set. But you don't have to compare parent to child. You can, instead, compare some great-great-great-grand-parent to the child, to get a longer-term view. This is where git merge comes in. I suspect you actually have a fairly good handle on merge-as-a-verb, but we'll come back to that in a bit.
Naming specific commits
As you eventually suspected, the ~ notation counts backwards from zero, not from one. Let's draw a slightly different graph, and give the commits single letter names so that we can talk about them:
...--B--C--D   <-- master
      \
       E--F--G--H--I--J   <-- dev
The name dev identifies commit J. The name (or in fact, almost anything that identifies a commit) can have a ~ or ^ character appended, followed by a number. This is all documented in the gitrevisions manual page, but in short, tilde followed by a number means "count back that many first-parent links". For non-merge commits, there's only one parent, so it's obvious which link is the first parent. Hence dev~0 counts back no steps and names commit J; dev~1 counts back one step and names commit I; and so on.
Note that if we count back six steps, we name commit B, which is also on master. This is a strange feature of Git: commits can be on more than one branch at a time. (Many if not most other version control systems don't behave this way: a commit, once made, is on the branch you made it on, no more and no less.) For this reason, it's sometimes better to think of commits as being "contained within" branches: commit B is contained within both master and dev.
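To check the counting yourself, here is a sketch that rebuilds the lettered graph in a scratch repository. The commit helper, the one-file-per-commit layout, and the tags named after the letters are all conveniences I made up for this example, and B is simply the root commit here (the "..." history is left out).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # call the initial branch master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit B; commit C; commit D              # master: B--C--D
git checkout -q -b dev B                  # dev forks off at B
for c in E F G H I J; do commit "$c"; done

git log -1 --format=%s dev~0              # zero steps back from the tip: J
git log -1 --format=%s dev~1              # one step back: I
git log -1 --format=%s dev~6              # six steps back: B
git branch --contains B                   # lists both dev and master
```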
Merging
Let's look now at what git merge does, but watch out: it's a bit complicated. There are an unfortunately large number of cases here, but let's look at the merge-as-a-verb (to merge) case that results in a merge. We already mentioned the adjective / noun form of merge. At the end of merging, git merge often makes a merge commit, and this commit, by definition, has two parents: the first parent is the commit that was current (was HEAD) when you ran git merge, and the second one is the one you named as an argument to git merge. But how does this commit come about? Here, we experience the verb form, to merge:
git checkout master; git merge dev
One of the commits is the current commit D (aka HEAD, aka master). The other is the commit J (aka dev).
When Git goes to merge these two commits, it must identify a third commit, which we call the merge base. This is where a history like that of master and dev comes in, because the merge base is, loosely speaking, the commit where the branches "come together". There may be more than one such commit; in that case, Git takes the one "nearest" the end points. So if we have:
...--B--C--D   <-- master (HEAD)
      \
       E--F--G--H--I--J   <-- dev
then B, and everything before B, is a possible candidate, but since B is "closest" to the two branch tip commits D and J, Git will always pick B.
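You don't have to find the merge base by eye: git merge-base computes it. The sketch below uses a scratch repository with the same made-up helper and letter tags, and B is the root commit since the earlier history is omitted.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit B; commit C; commit D              # master: B--C--D
git checkout -q -b dev B                  # dev: B--E--...--J
for c in E F G H I J; do commit "$c"; done
git checkout -q master

base=$(git merge-base master dev)         # best common ancestor of D and J
git log -1 --format=%s "$base"            # subject of that commit: B
git merge-base --is-ancestor B master && echo "B is reachable from master"
git merge-base --is-ancestor B dev && echo "B is reachable from dev"
```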
Git now has the necessary three inputs, and can merge D and J using B as the merge base. For the normal case, Git will now run two git diff comparisons, much as if you ran:
git diff --find-renames B D > /tmp/b-vs-d.patch # what we did
git diff --find-renames B J > /tmp/b-vs-j.patch # what they did
Git then combines the two sets of changes. Git declares a conflict if any change we made on "our side" (b-vs-d) touches the same line(s) of the same file(s) as any change they made on their side (b-vs-j), except for the case where we made the exact same change that they made. If we both made the same change, Git just takes one copy of that change.
If all goes well—usually it does—there are no conflicts and Git builds up a work-tree and index that consists of "everything from B, changed according to everything we changed, and changed according to everything they changed". Git is now able to make the new merge commit, so let's draw that in:
...--B--C--D------------K   <-- master (HEAD)
      \                /
       E--F--G--H--I--J   <-- dev
Commit K has two parents, D and J, and that's a normal merge.
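Here is that normal merge as a sketch in a scratch repository. The helper, the one-file-per-commit layout, and the letter tags are my own conveniences; because each commit only adds its own file, the two change-sets never overlap and there is no conflict.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit B; commit C; commit D
git checkout -q -b dev B
for c in E F G H I J; do commit "$c"; done
git checkout -q master

git diff --find-renames --name-status B D   # what we did: added C.txt, D.txt
git diff --find-renames --name-status B J   # what they did: added E.txt .. J.txt
git merge -q -m K dev                       # combine both change-sets into K
git log -1 --format='%s parents: %p'        # K, then two abbreviated parent IDs
```

The first parent of K is D and the second is J, and the work-tree holds the files from both sides.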
But you didn't do that; let's un-draw it, and go back to the original. Instead, you ran:
git merge dev~4
so let's draw the effect of that, knowing that dev~4 moves back across four first-parent links, from J to F:
...--B--C--D--K   <-- master (HEAD)
      \      /
       E----F--G--H--I--J   <-- dev
Git combined the B-to-D changes on master with the B-to-F changes on dev, and made new merge commit K.
Then you ran:
git merge -s ours dev~3
This -s ours modifies the verb form of to merge, without changing the noun form. The verb-form change includes the fact that Git doesn't bother computing the merge base at all (it doesn't have to). If Git did compute the merge base, though, what would it be? To find out, start at K and work backwards along all possible paths, and start at G (dev~3) and work backwards too.
From K we move back to both D and F. From G we move back to F. Commit F is on both branches and is closest to both branch tips, so F is the merge base. The -s ours merge then ignores F entirely, takes the tree (the snapshot) from K, and makes a new merge commit L with two parents as usual:
...--B--C--D--K--L   <-- master (HEAD)
      \      /  /
       E----F--G--H--I--J   <-- dev
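The claim that L re-uses K's snapshot can be checked by comparing tree hash IDs. This is a sketch in a scratch repository with made-up one-file commits and letter tags, not your actual history.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit B; commit C; commit D
git checkout -q -b dev B
for c in E F G H I J; do commit "$c"; done
git checkout -q master

git merge -q -m K dev~4                     # true merge of D and F
git tag K
git log -1 --format=%s "$(git merge-base K G)"   # merge base of K and G: F
git merge -q -s ours -m L dev~3             # records G as a parent, keeps K's tree

git rev-parse 'HEAD^{tree}' 'K^{tree}'      # two identical tree hash IDs
git log -1 --format=%s HEAD^2               # second parent of L: G
ls                                          # no G.txt: G's change was not taken
```

So the history now says "G has been merged", while the snapshot says nothing from G was brought in.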
Now you ran:
git merge dev
Once again, we start by finding the merge base (of L and J). Work backwards as needed: the parents of L are K and G, and the parent of J is I, whose parent is H, whose parent is G.
This means the merge base is G, and Git will now compute two diffs:
git diff --find-renames G L > /tmp/g-vs-l.patch
git diff --find-renames G J > /tmp/g-vs-j.patch
Git now starts with the tree/snapshot from G, applies our changes from g-vs-l, applies their changes from g-vs-j, and hits a merge conflict. At this point Git just stops with the partial merge recorded (along with the conflicts) in the index and work-tree.
[I] resolved the conflict in my mergetool.
This finishes the merge-as-a-verb process, resolving the index and work-tree conflicted files and adding all the final versions to the index. You can now run git merge --continue or git commit to finish the merge to get:
...--B--C--D--K--L--------M   <-- master (HEAD)
      \      /  /        /
       E----F--G--H--I--J   <-- dev
The tree (snapshot) for M, the last merge on master, is the one you constructed when you resolved the conflicts.
Let's see what happens now
Having made a commit on master, I now needed to get the histories back in sync between the two branches. I thought I should skip the commit I'd made on master by merging on dev using the ours strategy again:
git checkout dev
git merge master -s ours
Let's see what this does. The git checkout dev step replaces the index and work-tree contents with the tip commit of dev, and attaches HEAD to dev, giving:
...--B--C--D--K--L--------M   <-- master
      \      /  /        /
       E----F--G--H--I--J   <-- dev (HEAD)
The only thing we see in the diagram is the movement of HEAD, but the index and work-tree are of course changed as well.
The git merge step is now a bit special. As before, you are using -s ours to invoke the "ours" strategy. This completely ignores the merge base; it just makes a new commit, re-using the current index contents, with two parents, so let's draw that:
...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N   <-- dev (HEAD)
The tree for N matches the tree for J; the first parent of N (N~1) is J, and the second parent of N (N^2) is M.
Days later I committed more changes on the dev branch ...
OK, let's draw some more dev commits:
...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N--O--P   <-- dev (HEAD)
and merged into master:
git checkout master
git merge dev
The first step moves HEAD to master, and changes out our index and work-tree contents to match M.
The second step finds the merge base of the current commit M and the named commit P. This is the first commit we can find that is on both branches. This time, start at P and work backwards. Its parent is O; O's parent is N; and N has two parents, including M. Since M is the current commit itself, the merge base is M.
At this point, since we are doing a normal (not -s ours) merge, Git does something peculiar: it doesn't bother with either kind of merge at all. It skips the merge-as-a-verb steps, and the merge-as-a-noun steps. Instead, it does what Git calls a fast-forward: it just immediately makes the index and work-tree match the other commit and makes the current branch name point to the other commit:
...--B--C--D--K--L--------M
      \      /  /        / \
       E----F--G--H--I--J---N--O--P   <-- master (HEAD), dev
Both branch names now point to the same commit! Commit P is now the tip commit of both branches; both branches contain all the same commits.
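The fast-forward can be reproduced with a cut-down graph. In this sketch everything before M and J is collapsed into a single made-up root commit A, and the helper and letter tags are again my own conveniences.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit A                                  # stand-in for all earlier history
git checkout -q -b dev;  commit J
git checkout -q master;  commit M
git checkout -q dev
git merge -q -s ours -m N master          # N: parents J and M, tree of J
commit O; commit P

git checkout -q master
before=$(git rev-parse master)
base=$(git merge-base master dev)         # the current commit M itself
git merge -q dev                          # nothing to combine: fast-forward
git rev-parse master dev                  # both names give the same hash ID
ls                                        # M.txt is gone from the work-tree
```

Note the last line: what M contributed on master has vanished from master's tip, with no merge conflict and no merge commit. That is the hazard the earlier -s ours merge set up.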
There is a way we could have forced Git to make a merge commit anyway, using git merge --no-ff. Let's see what would have happened if we had used that instead. Git would find the merge base M, compare M vs M (no changes), compare M vs P (their changes), and build a new commit using their changes:
...--B--C--D--K--L--------M---------Q   <-- master (HEAD)
      \      /  /        / \       /
       E----F--G--H--I--J---N--O--P   <-- dev
The tree for Q would match the tree for P. In other words, because the merge base is M itself, we have set ourselves up so that any merge of dev amounts to taking the final commit of dev as our snapshot, whether we do that directly (as a fast-forward non-merge setting master to point to commit P) or by forcing a real merge commit Q that just re-uses the tree from P.
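The same cut-down graph shows the --no-ff variant. As before this is a sketch: A is a made-up root commit standing in for the earlier history, and the helper and letter tags are conveniences.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# commit NAME: add a file NAME.txt, commit it, and tag the commit NAME
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
    git tag "$1"
}

commit A
git checkout -q -b dev;  commit J
git checkout -q master;  commit M
git checkout -q dev
git merge -q -s ours -m N master          # N: parents J and M, tree of J
commit O; commit P

git checkout -q master
git merge -q --no-ff -m Q dev             # force a real merge commit Q
git rev-parse 'HEAD^{tree}' 'dev^{tree}'  # identical: Q re-uses P's snapshot
git log -1 --format=%s HEAD^1             # first parent of Q: M
git log -1 --format=%s HEAD^2             # second parent of Q: P
```

The history differs from the fast-forward case (there is a Q with two parents), but the contents are the same: whatever is in P.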
Conclusion
We now have fairly definite answers to some of the questions, and guidelines for the others:
1) Why did merging master into dev and dev back into master cause the history to be overwritten?
It didn't: it just added new history. The problem is that the new history set you up with a hazard. This is always a potential problem; all merges need some kind of testing and/or inspection because Git is just following a bunch of simple textual rules for combining change-sets.
2) What should I have done instead to preserve the different histories?
Nothing, really, except perhaps use the --no-ff flag to force the merge operation to make a merge commit. Fundamentally, the problem is that you're treating the configuration as if it's a file that corresponds to each snapshot and needs to be merged like any other file in any snapshot. Sometimes that's true, and sometimes it is not. As long as the file is in the tree like this, you must inspect the result of each merge.
3) I now think the tilde numbering on my above example should (have been different by one)
Correct.
4) Am I using a good method for maintaining different config files in both branches?
This part is difficult to say. But, if you use a merge=ours merge driver, note that it will fire only when the hash ID of the file in all three input commits differs. That is, the contents in the merge base must differ from the contents in the HEAD commit, and both must differ from the contents in the other commit. Of course, if the merge base is the same commit as either input commit, all the files in the merge base necessarily match all the files in at least one of the tip commits. If the two tips have the same file contents, you're OK anyway. So this really goes wrong when:
- the merge base isn't the same as the HEAD commit, but
- the file in the merge base is the same as the file in the HEAD commit, in which case
- Git can (and does) just take the file from the other commit without running your merge driver. If that's different from the file in the HEAD commit, you have picked up their changes!
(To see how this works, try running:
git rev-parse HEAD:Makefile
if you have a file named Makefile in your commit, for instance. Every file in every commit has a hash ID. The hash ID depends on the file's contents; if the file is the same in two different commits, those two different commits use the same blob hash in their trees, so that they share the single copy of the underlying file.)
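The failure case in the list above is easy to reproduce. In this sketch the file names app.conf and code.txt are invented, and I am assuming the usual setup of a driver named ours whose command is true (which exits successfully and leaves our version in place when it does get run).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git config merge.ours.driver true         # "keep ours" driver, if it ever runs

echo 'app.conf merge=ours' > .gitattributes
echo 'env=base' > app.conf
echo 'v1' > code.txt
git add .
git commit -q -m base                     # this commit becomes the merge base

git checkout -q -b dev
echo 'env=dev' > app.conf                 # dev changes the config
git commit -q -a -m 'dev config'

git checkout -q master
echo 'v2' > code.txt                      # master changes something else,
git commit -q -a -m 'master work'         # leaving app.conf as in the base

git rev-parse HEAD:app.conf HEAD~1:app.conf   # same blob hash in base and HEAD
git merge -q -m merge dev                 # a true merge, not a fast-forward
cat app.conf                              # dev's config, despite merge=ours
```

The blob for app.conf is identical in the merge base and in HEAD, so Git takes dev's version at the tree level and never consults the driver.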