This is quite long, so feel free to skip over sections you already know (or scroll all the way to the end). Each section sets up what is going on, or what we are doing, in the later ones.
Introduction-y bits
Let me start by re-drawing this graph (which I think is sort of a partial graph, but it contains the key commits we need) the way I prefer:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5 <-- branch-S
  \                        \  /         /
   T0-------------o----M2---M3--------R2 <---- branch-T1
     \              \  /
      F0--fc1---fc2---M1 <------------------- branch-F
Here, the branch names are branch-S, branch-T1, and branch-F, and these names currently identify commits whose hash IDs are something unpronounceable and impossible for humans to remember, but we are calling sc5, R2, and M1 respectively. Any o nodes are commits that are not especially distinguished in any way, and may actually represent some arbitrary number of commits. The named fc<number>s are some set of commits on the feature branch, with the M<number> commits being merges. I renamed the first commits S0, T0, and F0 just to tell them apart from the branch names.
Some merges are made manually:
$ git checkout <branch-name>
$ git merge [options] <other-branch>
... fix up conflicts if necessary, and git commit (or git merge --continue)
Other merges are made by software and happen only if there are no conflicts. The R commits are from running:
git checkout <branch>
git revert -m 1 <hash ID of some M commit>
where <branch> was either T1 or S, and -m 1 is because you always have to tell git revert which parent to use when reverting a merge, and it's almost always parent #1.
Making commits moves a branch name
The simplest Git commit graph is a straight line, with one branch name, typically master:
A--B--C <-- master (HEAD)
Here, we need to mention Git's index. The index is perhaps best described as the place where Git builds the next commit to make. It initially contains every file as saved in the current commit (here C): you check out this commit, populating the index and work-tree with the files from commit C. The name master points to this commit, and the name HEAD is attached to the name master.
You then modify files in the work-tree, use git add to copy them back into the index, use git add to copy new files into the index if needed, and run git commit. Making a new commit works by freezing these index copies into a snapshot. Git then adds the snapshot metadata (your name and email, your log message, and so on) along with the current commit's hash ID, so that the new commit points back to the existing commit. The result is:
A--B--C <-- master (HEAD)
       \
        D
with the new commit, with its new unique hash ID, just hanging out in midair, with nothing to remember it. So, the last step of making a new commit is to write the new commit's hash ID into the branch name:
A--B--C--D <-- master (HEAD)
and now the current commit is D, and the index and the current commit match. If you git add-ed all the files in the work-tree, that too matches the current commit and the index. If not, you can git add more files and commit again, making the name master point to new commit E, and so on. In any case, the (single) parent of the new commit is whatever the current commit was.
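The branch-name movement described above can be watched in a throwaway repository. This is only a sketch, assuming git is installed; the file name and the demo identity are made up for the example.

```shell
# Throwaway repository: watch the branch name move when a commit is made.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical identity, used only for these demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt                 # copy the file into the index
git commit -q -m C               # freeze the index into commit C
old_tip=$(git rev-parse HEAD)    # hash ID the branch name holds now

echo two >> file.txt
git add file.txt
git commit -q -m D               # new commit D; the branch name moves to it
new_tip=$(git rev-parse HEAD)
parent_of_new=$(git rev-parse HEAD^)

echo "old tip (C):  $old_tip"
echo "new tip (D):  $new_tip"
echo "parent of D:  $parent_of_new"
```

The last two lines printed should show that D's parent is the hash ID the branch name held before the commit.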
About merges
Let me outline how git merge actually works. It's very simple in some cases and some ways, and let's use the simplest true-merge case to start with. Consider a graph that looks like this:
          o--...--L <-- mainline (HEAD)
         /
...--o--*
         \
          o--...--R <-- feature
We have run git checkout mainline; git merge feature, so we are telling Git to merge branch feature / commit R into branch mainline / commit L. To do this, Git must first find the merge base commit. The merge base is, roughly speaking, the "nearest" commit common to (i.e., reachable from) both branches. In this simple case, we start at L and walk backwards to older commits, and start at R and walk backwards, and the first place we meet is commit *, so that's the merge base.
(For much more about reachability, see Think Like (a) Git.)
Having found the merge base, Git needs to turn both the L (left-side / local / --ours) and R (right-side / remote / --theirs) snapshots into change-sets. These change-sets tell Git what we did, on mainline, since the merge base *, and what they did, on feature, since the merge base. These three commits all have hash IDs, which are the real names of the three commits, so Git can internally run the equivalent of:
git diff --find-renames <hash-of-*> <hash-of-L> # what we changed
git diff --find-renames <hash-of-*> <hash-of-R> # what they changed
The merge simply combines the two sets of changes, and applies the combined set to the files in the snapshot in *.
When all goes well, Git makes the new commit in the usual way, except that the new commit has two parents. This makes the current branch name point to the new merge commit:
          o--...--L
         /         \
...--o--*           M <-- mainline (HEAD)
         \         /
          o--...--R <-- feature
The first parent of M is L, and the second is R. This is why reverts almost always use parent #1, and why git log --first-parent only "sees" the mainline branch, traversing from M up to L while ignoring the R branch entirely. (Note that the word branch here refers to the structure of the graph, rather than branch names like feature: at this point, we can delete the name feature entirely. See also What exactly do we mean by "branch"?)
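To see the merge base and the two parents for yourself, here is a small sketch that builds the mainline/feature picture above in a scratch repository. It assumes git is installed; the file names and demo identity are invented for the example.

```shell
# Build: base (*), then L on mainline and R on feature, then merge.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m 'merge base (*)'
base=$(git rev-parse HEAD)

git checkout -q -b feature
echo right > right.txt
git add right.txt
git commit -q -m R
R=$(git rev-parse HEAD)

git checkout -q mainline
echo left > left.txt
git add left.txt
git commit -q -m L
L=$(git rev-parse HEAD)

# The commit Git will use as the merge base for this merge.
found_base=$(git merge-base mainline feature)

git merge -q -m M feature                   # true merge: no conflicts here
first_parent=$(git rev-parse HEAD^1)        # should be L
second_parent=$(git rev-parse HEAD^2)       # should be R

git log --graph --oneline --all
```

Running git log --first-parent --oneline afterwards lists M, L, and the base, but not R.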
When things go wrong
A merge will stop, with a merge conflict, if the two change-sets overlap in a "bad way". In particular, suppose that the base-vs-L says to change line 75 of file F, and the base-vs-R also says to change line 75 of file F. If both change-sets say to make the same change, Git is OK with this: the combination of the two changes is to make the change once. But if they say to make different changes, Git declares a merge conflict. In this case, Git will stop after doing whatever it can on its own, and make you clean up the mess.
Since there are three inputs, Git will, at this point, leave all three versions of file F in the index. Normally the index has one copy of each file to be committed, but during this conflict resolution phase, it has up to three copies. (The "up to" part is because you can have other kinds of conflicts, which I won't go into here for space reasons.) Meanwhile, in the work-tree copy of file F, Git leaves its approximation to the merge, with either two, or all three, sets of lines in the work-tree file with <<<<<<< / >>>>>>> markers around them. (To get all three, set merge.conflictStyle to diff3. I prefer this mode for resolving conflicts.)
As you have seen, you can resolve these conflicts any way you like. Git assumes that whatever you do is the right way to resolve the problem: that this produces the exactly-correct final merged files, or lack of files in some cases.
Whatever you do, though, the final merge—assuming you don't abort it, and are not using one of the non-merge-y variants of merge—still makes the same result in the graph, and whatever you put in the index, by resolving the conflicts, is the result of the merge. That's the new snapshot in the merge commit.
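The three index copies are easy to inspect. The sketch below, assuming git is installed and using an invented file F.txt, forces a conflict on one line and then lists the unmerged index entries with git ls-files -u; stages 1, 2, and 3 are the base, ours, and theirs versions.

```shell
# Force a conflict on one line and look at the index during the stopped merge.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'line one\nline two\n' > F.txt
git add F.txt
git commit -q -m 'merge base (*)'

git checkout -q -b feature
printf 'line one\nline two, their version\n' > F.txt
git commit -q -am R

git checkout -q mainline
printf 'line one\nline two, our version\n' > F.txt
git commit -q -am L

# The merge stops with a conflict, so its non-zero exit status is expected.
git merge -m M feature >/dev/null 2>&1 || true

git ls-files -u                  # one line per stage: 1 = base, 2 = ours, 3 = theirs
stage_count=$(git ls-files -u | wc -l | tr -d ' ')
marker_count=$(grep -c '^<<<<<<<' F.txt)
cat F.txt                        # work-tree copy with conflict markers
```

At this point git merge --abort would put everything back the way it was before the merge started.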
More-complex merge bases
When the graph is very simple like the one above, the merge base is easy to see. But graphs don't stay simple, and yours isn't. The merge base for a graph that has some merges in it is trickier. Consider, e.g., just the following fragment:
...--sc4----M4---R1
        \  /
...--M2---M3--------R2
If R1 and R2 are two tip commits, what is their merge base? The answer is M3, not sc4. The reason is that while M3 and sc4 are both commits that are reachable by starting at both R1 and R2 and working backwards, M3 is "closer" to R2 (one step back). The distance from R1 to either M3 or sc4 is two hops (go to M4, then go back one more step), but the distance from R2 to M3 is one hop and the distance from R2 to sc4 is two hops. Put another way, sc4 is itself reachable from M3, so M3 is the later of the two common commits. So M3 is "lower" (in graph terms) and therefore wins the contest.
(Fortunately, your graph has no cases where there is a tie. If there is a tie, Git's default approach is to merge all the tied commits, two at a time, to produce a "virtual merge base", which is in fact an actual, albeit temporary, commit. It then uses this temporary commit made by merging the merge bases. This is the recursive strategy, which gets its name from the fact that Git recursively merges the merge bases to get a merge base. You can choose instead the resolve strategy, which simply picks one of the bases seemingly at random: whichever base pops out at the front of the algorithm. There's rarely any advantage to that: the recursive method usually either does the same thing, or is an improvement over randomly selecting a winner.)
The key takeaway here is that making a merge commit changes which commit future merges will choose as their merge base. This is important even when making simple merges, which is why I put it in boldface. It's why we make merge commits, as opposed to squash-"merge" operations that aren't merges. (But squash merges are still useful, as we will see in a bit.)
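The R1/R2 fragment can be rebuilt in a scratch repository and checked with git merge-base --all. This is a sketch under assumptions: git is installed, the file names are invented, and a plain commit stands in for M2 (its own history does not matter for the fragment).

```shell
# Rebuild the sc4/M2/M3/M4/R1/R2 fragment and ask Git for the merge base.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch-S
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo s0 > s.txt
git add s.txt
git commit -q -m S0

git checkout -q -b branch-T1
echo t > t.txt
git add t.txt
git commit -q -m 'M2 (stand-in)'

git checkout -q branch-S
echo sc4 >> s.txt
git commit -q -am sc4

git checkout -q branch-T1
git merge -q -m M3 branch-S                  # M3: parents M2 and sc4
M3=$(git rev-parse HEAD)

git checkout -q branch-S
git merge -q --no-ff -m M4 branch-T1         # M4: parents sc4 and M3
git revert -m 1 --no-edit HEAD >/dev/null    # R1

git checkout -q branch-T1
git revert -m 1 --no-edit HEAD >/dev/null    # R2

# Tips are now R1 (branch-S) and R2 (branch-T1).
bases=$(git merge-base --all branch-S branch-T1)
echo "merge base(s): $bases"
echo "M3:            $M3"
```

Only one hash ID should be printed as a merge base, and it should be the one recorded for M3.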
Introducing the problem: what went wrong (so you can avoid it in the future)
With the above out of the way, now we can look at the real problem. Let's start with this (edited slightly to use the updated commit and branch names):
I merged branch-T1 into branch-F (M1), then branch-F into branch-T1 (M2).
I assume here that merging fc2 (as the then-tip of branch-F) and o (as the then-tip of branch-T1) went well, and Git was able to make M1 on its own. As we saw earlier, merging is really based not on branches but on commits. It's the creation of a new commit that adjusts the branch names. So this created M1, so that branch-F pointed to M1. M1 itself pointed back to the existing tip of branch-T1 (a commit I've now marked o) as its second parent, with fc2 as its first parent. Git figures out the correct contents for this commit by git diff-ing the contents of T0, the merge base, against o and against fc2:
T0-------------o <-- branch-T1
  \
   F0--fc1---fc2 <--- branch-F (HEAD)
With all going well, Git now makes M1 on its own:
T0-------------o <-- branch-T1
  \              \
   F0--fc1---fc2---M1 <--- branch-F (HEAD)
Now you git checkout branch-T1 and git merge --no-ff branch-F (without --no-ff Git will just do a fast-forward, which is not what is in the picture), so Git finds the merge base of o and M1, which is o itself. This merge is easy: the difference from o to o is nothing, and nothing plus the difference from o to M1 equals the contents of M1. So M2, as a snapshot, is exactly the same as M1, and Git easily creates it:
T0-------------o----M2 <-- branch-T1 (HEAD)
  \              \  /
   F0--fc1---fc2---M1 <--- branch-F
So far, so good, but now things start to go really wrong:
There was one file in the T1 branch that was having merge conflicts with S... Given the issues I've had in the past with merge conflict resolutions not behaving how I expect, I thought I'd try something new: merging just the conflicting file from S into T1, solving the merge conflict there, removing all of the other files from the merge, and then allowing continuous integration to merge everything up to S.
So, what you did at this point is:
git checkout branch-T1
git merge branch-S
which stopped with a merge conflict. The graph at this point is the same as the one above, but with some more context:
S0--sc1---sc2---sc3-----sc4 <-- branch-S
  \
   T0-------------o----M2 <-- branch-T1 (HEAD)
     \              \  /
      F0--fc1---fc2---M1 <-- branch-F
The merge operation finds the merge base (S0), diffs that against the two tip commits (M2 and sc4), combines the resulting changes, and applies them to the contents of S0. The one conflicted file is now in the index as the three input copies, and in the work-tree as Git's effort at merging, but with conflict markers. Meanwhile all the unconflicted files are in the index, ready to be frozen.
Alas, you now remove some files (git rm) during the conflicted merge. This removes the files from the index and work-tree both. The resulting commit, M3, will say that the correct way to combine commits M2 and sc4 based on merge-base S0 is to remove those files. (This of course was the mistake.)
This auto-merged to S (M4).
Here, I assume this means that the system, using whatever pre-programmed rule it has, did the equivalent of:
git checkout branch-S
git merge --no-ff branch-T1
which found the merge base of commits sc4 (tip of branch-S) and M3, which is sc4 itself, the same way that the merge base of o and M1 was o earlier. So the new commit, M4, matches M3 in terms of content, at which point we have:
S0--sc1---sc2---sc3-----sc4----M4 <-- branch-S
  \                        \  /
   T0-------------o----M2---M3 <-- branch-T1
     \              \  /
      F0--fc1---fc2---M1 <-- branch-F
I noticed immediately that excluding those ~200 files looked to have wiped the changes out entirely, which equated to about a month's worth of work across 2 teams. I (incorrectly) decided the best course of action was to act swiftly and revert the merge commits M4 and M3 before my mistake got into anyone else's local repos. I first reverted M4 (R1) and once that was committed I reverted M3 (R2).
Actually, this was a fine thing to do! It gets the right content, which is pretty useful when you do it immediately. Using git checkout branch-S && git revert -m 1 branch-S (or git revert -m 1 <hash-of-M4>) to create R1 from M4 basically undoes the merge in terms of content, so that:
git diff <hash-of-sc4> <hash-of-R1>
should produce nothing at all. Likewise, using git checkout branch-T1 && git revert -m 1 branch-T1 (or the same with the hash) to create R2 from M3 undoes that merge in terms of content: comparing M2 and R2, you should see identical content.
Undoing a merge undoes the contents, but not the history
The problem now is that Git believes that all the changes in your feature branch are correctly incorporated. Any git checkout branch-T1 or git checkout branch-S followed by git merge <any commit within branch-F> will look at the graph, following the backwards-pointing links from commit to commit, and see that this commit within branch-F (such as fc2 or M1) is already merged.
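Both halves of this (content undone, history not undone) can be checked in a scratch repository. The sketch below assumes git is installed and uses a deliberately tiny graph with invented file names: one feature commit, one merge, one revert.

```shell
# Merge a feature, revert the merge, then try to merge the feature again.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch-T1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m T0
before_merge=$(git rev-parse HEAD)

git checkout -q -b branch-F
echo feature > feature.txt
git add feature.txt
git commit -q -m fc1

git checkout -q branch-T1
git merge -q --no-ff -m M branch-F           # the merge
git revert -m 1 --no-edit HEAD >/dev/null    # the revert, R
tip_before=$(git rev-parse HEAD)

# Content: R's snapshot matches the pre-merge snapshot.
if git diff --quiet "$before_merge" HEAD; then content_undone=yes; else content_undone=no; fi

# History: branch-F is still an ancestor of the current tip.
if git merge-base --is-ancestor branch-F HEAD; then still_merged=yes; else still_merged=no; fi

git merge branch-F >/dev/null                # nothing to do, as far as Git can tell
tip_after=$(git rev-parse HEAD)
if [ -e feature.txt ]; then feature_file=present; else feature_file=missing; fi

echo "content undone: $content_undone"
echo "still merged:   $still_merged"
echo "feature.txt:    $feature_file"
```

The second git merge leaves the tip where it was and feature.txt stays missing, which is the situation described above.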
The trick to getting them in is to make a new commit that does the same thing that the commit-sequence from F0 through M1 does, that's not already merged. The easiest (though ugliest) way to do that is to use git merge --squash. The harder, and perhaps better, way to do that is to use git rebase --force-rebase to make a new feature branch. (Note: this option has three spellings and the easiest one to type is -f, but the one in Linus Torvalds' description is --no-ff. I think the most memorable is the --force-rebase version, but I would actually use -f myself.)
Let's take a fast look at both, and then consider which to use and why. In either case, once you are done, you'll have to merge the new commit(s) correctly this time, without removing files; but now that you know what git merge is really doing, it should be a lot easier to do.
We start by creating a new branch name. We can re-use branch-F, but I think it is clearer if we don't. If we want to use git merge --squash, we create this new branch name pointing to commit T0 (ignoring the fact that there are commits after T0; remember, any branch name can point to any commit):
T0 <-- revised-F (HEAD)
  \
   F0--fc1--fc2--M1 <-- branch-F
If we want to use git rebase -f, we create this new name pointing to commit fc2:
T0-----....
  \
   F0--fc1--fc2--M1 <-- branch-F, revised-F (HEAD)
We do this with:
git checkout -b revised-F <hash of T0> # for merge --squash method
or:
git checkout -b revised-F branch-F^1 # for rebase -f method
depending on which method we want to use. (The ^1 or ~1 suffix, either of which works here, excludes M1 itself, stepping back one first-parent step to fc2. The idea here is to exclude commit o and any other commits reachable from o. This requires that there be no other merges into branch-F along that bottom row of commits.)
Now, if we want to use a "squash merge" (which uses Git's merge machinery without making a merge commit), we run:
git merge --squash branch-F
This uses our current commit, plus the tip of branch-F (commit M1), as the left and right sides of the merge, finding their common commit as the merge base. The common commit is of course just F0, so the merge result is the snapshot in M1. However, the new commit made has only one parent: it is not a merge commit at all, and it looks like this:
   fc1--fc2--M1 <-- branch-F
  /
F0-------------F3 <-- revised-F (HEAD)
The snapshot in F3 matches that in M1, but the commit itself is all new. It gets a new commit message (which you may edit) and its effect, when Git looks at F3 as a commit, is to make the same set of changes made from F0 to M1.
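Here is a sketch of the squash route in a scratch repository, assuming git is installed. The file names are invented, and revised-F starts at T0 as in the git checkout -b command given earlier; the checks at the end compare F3's snapshot with M1's and count F3's parents.

```shell
# Squash-"merge" branch-F into a new branch and inspect the result.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch-T1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m T0
T0=$(git rev-parse HEAD)

git checkout -q -b branch-F
echo f1 > feature.txt
git add feature.txt
git commit -q -m fc1
echo f2 >> feature.txt
git commit -q -am fc2

git checkout -q branch-T1
echo more > other.txt
git add other.txt
git commit -q -m o

git checkout -q branch-F
git merge -q -m M1 branch-T1                 # M1: parents fc2 and o

git checkout -q -b revised-F "$T0"
git merge --squash branch-F >/dev/null       # updates index and work-tree only
git commit -q -m 'F3: squashed copy of branch-F'

tree_F3=$(git rev-parse 'HEAD^{tree}')
tree_M1=$(git rev-parse 'branch-F^{tree}')
parents=$(git cat-file -p HEAD | grep -c '^parent ')

echo "F3 tree: $tree_F3"
echo "M1 tree: $tree_M1"
echo "F3 parent count: $parents"
```

The two tree hash IDs should match while the parent count is 1, i.e., same snapshot as M1 but an ordinary, non-merge commit.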
If we choose the rebase method, we now run:
git rebase -f <hash-of-T0>
(You could use instead the hash of o, which is branch-F^2, i.e., the second parent of M1. In this case you can start with revised-F pointing to M1 itself. That's probably what I would do, to avoid having to cut and paste a lot of hash IDs with potential typos, but it's not obvious how this works unless you've done a lot of graph manipulation exercises.)
That is, we want to copy commits F0 through fc2 inclusive to new commits, with new hash IDs. That's what this git rebase will do (see other StackOverflow answers and/or Linus' description above): we get:
   F0'-fc1'-fc2' <-- revised-F (HEAD)
  /
T0-----....
  \
   F0--fc1--fc2--M1 <-- branch-F
Now that we have revised-F pointing to either a single commit (F3) or a chain of commits (the chain ending at fc2', the copy of fc2), we can git checkout some other branch and git merge revised-F.
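The rebase route can be tried the same way. This is a sketch, assuming git is installed and using invented file names. One detail is specific to the demo: the original commits get a fixed, old committer date, because copies made within the same second by the same committer could otherwise end up with identical hash IDs.

```shell
# Copy F0..fc2 to new commits with git rebase -f and compare old and new.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch-T1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_COMMITTER_DATE='2005-04-07T22:13:13'   # demo-only: date the originals

echo base > base.txt
git add base.txt
git commit -q -m T0
T0=$(git rev-parse HEAD)

git checkout -q -b branch-F
for name in F0 fc1 fc2; do
    echo "$name" >> feature.txt
    git add feature.txt
    git commit -q -m "$name"
done

git checkout -q branch-T1
echo more > other.txt
git add other.txt
git commit -q -m o

git checkout -q branch-F
git merge -q -m M1 branch-T1                 # M1: parents fc2 and o

git checkout -q -b revised-F 'branch-F^1'    # start at fc2, excluding M1
unset GIT_COMMITTER_DATE                     # the copies get the current time
git rebase -q -f "$T0"                       # replay F0, fc1, fc2 as new commits

old_tip=$(git rev-parse 'branch-F^1')
new_tip=$(git rev-parse revised-F)
old_tree=$(git rev-parse 'branch-F^1^{tree}')
new_tree=$(git rev-parse 'revised-F^{tree}')
copies=$(git rev-list --count "$T0"..revised-F)

echo "fc2:  $old_tip"
echo "fc2': $new_tip"
echo "copied commits: $copies"
```

The copies have new hash IDs but the same snapshots, so a later git merge revised-F sees them as commits that have not been merged yet.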
Based on comments, here are two paths for doing the re-merge
I assume at this point that you have a squash-merge result (a single-parent commit that's not a merge, but does contain the desired snapshot, which I'm calling F3 here). We need to revise the re-drawn graph a bit too, based on comments that indicate there were more merges into branch-F:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5 <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2 <---- branch-T1
     \      \       \  /
      F0--fc1-o-fc2---M1 <--------------- branch-F
Now we'll add the revised-F branch, which should have a single commit that is a descendant of either F0 or T0. It's not crucial which one. Since I used F0 earlier, let's go with that here:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5 <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2 <---- branch-T1
     \      \       \  /
      F0--fc1-o-fc2---M1 <--------------- branch-F
        \
         ---------------------------------F3 <-- revised-F
The contents of commit F3 match those of M1 (so git diff branch-F revised-F says nothing), but the parent of F3 here is F0. (Note: there are shortcut ways to create F3 using git commit-tree, but as long as it already exists and matches M1 content-wise, we can just use it.)
If we now do:
git checkout branch-T1
git merge revised-F
Git will find the merge base between commit R2 (tip of branch-T1) and F3 (tip of revised-F). If we follow all the backwards (leftwards) links from R2, we can get to T0 via M3 then M2 then some number of os and finally T0, or we can get to F0 via M3 then M2 then M1 then fc2 on back to F0. Meanwhile we can get from F3 straight to F0, in just one hop, so the merge base is probably F0.
(To confirm this, use git merge-base:
git merge-base --all branch-T1 revised-F
This will print one or more hash IDs, one for each merge base. Ideally there's just the one merge base, which is commit F0.)
Git will now run the two git diffs, to compare the contents of F0 to F3 (i.e., everything we did to accomplish the feature) and to compare the contents of F0 to those of R2, at the tip of branch-T1. We'll get conflicts where both diffs change the same lines of the same files. Elsewhere, Git will take the contents of F0, apply the combined changes, and leave the result ready to be committed (in the index).
Resolving these conflicts and committing will give you a new commit that results in:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5 <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2-----M6 <---- branch-T1
     \      \       \  /                    /
      F0--fc1-o-fc2---M1 <-- branch-F      /
        \                                 /
         -------------------------------F3 <-- revised-F
Now M6 is, perhaps, merge-able to branch-S.
Alternatively, we can merge directly to branch-S. It's less obvious which commit is the merge base, but it is probably F0 again. Here is the same drawing again:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5 <-- branch-S
  \                        \  /         /
   T0-----o-------o----M2---M3--------R2 <---- branch-T1
     \      \       \  /
      F0--fc1-o-fc2---M1 <--------------- branch-F
        \
         ---------------------------------F3 <-- revised-F
Starting from commit sc5, we work backwards to M5 to R2, and we're now in the same situation we were before. So we can git checkout branch-S and do the same merge, resolve similar conflicts (this time we're comparing F0 to sc5 rather than to R2, so the conflicts might be slightly different) and eventually commit:
S0--sc1---sc2---sc3-----sc4----M4---R1---M5---sc5----M6 <-- branch-S
  \                        \  /         /           /
   T0-----o-------o----M2---M3--------R2 <------   / -- branch-T1
     \      \       \  /                          /
      F0--fc1-o-fc2---M1 <-- branch-F            /
        \                                       /
         -------------------------------------F3 <-- revised-F
To verify that F0 is the merge base, use git merge-base as before:
git merge-base --all branch-S revised-F
and to see what you'd have to merge, run two git diffs from the merge base to the two tips.
(Which merge to do is up to you.)