
I need to merge 2 branches, both containing a README.md file, into a 3rd branch. The first branch merges without a problem, of course. When I try to merge the 2nd branch, I get a content conflict, because the 2nd README would delete the email from the 1st one. We were given a website that was supposed to help fix this with a mergetool, but when we try to change something via the mergetool, the 2 lines seem to be "stuck" together: if I change a line with local it changes both, and the same happens with remote. I'm pretty new to git (a few hours at most) and I have no idea how to find a solution, so any help would be great^^

Trying to merge 2 README files, content conflict. Mergetool didn't work like I thought it would.

Michael Delgado
  • Does this answer your question? [How do I resolve merge conflicts in a Git repository?](https://stackoverflow.com/questions/161813/how-do-i-resolve-merge-conflicts-in-a-git-repository) – Michael Delgado Oct 27 '22 at 00:34

2 Answers


Note: this answer got too long so I've split it into two parts.

Part 1 (part 2 here)

You have encountered a basic stumbling block for those who try to use Git without a tutorial, or with a bad tutorial (of which there are, unfortunately, many). Your mistake here is thinking that "merge", in Git, involves two versions of some file. This is not the case! Merging in Git is on a commit basis and involves three versions of each file.

Before we can get into merging, we have to start with commits. Without the proper base,¹ merging won't make any sense. So let's jump into that first. Note that you may want to read this twice before you tackle your merge.


¹ There's a pun here that will fly right over your head if you're not already familiar with the ideas behind merging.


Commit = snapshot + metadata

Git is, at its heart, all about commits. Git is not really about files or branches. It's true that each commit stores files, and we (or Git at least) organize commits into branches, using branch names to help us find individual commits. But it's the commit itself—or the collection of commits—that's the heart of the repository.

Git stores these commits in an object database. We won't go into most of the details here, though we will skim a few necessary parts, but if you are curious, you can peek inside the hidden .git folder, where you will find a sub-folder named objects. Inside this are potentially many more folders, which store the objects in various forms ("loose" and "packed" but you don't need to care about this here). Each object, including each commit, is numbered, with a unique ID that is expressed in hexadecimal, such as d420dda0576340909c3faff364cfbd1485f70376. (This particular one is a commit in the Git repository for Git.)

This hash ID is the "true name" of the internal object, and Git literally requires it to find the object in the objects database. These names are not friendly to humans, though, so Git provides a separate secondary database—one that's not nearly as well organized, nor as well implemented, really—that stores names: branch names, tag names, and all other kinds of names. Each of these names simply stores one hash ID, which is in fact all that's necessary.

So, when you use a branch name like main or feature or whatever, you're really providing Git with a raw hash ID, that's been hidden behind this name. It's worth running git rev-parse a few times to get a feel for this mechanism:

git rev-parse main

might produce:

d420dda0576340909c3faff364cfbd1485f70376

for instance, if you have a clone of the Git repository for Git (though by now its main/master has moved on; I haven't updated my clone in more than a week now for home-life reasons).

When you clone a Git repository, it's the underlying objects that you are copying. The names in your names database are yours, not the original repository's, but the objects, which are all strictly read-only once they're created, get shared. You get a copy but you (and your Git software) are forbidden from changing any of these. Not even Git can change a Git commit. You just add new commits to the repository. That's how history exists: the existing commits continue to exist. They're just now in two repositories: the original, and your clone. Make more clones, and you make yet more copies of the objects.

With that said, let's look at the anatomy of one particular commit, in this case d420dda0576340909c3faff364cfbd1485f70376:

$ git cat-file -p d420dda0576340909c3faff364cfbd1485f70376 | sed 's/@/ /'
tree 13b45e4ccc34572dce66dc79468b66c0b383a560
parent c68bd3ec22a1afc85b0b897834b2524aedbd0553
author Junio C Hamano <gitster pobox.com> 1665507772 -0700
committer Junio C Hamano <gitster pobox.com> 1665509772 -0700

The second batch

Signed-off-by: Junio C Hamano <gitster pobox.com>

That, right there, is the entire commit object, as seen in every Git clone of the Git repository for Git. You can see that commits are pretty small! But each one has a tree object: that first line, tree followed by an internal object ID, is required in every commit.

The tree object in the commit represents a permanent archive of every file. This is another object (and you can git cat-file -p it, if you like, to see how it works in great detail), but what we see in the output above is the metadata for the commit:

  • the commit has a parent, with a raw hash ID: this is another commit object;
  • the commit has an author and a committer: these are text strings giving the name and email address of the person who made the commit, along with some date-and-time stamps; and
  • the commit has a log message, which is what you see when you run git log.

The git log command uses the stored hash ID here to find the previous commit. Most commits have exactly one parent line, but a few have more than one parent, making them merge commits, and at least one commit in any non-empty repository has no parent because it was the first commit.

We say that the commit points to its parent or parents, and if we use single uppercase letters to stand in for real hash IDs (which are big, ugly, and random-looking²), we get drawings that look like this:

... <-F <-G <-H   <-- main

Here, the name main, a branch name, points to (contains the hash ID of) commit H. Commit H itself contains, indirectly, a full snapshot of every file—that's the tree <hash> line—and directly contains metadata, including a parent line. So commit H points to earlier commit G.

Commit G, being a commit, contains a snapshot and metadata, so G points to earlier commit F. Commit F is a commit, so it points to a still-earlier commit, which points back to an even-earlier commit, and so on, backwards, down the line to the very first commit ever (presumably commit A in our drawing).
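To make the backwards walk concrete, here is a minimal sketch in Python. It is a toy model, not how Git stores anything: the dictionaries `parents` and `branch_names` and the function `walk_back` are hypothetical names, the letters stand in for hash IDs, and commit F is treated as the root because the drawing elides everything before it.

```python
# Toy model: each commit ID maps to the list of its parent IDs.
# F is treated as parentless only because the drawing elides earlier commits.
parents = {"F": [], "G": ["F"], "H": ["G"]}

# A branch name stores exactly one commit ID.
branch_names = {"main": "H"}

def walk_back(commit_id, parents):
    """Follow parent links from commit_id back to a parentless commit.

    Assumes a linear chain (one parent per commit), as in the drawing.
    """
    history = []
    while commit_id is not None:
        history.append(commit_id)
        commit_parents = parents[commit_id]
        commit_id = commit_parents[0] if commit_parents else None
    return history

print(walk_back(branch_names["main"], parents))  # ['H', 'G', 'F']
```

The only thing the name contributes is the starting point; everything else comes from the commits themselves.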


² They're actually not random at all: concretely, the hash ID d420dda0576340909c3faff364cfbd1485f70376 is simply the SHA-1 checksum of the content of the above commit, except that this content is prefixed by the string "commit 284" and an ASCII NUL byte, with 284 being the size, in decimal, of the rest of the object. The fact that the previous hash IDs on the parent lines and the date-and-time-stamps are themselves unique means that the new hash ID is unique.³

³ Anyone familiar with the pigeonhole principle should immediately object here. That objection is correct and means that Git will eventually fail. You can calculate the probability of failure with a fancy formula, and it turns out to be vanishingly small until the objects database holds more than about 1.7×10¹⁵ objects, at which point it starts to creep up towards the probability of undetected disk-drive errors. We live with those; we can live with SHA-1 collisions. Even so, Git is slowly moving towards SHA-256.
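The hashing scheme from footnote 2 can be sketched in a few lines of Python. This is an illustration under assumptions: `git_object_id` is a hypothetical helper name, and the body used below is made-up text, not the real commit (whose exact bytes are not reproduced in this answer).

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <size>' + NUL + body, as footnote 2 describes."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Made-up stand-in for a commit's text; any change to it changes the ID.
body = b"example commit text\n"
print(git_object_id("commit", body))
print(git_object_id("commit", body + b"one more line\n"))
```

Because the parent's hash ID is part of the body, changing any earlier commit would change every later ID too.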


A special feature of a branch name

We're going to skip a lot here and just look at one special feature of branch names. We already know that the name points to a commit. We can also draw this:

...--G--H   <-- main, feature

where we have a single commit, H, that is pointed-to by more than one branch name. When this is the case, all the commits up to and including H are on both branches. Checking out either branch, with git switch main or git switch feature, gets us the files from commit H. But, as a special feature, checking out that name "attaches" the special name HEAD to that name:

...--G--H   <-- main (HEAD), feature

Here, we're on commit H and branch main. The files we have available to us are those from commit H. If we now run:

git switch feature

the picture changes slightly:

...--G--H   <-- main, feature (HEAD)

We're still on commit H, but we're "on" it through the name feature. We have the same files, but something wacky is about to happen.

Let's make a new commit now, in the usual way Git has for making commits (we modify some files and git add and git commit). We get a new commit, which gets a new, unique ID; we'll just call this I and draw it in:

          I
         /
...--G--H

What happens to the branch names? The answer is: nothing happens to any of the ones that we are not "on", but the one that we are "on", that name gets forcibly updated so that it points to the new commit we just made:

          I   <-- feature (HEAD)
         /
...--G--H   <-- main

If we make another new commit, we get:

          I--J   <-- feature (HEAD)
         /
...--G--H   <-- main

Commits I-J are clearly "on" the feature branch. Commits up through H are clearly "on" the main branch. Surprisingly—or not, depending on your point of view and whether you've used other version control systems—Git declares that commits up through H are on both branches at this time.
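Here is a small Python sketch of that name-moving behaviour. It is a toy model with hypothetical names (`Repo`, `switch`, `commit`, `on_branch`); real Git does far more, but the bookkeeping for the attached branch name is the part that matters here.

```python
class Repo:
    """Toy repository: commits, branch names, and an attached HEAD."""

    def __init__(self):
        self.parents = {"G": [], "H": ["G"]}
        self.branches = {"main": "H", "feature": "H"}
        self.head = "main"  # HEAD is attached to a branch *name*

    def switch(self, name):
        self.head = name

    def commit(self, new_id):
        # The new commit points back to the current tip...
        self.parents[new_id] = [self.branches[self.head]]
        # ...and only the name HEAD is attached to moves forward.
        self.branches[self.head] = new_id

    def on_branch(self, commit_id, name):
        """True if commit_id is reachable by walking back from the tip."""
        stack, seen = [self.branches[name]], set()
        while stack:
            current = stack.pop()
            if current == commit_id:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return False

repo = Repo()
repo.switch("feature")
repo.commit("I")
repo.commit("J")
print(repo.branches)                    # {'main': 'H', 'feature': 'J'}
print(repo.on_branch("H", "main"))      # True
print(repo.on_branch("H", "feature"))   # True
print(repo.on_branch("J", "main"))      # False
```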

Forming more branches, and a side note on the word branch

Let's now switch back to main, with git switch main:

          I--J   <-- feature
         /
...--G--H   <-- main (HEAD)

Git will rip away our commit-J files and put back the commit-H files. The "true" files are safely saved away in the tree objects for each commit. The files we see and work with are in fact not in the repository at all: they're just copied out of the repository.

We can now create and switch to another name, perhaps dev or re-feature. Or if we like, we can commit on main. It doesn't really matter to Git, as the branch name is merely a sort of label, pointing to the commit. Later, if we decide we want to have main stay with commit H, we can make a new name that points to the new commits we're about to make, and then force the name main back to commit H.

This is a key weirdness in Git: branch names aren't really very important at all, except in that they help us by finding the last commit. Whatever commit the name points to is the last commit in the branch. If we move the name, we've changed which commit is the last commit in the branch. This is also why we say that some commit is "on" a branch if we can get there by starting at the last commit (found by the name) and working backwards. In effect, commits are "contained in" their branches more than they are "on" any one (single) branch.

Rather than mucking about with moving main forward and backwards, let's do:

git switch -c feat2

now, to get:

          I--J   <-- feature
         /
...--G--H   <-- main, feat2 (HEAD)

and make two more commits:

          I--J   <-- feature
         /
...--G--H   <-- main
         \
          K--L   <-- feat2 (HEAD)

Now, just for the heck of it, let's delete the name main. (It's kind of in the way of what we're about to do.) This gives us:

          I--J   <-- feature
         /
...--G--H
         \
          K--L   <-- feat2 (HEAD)

Commit H, which was on three branches, is now only on two branches. The branch main has ceased to exist. Or has it? If we create a new name, old-main, pointing to H:

          I--J   <-- feature
         /
...--G--H   <-- old-main
         \
          K--L   <-- feat2 (HEAD)

commit H is now on three branches again.

The number of branches that some commit is "on" is not important. The word branch in Git is rather badly overloaded (see What exactly do we mean by "branch"?) and when you see it without context, you should be careful: it may not mean anything, and the person who used it might not be aware of what they're saying. Or it may mean any of the various things that it can mean, such as "branch name", "remote-tracking name", "tip commit", and "set of commits ending at a particular commit".

Onward: git merge

With all that out of the way, let's change our two feature names to br1 and br2 so that they're easier to type in, and switch to br1 and run git merge br2 now:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

We seem to have asked Git to merge "branch br2" into "branch br1". And that's sort of true. But in fact, what we're really asking Git to merge are commits, namely J and L. As always, each of these two commits represents a snapshot-plus-metadata.

Because we are "on" br1, the files we have available to us, before we run git merge br2, are those from commit J. The files we're asking Git to merge exist in br2, i.e., in the tip commit of br2, i.e., in commit L. But in order to perform the merge, Git cannot simply compare the files in J to the files in L. You might wonder why not, and we'll provide a simple example, but before we get there, let's consider the goal of a merge in the first place.

The goal of a standard git merge operation is to combine work. If we are going to combine work, we first have to define work. What is the work in a commit?

The work in an individual commit

Let's take a really concrete example: commit d420dda0576340909c3faff364cfbd1485f70376. If you click on this link, or clone the Git repository for Git and run git show d420dda0576340909c3faff364cfbd1485f70376, you get a bunch of text shown that includes this:

diff --git a/Documentation/RelNotes/2.39.0.txt b/Documentation/RelNotes/2.39.0.txt
index a26c82444b..a6ee7c8996 100644
--- a/Documentation/RelNotes/2.39.0.txt
+++ b/Documentation/RelNotes/2.39.0.txt
@@ -66,9 +66,25 @@ Fixes since v2.38
    led to a segfault (which is bad), which has been corrected.
    (merge 92481d1b26 js/merge-ort-in-read-only-repo later to maint).
 
+ * Force C locale while running tests around httpd to make sure we can
+   find expected error messages in the log.
+   (merge 7a2d8ea47e rs/test-httpd-in-C-locale later to maint).
[snip]

This is obviously some git diff output. It shows us a change to one file, Documentation/RelNotes/2.39.0.txt. But if a commit is a snapshot (plus metadata)—and it is—how can there be a change to a file? The snapshot is just an archive, like a zip file or tarball or whatever. The answer is that this particular commit has a (single) parent commit, whose hash ID we see above. If we have Git extract both of these commits and compare them, we'll find that this file, Documentation/RelNotes/2.39.0.txt, is different in the two commits—and the git diff output from comparing these two commits is just what we see with git show, or on the GitHub page.

Work done over multiple commits

So the difference between some commit and its parent represents the work done in that particular commit, and that's the definition we will start with. Let's go back to our simple stylized graph:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

and look at the work done in, say, commit I. We'll find this "work" by comparing the snapshot in H to the snapshot in I. Maybe we added one new file, new.txt, and did nothing else. We can look at the work done in commit J too: maybe we edited old.txt and README.md. The changes to those two files are the work done in J.

Now, what about commits K and L? Maybe in commit K we modified foo.py, and in commit L we made another change to foo.py and also modified README.md.

If we compare commit J with commit L, then, we'll see the following:

  • delete new.txt (it's in J, where we added it because of I, but it is not in L);
  • undo the change we made to old.txt (we did that in J);
  • modify foo.py (we made two changes to that in K and L), and modify README.md to add whatever we did in L, and to undo whatever we did in J.

This is clearly no good! We don't want to remove new.txt at all. Maybe we can compare in the other direction, so that we add new.txt. But we already have a new.txt, and this comparison will tell us to undo whatever changes we made in foo.py. That, too, is no good.
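A quick Python sketch shows what the direct tip-to-tip comparison produces. The snapshots are modeled as plain dictionaries from path to content; the file contents and the helper name `diff_snapshots` are invented for illustration.

```python
def diff_snapshots(old, new):
    """List what it takes to turn snapshot old into snapshot new."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(("delete", path))
        elif path not in old:
            changes.append(("add", path))
        elif old[path] != new[path]:
            changes.append(("modify", path))
    return changes

# Hypothetical contents for the tip commits J (ours) and L (theirs).
J = {"new.txt": "new", "old.txt": "old v2",
     "foo.py": "foo v1", "README.md": "readme, our edit"}
L = {"old.txt": "old v1",
     "foo.py": "foo v3", "README.md": "readme, their edit"}

print(diff_snapshots(J, L))
# [('modify', 'README.md'), ('modify', 'foo.py'),
#  ('delete', 'new.txt'), ('modify', 'old.txt')]
print(diff_snapshots(L, J))
# [('modify', 'README.md'), ('modify', 'foo.py'),
#  ('add', 'new.txt'), ('modify', 'old.txt')]
```

Neither direction says who made which change, which is exactly the information a merge needs.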

No: What we need is to identify the work done on br1 first, separately. We can do that pretty easily by comparing the snapshot that's in H to the one that's in J. That will show us:

  • the new file, new.txt, added; and
  • the changes we made to old.txt and README.md.

That's the "work done in br1", or more precisely, the work done in commits I-J.

Once we have that, we can try to identify the work done on br2. We can do that pretty easily too, by comparing the snapshot that's in H to the one that's in L. That will show us:

  • the two changes made to foo.py; and
  • the change we made to README.md.

That's exactly what we want! But hang on a moment:

  • Why did we go back to commit H? Why not back to commit G?
  • How did we pick commit H in the first place?
  • How do we combine these changes?

The answer to the first two questions is git merge-base.

The merge base

If we just look at the picture we drew:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

it's stunningly obvious why we picked commit H. Commit H is on both branches. So is commit G, of course, and so are all the commits before G, but commit H is the last of these shared commits. Going further back in time, to an even-earlier shared commit, nets more overall changes, where "both sides" will make the same changes, and there's no profit in doing that. So commit H, the last shared commit, is also the best shared commit.

There are cases where the best shared commit is not obvious at all, and there are ways to draw the graph that make it less obvious that H is the best shared commit. Git has, built into it, an implementation of the Lowest Common Ancestor algorithm (as extended to DAGs like Git's commit graph), so that git merge-base can find the shared commit, and git merge uses that by default.⁴ We thus usually don't have to think about this: we just run git merge br2 and Git finds the best shared commit, in this case H, and does its thing.

Not having to think and worry about it, though, does not mean we can ignore it. We must realize and remember that when we run git merge, Git is going to find a merge base commit.⁵ This merge base supplies the third version of each file.
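For the curious, the idea of "last shared commit" can be sketched naively in Python. This is not Git's actual algorithm, just a brute-force illustration on the toy graph from the drawing; `ancestors` and `merge_bases` are hypothetical names.

```python
def ancestors(commit, parents):
    """Return the commit itself plus everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_bases(a, b, parents):
    """Shared commits that are not an ancestor of another shared commit."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    )

parents = {"G": [], "H": ["G"],
           "I": ["H"], "J": ["I"],
           "K": ["H"], "L": ["K"]}

print(merge_bases("J", "L", parents))  # ['H']
```

G is shared too, but it is an ancestor of H, so H wins. If this function ever returned more than one commit, that would be the multiple-merge-bases situation.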


⁴ I've mentioned this "by default" a few times, and that is because git merge allows you to specify a merge strategy. There's a merge strategy, -s ours, that means ignore everything they did. This strategy doesn't bother finding a merge base at all. There are some other fancier strategies that do complicated stuff, but we won't cover those here.

Git's "merge strategy" -s argument should not be confused with Git's strategy-option, or -X, arguments to git merge. I like to call these eXtended options to keep them apart in my head. The -X options are passed to the strategy, which then does whatever it does with them. Since we almost always use the default strategy in the first place, most people seem to think of the -X extended options as options to git merge, but they're actually specific to the strategy. (This is all very confusing, and perhaps was a bad idea, rather like Douglas Adams' famous quote: "In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.")

⁵ There are particularly nasty cases where there is more than one "best" merge base commit, and for these cases, Git defaults to merging the merge bases to come up with a "virtual merge base". With any luck, you will never experience the wonders, er, horrors, er, terror ... this particular case yourself. Seriously, it's not usually terrible, but occasionally, it is really awful and ugly. You can check to see if this will happen or has happened using git merge-base --all: if that spits out more than one hash ID, you've hit the multiple-merge-bases case, and it's time to find a helpful StackOverflow article.


How Git handles three versions of each file

Now that we know that git merge br2 really uses three versions of each file:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

with the merge base version of each file coming from commit H, we can see how Git uses these.

Let's start with the easy cases, with our hypothetical example. Here, we modified foo.py in br2—once in each new commit—but we didn't touch foo.py at all in our br1 commits. So the diff from H to J shows nothing for this file, while the diff from H to L shows the two changes.

To combine nothing with something, Git takes the something. That was easy! Git then applies the "something" to the copy from commit H, which is also the copy from commit J, which is also the copy we have sitting in our working tree. So the result is that the changes we made in K and L show up in our working tree.

Let's take another easy case: the file NOTES.md matches in all three commits. To combine nothing at all on the left (in our changes on I-J) with nothing at all on the right (in their changes on K-L), Git does nothing: it takes any of the three copies of NOTES.md, from any of the commits, such as the one we already have in our working tree, and just leaves it alone.

Let's take a third easy case: the file new.txt does not exist in the merge base commit H, and does not exist in commit L, but is there in commit J and in our working tree. Git combines the "create" with the "do nothing" to create the file, i.e., leave it in our working tree (and in Git's index, about which we'll say more in a moment).

Had we created a new file in "their" commits (K-L), or deleted a file on either side, or whatever, Git would take that change—create new file, or delete file—and copy that across. Any time one side does something and the other side doesn't do something, we take the "something".

This leaves us with the hard case, or at least, the potentially hard case: we did something, and they did something, to the same file.

Combining changes to one file, and how Git expands its index

There's an important thing we have not mentioned at all here yet, and it's a big topic that I won't really cover properly, which is this: Git does not make commits from what's in your working tree. Git makes commits from what is in Git's index. This thing, this "index", is crucial in Git because Git uses it to make new commits. It's so important, and perhaps so poorly named (what the heck does index mean anyway?), that it actually has three names:

  • when called "the index", as I do here, we can refer to everything it does;
  • when called "the cache", as Git mostly only does in flags now (git rm --cached), it refers to how Git uses it to speed stuff up; and
  • when called "the staging area", which is perhaps the best name, it describes how you use it: to "stage" the next commit.

What's in the index or staging area is, to put it briefly and gloss over some details, a sort of a copy of each file that is going to be committed if you run git commit right now. That is, when you first switch to some commit, Git not only extracts that commit's files to your work area, so that you can see them and work on them; it also extracts the same files to Git's index aka staging area, so that they're all staged to go into the next commit in exactly the same form they have in this commit.

The existence of this index / staging-area is why you have to run git add every time you change a file. The git add command tells Git:

  • open and read the working-tree copy of the file;
  • compress the data down to the internal form for a loose object;
  • check to see if we already have the file data (i.e., is this a duplicate?);
  • make either the original object (if duplicate), or this now-prepared object, ready for committing.

So we will re-use the original if this is a duplicate. Otherwise, the next commit will use this data, which has never been committed before. Either way, at the end of git add, the working tree version of the file is now in Git's index, ready to be committed.

The de-duplication that happens in this step is a big part of how Git keeps the commits from bloating up the repository, even though every commit stores every file every time. Most commits are mostly duplicates, and these duplicate copies take no space because they're de-duplicated. It's during git add, not git commit, when Git actually does the duplication checking and de-duplicating. So if you don't force Git to re-add every file every time,⁶ git add and git commit go very fast.
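The de-duplication idea can be sketched as a content-addressed store in Python. This is a simplified model with hypothetical names (`ObjectStore`, `add_blob`, `index`); for brevity it hashes first and compresses only new data, and it ignores everything else the real index tracks.

```python
import hashlib
import zlib

class ObjectStore:
    """Toy content-addressed store: identical data is kept only once."""

    def __init__(self):
        self.objects = {}  # object ID -> compressed bytes

    def add_blob(self, data: bytes) -> str:
        header = f"blob {len(data)}".encode("ascii") + b"\0"
        object_id = hashlib.sha1(header + data).hexdigest()
        if object_id not in self.objects:      # is this a duplicate?
            self.objects[object_id] = zlib.compress(header + data)
        return object_id

store = ObjectStore()
index = {}  # path -> object ID that would go into the next commit

index["README.md"] = store.add_blob(b"hello\n")
index["copy-of-readme.md"] = store.add_blob(b"hello\n")
index["foo.py"] = store.add_blob(b"print('hi')\n")

print(len(index))          # 3 paths staged
print(len(store.objects))  # 2 objects stored
```

Two paths with the same content share one stored object, which is why unchanged files cost nothing in a new commit.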

Now, this is the normal condition of the index, when we're not in the middle of a merge. But when you run git merge, Git expands the index, creating three extra "slots" for each file:

  • slot zero, if it's used, is the normal ready-to-commit copy: if this index entry is occupied, the other three slots are erased and the file is not conflicted;
  • slots 1, 2, and 3 hold the merge base, "ours", and "theirs" copy of the file: if any of these slots are occupied, slot zero is erased and the file is conflicted.

Hence the way git merge works (at least from a high level viewpoint⁷) is this:

  • the "ours" copy of each file moves from slot zero to slot 2;
  • the merge base copy of each file goes into slot 1; and
  • the "theirs" copy of each file goes into slot 3.

At this point, we have all three slots filled for any file that appears in all three commits. Now we just take care of each possible case:

  • All three slots hold the same copy of the file: nobody touched it at all, just use any copy, collapse it all down to slot zero.

  • Slots 1 and 2 match and slot 3 is different, or slots 1 and 3 match and slot 2 is different: we or they touched the file, and they or we didn't, so take the modified file, whichever slot that's in, and move that to slot zero and erase the other two.

  • Slot 1 is empty, slot 2 is occupied, slot 3 is empty; or slot 1 is empty, slot 2 is empty, and slot 3 is occupied: we or they added the file. Put the non-empty entry into slot 0 and erase all the others.

  • Slot 1 is not empty, and matches what's in 2 or 3, but the other of 2 or 3 is empty: one of us removed the file and the other of us didn't, so remove the file, by removing all the entries (no slots left at all).

Some of the remaining cases are messy and I leave it as an exercise to work them out (consider, e.g., "slot 1 empty, slots 2 and 3 both have files in them", which may be an add/add conflict). The usual hard case is the one where slots 1, 2, and 3 are all occupied with different copies of the file: that's your standard "merge conflict".
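The per-file cases above can be written as one small decision function. This Python sketch is a simplification under assumptions: each slot is either the file's content or `None` for an empty slot, `resolve` is a hypothetical name, and rename detection is ignored entirely.

```python
def resolve(base, ours, theirs):
    """Decide what to do with one file given slots 1, 2, and 3.

    Returns ("keep", content), ("delete", None), or ("conflict", None).
    """
    def result(content):
        return ("keep", content) if content is not None else ("delete", None)

    if ours == theirs:      # nobody touched it, or both did the same thing
        return result(ours)
    if base == ours:        # only they changed (or removed) the file
        return result(theirs)
    if base == theirs:      # only we changed (or added, or removed) the file
        return result(ours)
    return ("conflict", None)   # both sides did something different

print(resolve("v1", "v1", "v1"))   # ('keep', 'v1')
print(resolve("v1", "v1", "v2"))   # ('keep', 'v2')
print(resolve(None, "new", None))  # ('keep', 'new')
print(resolve("v1", "v1", None))   # ('delete', None)
print(resolve(None, "a", "b"))     # ('conflict', None)
print(resolve("v1", "v2", "v3"))   # ('conflict', None)
```

The last call is the standard merge conflict; the one before it is the add/add case.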


⁶ Note that running git add . makes Git check for changes via various magic OS-dependent file-system tricks, which is normally much faster than re-compressing every file. This is where that "cache" aspect of the index comes in. You have to defeat this cache trick to really see the timing difference, and these details are beyond the scope of this answer, though I'll note that the --renormalize option is the mostly-portable way to mostly do most of this.

⁷ The git merge code takes care not to bother expanding the index for unconflicted files, as most files are mostly unconflicted and this makes everything go a lot faster. But that complicates the code a lot; the simplified view where we do the expand-then-combine-then-shrink is a whole lot easier to think about, and gives the same result, just a bit slower.

This also skips over the whole "renamed file" identification process, which is kind of tricky.

on to part 2

torek

Part 2 (part 1 here)

Auto-merging

If we have a standard, boring conflict—that is, if we have all three slots occupied with three different versions of some file, as would be the case with our README.md changes—Git really does have to use git diff now on the two pairs. By comparing the base version to each tip commit version, one at a time, Git can see which line(s) of the file we changed, and which line(s) of the file they changed.

If the lines we changed are sufficiently far away from the lines they changed, git merge can simply combine the two changes. We touched line 42 (replacing it with two lines 42+43) and they touched line 123 (now line 124), so Git makes our change to the base version at line 42 (now 42+43) and makes their change to the base version at line 123 (now line 124)—and as far as Git is concerned, Git has successfully auto-merged our work.

If Git thinks it successfully did this auto-merging, Git writes the resulting file back into the working tree and does a git add to put this file into its index. The git add step erases slots 1 through 3 and writes the working tree copy into slot zero. The merge conflict has been resolved and the file is ready to commit.

On the other hand, perhaps we changed line 42 (making it two lines, 42+43) and they also changed line 42 (perhaps leaving it one line). In this case, Git can't combine our changes with their changes. What Git does for this case is write, to the working tree, a "merge conflict" version of the file. Git then leaves all three input files in its index, so that slots 1, 2, and 3 are occupied with the three input copies of README.md. Slot zero is empty.

In general, Git considers a change to be conflicted if:

  • the two sides modify the same line(s) but in different ways; or
  • the two sides modify lines that "touch" each other (abut).

When this happens Git will leave you with a merge conflict, which you must fix up yourself. All Git needs is the final version of the file.
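To see the overlap-or-abut rule in action, here is a rough line-level three-way merge in Python. It is a sketch, not Git's merge code: it uses the standard library's `difflib` to find each side's changes against the base, the names `changes` and `merge3` are hypothetical, and on conflict it simply returns `None` instead of writing conflict markers.

```python
import difflib

def changes(base, side):
    """List (start, end, new_lines) hunks that turn base into side."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def merge3(base, ours, theirs):
    """Return the merged lines, or None if the two sides conflict."""
    hunks = sorted(changes(base, ours) + changes(base, theirs),
                   key=lambda hunk: hunk[:2])
    merged, pos, prev = [], 0, None
    for hunk in hunks:
        start, end, new = hunk
        if prev is not None:
            if hunk == prev:
                continue          # both sides made the identical change
            if start <= prev[1]:
                return None       # hunks overlap or abut: conflict
        merged += base[pos:start] + new
        pos, prev = end, hunk
    return merged + base[pos:]

base = ["a", "b", "c", "d", "e"]
ours = ["a", "B1", "B2", "c", "d", "e"]    # we replaced b with two lines
theirs = ["a", "b", "c", "D", "e"]         # they changed d

print(merge3(base, ours, theirs))   # ['a', 'B1', 'B2', 'c', 'D', 'e']
print(merge3(base, ours, ["a", "X", "c", "d", "e"]))   # None: same line
print(merge3(base, ours, ["a", "b", "C", "d", "e"]))   # None: abutting lines
```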

Manual merge

You can provide the merge result any way you like. One way is to open the conflicted file, as it appears in your working tree, in any editor and edit it into shape. Write the result back into the working tree and run git add:

git add README.md

Git will read and compress the working tree copy and prepare it for committing as usual, and will write this to the slot-zero entry for the file. This erases the conflicted-file inputs in slots 1, 2, and 3, and the file is now resolved.

git mergetool

Another way to merge conflicted files is to use git mergetool. This command will in fact run any arbitrary editor of your choice, but to make that work, things get a little complicated.

What this program does is, from a high level viewpoint anyway:

  1. extract all three input files from the index (using git checkout-index --stage=all, more or less);
  2. run some command on the three files, perhaps with a fourth name supplied;
  3. run git add on the result.

It repeats this for each conflicted file. The tricky parts include, but are not limited to, these:

  • How does Git decide whether you actually merged the file?
  • Where are the three (or four) files?

The answer to the first question depends on the editor or other program that git mergetool runs: some programs have a "trusted exit code" that tells Git whether they succeeded at merging. If so, git mergetool does the git add for you. Others don't: for this case, git mergetool makes a backup of the unmerged mashed-together file that git merge left in your working tree, and compares the supposedly-merged final version of the file against this backup. If the backup matches the supposedly-merged version, git mergetool assumes that the merge didn't work.

The actual temporary copies of each file are currently handled by renaming the internal names that git checkout-index spews out to path.LOCAL, path.REMOTE, and so on. This isn't really promised anywhere.

If you're using a pre-packaged merge tool, someone has (at least supposedly) set up all of this stuff so that git mergetool runs your preferred editor, whatever that is, in such a way that it presents you with a sensible way to merge the three input files. If you're setting this up on your own, poke around inside the internals of git mergetool as it's horrendously complicated.

The final result

No matter how you go about finishing a conflicted merge, in the end, what you have in your working tree and in Git's index is a set of files. These are—or at least, Git will believe these are—the correct merge result, by definition. All the files in Git's index are now in staging-slot-zero, so that git status says that all conflicts are fixed and it's now OK to commit the merge.

You now run either:

git merge --continue

or:

git commit

(both wind up doing exactly the same thing) and Git makes a new merge commit. If the input graph looks like this:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

and you ran git merge br2, the end result looks like this:

          I--J
         /    \₁
...--G--H      M   <-- br1 (HEAD)
         \    /²
          K--L   <-- br2

Note that new commit M is a commit, so like every commit, it holds a snapshot and metadata. The snapshot in M is whatever you had in Git's index (in staging slot zero) when you finished the merge, or, if Git was able to auto-merge everything, everything Git put in staging-slot-zero: a full snapshot of every file, in the form it has for that commit.

The metadata in commit M is where things are very slightly special. Unlike an ordinary commit, merge commit M has two parents instead of just one. The first parent is the same as normal: it's commit J, where branch name br1 pointed before the merge operation finished. The second parent is the one you said to merge: br2 selected commit L at the time, so that's the second parent of commit M.

It's now safe to delete the name br2 if you like, as Git can find commit L by working backwards from M. As always, we only need to have a branch name pointing to some commit if we want to be able to find that commit quickly and easily, using that name. If we have some other way to find the commit, we don't need the branch name. We may or may not want it; that's up to us; the names are there for us to use for our purposes, after all.
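You can check both claims, the two parents and the safe deletion, in a scratch repository. This is a sketch with invented file names and commit messages (H, J, and L simply echo the letters in the drawing); it uses a conflict-free merge so that Git makes M on its own.

```shell
set -eu

# Throwaway repository shaped like the drawing (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false
printf 'shared\n' > shared.txt
git add shared.txt
git commit -q -m "H"
git branch -M br1
git checkout -q -b br2
printf 'from br2\n' > b.txt
git add b.txt
git commit -q -m "L"
git checkout -q br1
printf 'from br1\n' > a.txt
git add a.txt
git commit -q -m "J"

# Remember where each name points before merging.
j=$(git rev-parse br1)
l=$(git rev-parse br2)

# The branches touch different files, so this makes merge commit M directly.
git merge -q --no-edit br2

# HEAD^1 and HEAD^2 name the first and second parents of M.
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
git log --graph --oneline

# Delete the name br2; commit L is still reachable by walking back from M.
git branch -d br2 >/dev/null
remaining=$(git branch --list br2)
if git merge-base --is-ancestor "$l" HEAD; then
    reachable=yes
else
    reachable=no
fi
```

Note that git branch -d only agrees to delete a branch whose tip commit is already merged, which is exactly the situation here.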

Reminders: what you should know now

  • A Git repository is mostly a big collection of commits and other objects.
  • These objects are strictly read-only. They are numbered, with hash IDs, which are their "true names". The hash IDs depend on the content, which is why the content can't be changed. If you need to improve a commit, you can make a new improved one. You literally can't fix the existing one though.
  • We normally find commits using names—branch and tag names for instance—and then maybe working backwards, one hop at a time, as git log does.
  • We like to see commits with git diff or git show, which compares two snapshots.
  • Making new commits makes the current branch name advance.
  • The word "branch" is ambiguous; be careful with it.
  • Checking out a commit extracts its files. Checking out a branch name (with git switch or git checkout) selects the tip commit, as pointed-to by the branch name, and extracts its files.
  • Besides the frozen (and de-duplicated and Git-ified) copies of files in the commit, there are useful copies of the files in your working tree, and Git-ified (but not quite frozen) copies in Git's index or staging area.
  • git commit uses the index / staging-area copies to make the new commit; git add updates the index copy of each added file.
  • git merge works by finding a merge base and, in effect, running two git diff operations to see what "work" happened since that shared common merge-base commit.
  • Git merges changes, and detects conflicts, on a line-by-line basis, whether that makes any sense or not. This works great for many computer programming languages, and can work well for documentation. It actually doesn't work very well for things like XML though.
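As a quick refresher on the merge-base point, here is a small sketch of finding the base by hand and looking at each side's work. Git's merge machinery does not literally run git diff, so treat this as an illustration of the idea only; the repository, branch names, and files are invented.

```shell
set -eu

# Throwaway repository with two diverged branches (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false
printf 'shared\n' > shared.txt
git add shared.txt
git commit -q -m "H"
git branch -M br1
h=$(git rev-parse HEAD)          # the commit both branches will share
git checkout -q -b br2
printf 'from br2\n' > b.txt
git add b.txt
git commit -q -m "L"
git checkout -q br1
printf 'from br1\n' > a.txt
git add a.txt
git commit -q -m "J"

# The merge base is the best shared commit of the two branch tips.
base=$(git merge-base br1 br2)

# Two comparisons: what "we" did since the base, and what "they" did.
ours_changes=$(git diff --name-only "$base" br1)
theirs_changes=$(git diff --name-only "$base" br2)
printf 'base:   %s\nours:   %s\ntheirs: %s\n' "$base" "$ours_changes" "$theirs_changes"
```

Dropping --name-only shows the full line-by-line changes on each side, which is the raw material a merge has to combine.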
torek