Before you can understand git merge, you need to understand that Git isn't really about branches or files. Git is, instead, all about commits. This means you need to know precisely what a commit is and does. We won't cover everything here, but these are the relevant points for now:
Each commit is numbered. The numbers aren't simple counting numbers (they don't go from commit 1 to commit 2 to commit 3) but instead are big ugly hash IDs. These look random, but aren't: each is actually a cryptographic checksum of the contents of the commit. You will have seen these hash IDs in git log output (or run git log now to see them).
Because of this cryptographic checksum numbering, no part of any commit can ever be changed. A commit, once made, is totally read-only, and mostly permanent. (It is possible to get rid of commits, but we won't look at that here.)
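To make the "checksum of the contents" idea concrete, here is a minimal Python sketch. It assumes a traditional SHA-1 repository, where an object's ID is the SHA-1 of a short header (type, size, NUL byte) followed by the object's bytes. The commit text below is made up for illustration (the author line and timestamps are placeholders), so its ID will not match any commit in a real repository.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object of the given type and content."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# A made-up commit body: the identity lines and timestamps are placeholders.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1700000000 +0000\n"
    b"committer A U Thor <author@example.com> 1700000000 +0000\n"
    b"\n"
    b"initial commit\n"
)

original_id = git_object_id("commit", commit_body)
# Changing even one character of the contents yields a different ID.
altered_id = git_object_id("commit", commit_body.replace(b"initial", b"Initial"))
print(original_id)
print(altered_id)
```

Because the ID is derived from the contents, "changing" a commit really means making a new commit with a different ID; the old one is untouched.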
A commit has two parts, its data and its metadata. The data part holds a full snapshot of every file that Git knew about at the time you (or whoever) made the commit. The metadata holds information about the commit itself, such as who made it, when, and why: their name, email address, and log message.
Inside the metadata, Git adds some information that's for Git itself: Each commit holds the commit number—the hash ID—of its immediate predecessor commit. Git calls this the parent of the commit. (Most commits have exactly one of these parent IDs, but merge commits, as we'll see, have more.)
Git can look up any commit, or any internal Git object at all, by its hash ID. So if we have a hash ID in hand, we say that this points to the commit, which Git can now find. And, because each commit stores the hash ID of its parent, these commits point backwards. This means we can draw a simple linear chain of ordinary commits like so:
... <-F <-G <-H
where each uppercase letter stands in for an actual hash ID. If we know hash ID H, we can have Git find the actual commit, which contains both a snapshot of files and the hash ID of earlier commit G. This lets Git find G, which contains a snapshot and hash ID F, which lets Git find F, and so on.
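As a rough illustration of this backwards walk, the sketch below models commits as dictionary entries keyed by their (stand-in) hash IDs. The dictionary layout and the history function are hypothetical, and F is treated as the first commit only to keep the example finite.

```python
# Toy model: each "commit" stores a snapshot and the ID of its parent.
# The single letters stand in for real hash IDs, as in the drawing.
commits = {
    "F": {"parent": None, "snapshot": {"README": "v1"}},
    "G": {"parent": "F", "snapshot": {"README": "v2"}},
    "H": {"parent": "G", "snapshot": {"README": "v3"}},
}

def history(start):
    """Yield commit IDs from `start` backwards by following parent links."""
    current = start
    while current is not None:
        yield current
        current = commits[current]["parent"]

print(list(history("H")))  # ['H', 'G', 'F']
```

Note that nothing in G mentions H: the links only ever run from child to parent.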
This is how Git works, in general: backwards, from the last commit. Note that commit G, for instance, cannot be changed: it can point back to F because F existed when we (or whoever) made G, but H did not exist yet, and we won't know H's ID until we make it. So G can point back to F, but not forwards to H. H can point back to G, but not forwards to whatever will come later.
But we still have one problem. Where will Git get hash ID H? This is where branch names come in.
A branch name holds one commit hash ID
Given the chain of commits:
...--F--G--H
Git simply puts hash ID H into some branch name, such as master. This name then points to commit H, making it easy to find:
...--G--H <-- master
If we now wish to add a second branch name, Git requires us to pick some existing commit, and will make the new name point to this commit. Quite often, we pick the commit we're using right now, e.g., commit H:
...--G--H <-- develop, master
Now that we have two names (for the same commit, at the moment) we need a way to know which name we're actually using. To handle that, Git attaches the special name HEAD to just one branch name, so we should update our drawing:
...--G--H <-- develop, master (HEAD)
This indicates that while both names pick commit H, the name we're using is master. Note that every commit is on both branches.
Let's say we now make two new commits on master (for no apparent reason, but maybe we forgot to switch to develop first). When we make the first new commit, Git will:
- save a snapshot of every file Git knows about;
- add our name and email address as the author and committer, and set the time stamps to "now";
- add our log message;
- use the hash ID of the current commit H as the parent hash for our new commit;
- write out the new commit, which will thereby get its own unique number, but we'll just call it I; and
- write the new commit's hash ID into the current branch name.
The result is thus:
          I <-- master (HEAD)
         /
...--G--H <-- develop
If we make a second new commit in this state, we get:
          I--J <-- master (HEAD)
         /
...--G--H <-- develop
At this point, commits up through H are still on both branches, but commits I and J are only on master.
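The steps Git takes when making a commit can be mimicked with a small toy model. Everything here (the ToyRepo class, its method names, the JSON encoding used for hashing) is invented for illustration and is not how Git stores data. Timestamps are left out so that the example is repeatable, and H is treated as the first commit.

```python
import hashlib
import json

class ToyRepo:
    """A toy model of commits and branch names; not Git's real storage format."""

    def __init__(self):
        self.commits = {}        # hash ID -> commit contents
        self.branches = {}       # branch name -> tip commit hash ID
        self.head = "master"     # the branch name HEAD is attached to

    def commit(self, snapshot, message, author="A U Thor <author@example.com>"):
        parent = self.branches.get(self.head)      # current commit, if any
        contents = {
            "snapshot": snapshot,                  # full copy of every file
            "author": author,
            "message": message,
            "parents": [parent] if parent else [],
        }
        # The ID is a checksum of the contents (JSON is a stand-in encoding).
        data = json.dumps(contents, sort_keys=True).encode()
        commit_id = hashlib.sha1(data).hexdigest()
        self.commits[commit_id] = contents
        self.branches[self.head] = commit_id       # move the current branch name
        return commit_id

    def branch(self, name):
        """Create a new name pointing at the current commit."""
        self.branches[name] = self.branches[self.head]

repo = ToyRepo()
h = repo.commit({"file.txt": "one\n"}, "commit H")
repo.branch("develop")                             # develop and master both find H
i = repo.commit({"file.txt": "one\ntwo\n"}, "commit I")
j = repo.commit({"file.txt": "one\ntwo\nthree\n"}, "commit J")
print(repo.branches["master"] == j)    # True: master moved forward
print(repo.branches["develop"] == h)   # True: develop still finds H
```

The key point the model captures is that only the name HEAD is attached to moves; the other name keeps pointing where it was.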
Now let's run git checkout develop (or in Git 2.23 and later, git switch develop will do the same thing). This makes our current branch name become develop and our current commit go back to commit H:
          I--J <-- master
         /
...--G--H <-- develop (HEAD)
Git will update its internal next-commit files (in Git's index aka staging area, which we haven't covered here) and our working tree files to match commit H, so that we start with the same files that are saved forever in H. If we now make a new commit, we get:
          I--J <-- master
         /
...--G--H
         \
          K <-- develop (HEAD)
Note that each branch name just identifies the one commit: J for master, and K for develop. Git calls these the tip commits of the branches.
Making a second new commit gives us:
          I--J <-- master
         /
...--G--H
         \
          K--L <-- develop (HEAD)
Commits K and L are now only on develop. We're now in a situation in which git merge makes sense.
Real merges
What git merge does can be described in just one sentence, but the details get rather complicated. Merging is about combining changes. But we've just seen that Git doesn't actually store changes. Each commit has a complete snapshot of every file. So how can Git do this?
Git's answer to this is to go back to the drawings we're making. These drawings produce a commit graph. By starting from any two commits (typically, two commits found by two branch names) and working backwards, Git can find the best common commit. In this case, it's easy to see that commits H and G and earlier are on both branches. H is the best such commit because it's the closest to the two branch-tip commits.
Git calls this best common / shared commit the merge base. To use git merge, then, we do two things:
- run git checkout to get on one of the two branches, so that HEAD is attached to the branch that finds one of the tip commits; then
- run git merge otherbranch, so that git merge can locate the other tip commit.
Git then finds the merge base on its own. In our case, we might run:
git checkout master
git merge develop
which will use H as the merge base.
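One way to picture the merge base search is the sketch below. It is a simplified model, not Git's actual algorithm (which is more efficient): it collects everything reachable from each tip, intersects the two sets, and keeps the common commits that are not themselves behind another common commit. The parents table mirrors the drawing, with F standing in for the elided earlier history.

```python
# Parent links for the graph drawn above; letters stand in for hash IDs.
parents = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],          # master
    "K": ["H"], "L": ["K"],          # develop
}

def ancestors(tip):
    """Return the set of commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def merge_bases(a, b):
    """Return the common ancestors not reachable from any other common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c in ancestors(other) for other in common if other != c)}

print(merge_bases("J", "L"))  # {'H'}
```

The function returns a set because, in more tangled graphs, there can be more than one "best" common commit.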
To find what changed, Git will use an internal variant of the git diff command. Commit H, the merge base, holds a snapshot of all files. Commit J, at the tip of master, holds a snapshot of all files as well. Using git diff, Git can compare these two snapshots:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed
Then, using git diff again, Git can compare H vs L, to see what they changed on develop:
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed
Merge's job is now to combine these two sets of changes. This can include new files being added, and existing files being deleted, though for a more typical merge we might just have one or a few files being "changed by us" and one or a few files being "changed by them".
A merge conflict occurs when:
- both we and they made some change(s) to some file, and
- Git can't combine these two changes on its own.
So if we modify line 42 of file F1, and they modify any lines of any other files without touching file F1, Git will simply take our version of F1, as they didn't change that file. If they modified file F2 and we didn't touch it, Git will just take their version of file F2. If we both touched F3, though, Git will need to combine our changes—to whatever lines we changed—and their changes. If those changes overlap or touch, Git will declare a merge conflict.
Git will also declare a merge conflict if we deleted a file they modified (or vice versa), and in various other cases that aren't all that important right now. Note that deleting a file is a "whole file" change that will automatically conflict, regardless of which lines the other side modified in that file. But if we deleted a file, and they didn't touch it, Git is fine with that: the combination of our "delete this file", and their "do nothing at all to this file", is to delete the file.
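The per-file rules described above can be summarized in a small decision function. This is a hedged sketch of the file-level logic only: the function name and return values are invented, None stands for "the file does not exist in that commit", and the line-by-line combining that Git attempts when both sides changed a file is left out. The add/add case is one of the "various other cases" and is included only so the function covers every input.

```python
def merge_path(base, ours, theirs):
    """Decide what to do with one file, given its content in three commits.

    Returns ("resolved", content), ("conflict", kind), or
    ("content-merge", None) when a line-level merge would be needed.
    """
    if ours == theirs:
        return ("resolved", ours)              # both sides agree
    if ours == base:
        return ("resolved", theirs)            # only they changed (or deleted) it
    if theirs == base:
        return ("resolved", ours)              # only we changed (or deleted) it
    if base is None:
        return ("conflict", "add/add")         # both added different files
    if ours is None or theirs is None:
        return ("conflict", "modify/delete")   # one deleted, the other modified
    return ("content-merge", None)             # both modified: combine by lines

print(merge_path("v1", "v2", "v1"))    # ('resolved', 'v2')
print(merge_path("v1", None, "v1"))    # ('resolved', None)
print(merge_path("v1", None, "v2"))    # ('conflict', 'modify/delete')
```

A "content-merge" result does not necessarily mean a conflict: it only means Git has to look at which lines each side changed.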
If Git is able to combine our changes and their changes, Git applies these combined changes to the snapshot in the merge base (commit H, in this case) and then makes a new merge commit, which we will call M here. The new merge commit has a snapshot, like any commit. It has metadata, like any commit: you are the author and committer and "now" is used for both timestamps. But, unlike an ordinary commit, the new commit gets two parents. One is the usual: the commit we start from when we run git merge, which in this case is commit J. The other is the commit we named on the command line: in this case, commit L. So the resulting merge commit looks like this:
          I--J
         /    \
...--G--H      M <-- master (HEAD)
         \    /
          K--L <-- develop
Again, the snapshot in commit M is the result of combining the H-vs-J changes with the H-vs-L changes. If Git was able to combine these changes on its own, Git did so, then applied them to the snapshot in H. That kept our changes but also added their changes.
Note that moving back from commit M, we will visit not only commit J, but also commit L. So now, all these commits are on the master branch. The branch gained three commits all at once: new commit M, but also commits K and L, which previously were only on develop.
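The "gained three commits" claim can be checked with a toy reachability computation. As before, the parents table and the reachable function are illustrative stand-ins rather than anything Git exposes, and F represents the elided earlier history.

```python
# Parent links before the merge; letters stand in for hash IDs.
parents = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}

def reachable(tip):
    """Return every commit found by walking backwards from `tip`."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

before = reachable("J")            # commits on master before the merge
parents["M"] = ["J", "L"]          # the merge commit has two parents
after = reachable("M")             # commits on master after the merge
print(sorted(after - before))      # ['K', 'L', 'M']
```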
Your first issue
Also I get these "delete mode" and "create mode" messages that I don't understand.
This is Git's way of saying that the combination of your changes and their changes includes deleting some file(s) (Git will tell you which file names are being deleted) and creating some other, different files (and again Git will tell you which files). The mode part is the file's mode: either the file is executable (mode 100755) or it is not (mode 100644). These are the only two modes allowed for an ordinary file.[1]
You can see why Git believes these files were deleted by running one or both of those two git diff --find-renames commands yourself. The tricky part here is to find the hash ID of the merge base commit, but Git has a command that does this:
git merge-base --all <name-or-hash-id> <name-or-hash-id>
will do the job. For instance, if you were doing git merge develop while on master, with master identifying commit J and develop identifying commit L, you can find the hash IDs of commits J and L and use those two as the arguments to git merge-base --all. You can then diff the resulting merge base hash ID against the hash ID of commit J, and again against the hash ID of commit L.
(Alternatively, if you're willing to take a slight risk of having more than one merge base, you can use the three-dot syntax built into git diff. The git merge-base --all command will find out if there is more than one, and if so, you need something a little more complicated, but usually that's not the case. For space reasons, I won't go into detail here.)
[1] In Git repositories created in 2005, there were more allowable file modes. This was discovered to be a bad idea and modern Git generates only these two modes, but git fsck still permits mode 100664, for instance, to accommodate these ancient repositories. Remember that no commit can ever be changed, so these commits that contain mode 100664 files cannot be fixed.
Fast-forwards
Sometimes, if you run:
git checkout master
git merge develop
Git will tell you that it did a fast-forward, instead of a merge. What this means is clearer if we once again draw the commit graph. Suppose we start out with:
...--G--H <-- master, develop (HEAD)
and then add some commits to develop in the usual way:
...--G--H <-- master
         \
          I--J <-- develop (HEAD)
If we now check out master and merge develop, Git will find the merge base in the usual way: by starting from the two branch tip commits H and J, and working backwards as needed to find the best shared commit. But this time, after making two steps back from J, Git reaches commit H, which is the other commit. So Git can take zero steps back from H, and hence use H as the merge base for this merge.
This merge would:
- diff H against itself to see what we changed (nothing, of course!); and
- diff H against J to see what they changed; then
- combine nothing with something.
The result of this combining, when applied to commit H, would obviously match the snapshot associated with commit J.
Hence, in this situation, Git will by default take a short-cut. It will not bother to merge at all. Instead, it will simply check out the other commit (in this case commit J) while dragging the current branch name forward, so that we end up with:
...--G--H--I--J <-- master (HEAD), develop
You can force Git to make a real merge, using git merge --no-ff, which disables the fast-forward short-cut. This time Git really will compare H against itself, compare H against J, and combine the two sets of changes:
...--G--H------K <-- master (HEAD)
         \    /
          I--J <-- develop
(When and even whether this is useful is somewhat a matter of taste, rather than correctness.)
I think this is the situation you're seeing. See j6t's answer as well, which came in when I was near the end of this.
"Already up to date"
There's one more somewhat-interesting merge case. Suppose you are on some branch, such as master, and you run git merge develop and get the message Already up to date.
What this means is that you have a situation that looks like this:
...--G--H <-- develop
         \
          I--J <-- master (HEAD)
Git computes the merge base as usual, but this time the merge base H is behind the tip of your current branch master. It is in fact the tip of the other branch. Every commit on develop is therefore already on master, so there is nothing to merge. This also occurs when the two names locate the same commit (e.g., if both names point to H, or both to J).
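The three outcomes seen so far (true merge, fast-forward, already up to date) depend only on where the two tip commits sit relative to each other in the graph. The sketch below classifies a merge under that simplified view. The function names are hypothetical, unrelated histories are ignored, and the two parents tables mirror the drawings, with G standing in for the elided earlier history.

```python
def ancestors(parents, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def classify_merge(parents, current, other):
    """Say which case merging `other` into `current` would hit, in this toy model."""
    if other in ancestors(parents, current):
        return "already up to date"    # merge base is the other tip
    if current in ancestors(parents, other):
        return "fast-forward"          # merge base is the current tip
    return "true merge"                # merge base is some earlier commit

linear = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]}
forked = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}

print(classify_merge(linear, "H", "J"))   # fast-forward
print(classify_merge(linear, "J", "H"))   # already up to date
print(classify_merge(forked, "J", "L"))   # true merge
```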
Conclusion
To see what git merge will do:
- Draw the graph (or have Git draw it for you; see git log --graph, often used with --all --decorate --oneline).
- Locate the merge base commit and the two tip commits.
- See if the merge base is one or both of the tip commits. If so, the merge is trivial (fast-forward-able) or already done.
- Otherwise a true merge is needed. If desired, use git diff to see the two sets of changes that will be combined.
- If the sets of changes aren't what you want, inspect the commits that lead up from the merge base to whichever branch tip(s) have some issues in them:
  - What went wrong?
  - How will you prevent this (or at least deter it) in the future?
  - If appropriate, add new commits to one or both branches to fix the problems.
If necessary, consider using git merge --no-commit to have Git start the merge but not finish it. You can then correct the merge, but note that this produces what some call an evil merge. If you ever have Git repeat this merge,[2] you'll have to do the same manual fixups. Or, let Git do the merge, then add a fixup commit. This has the advantage that if you let Git repeat the merge, it will get the same (bad) result, but you can then have Git repeat the fixup.
[2] The old, now-deprecated git rebase -p and the newfangled git rebase -r command will "copy" merges, much like the way any rebase copies ordinary commits, but unlike ordinary commits, git cherry-pick cannot copy a merge commit. So these work by repeating the merge instead. This repetition does not include any flags you specified when you ran the git merge, and does not include any manual fixups you made.