There are two reasons that thinking about what Git does is difficult:
- Some of it is just inherently difficult, such as the whole concept of a distributed version control system (multiple separate copies of some repository). But this isn't what you're running into here; for that, we go to:
- When people start using Git, they think it's about branches, or files, or something like this. It isn't. Git is all about commits. That includes git merge: it doesn't really merge branches (well, sometimes it does, depending on what you personally mean when you say the word "branch"), but it always works with commits.
(An adjunct of the second point is that the word branch in Git is ambiguous. See What exactly do we mean by "branch"?)
To think properly about what Git will do here, we need to start with a proper definition of commits. Since you have been using Git for just a little bit, you're probably familiar by now with the sequence:
git checkout <something>
<edit some files>
git add <files>
git commit
and:
git log
But what, precisely, does all of this do? The git log command shows commits (with big ugly hash IDs), and the git commit command makes a new commit, but what, precisely, is a commit? What are the parts of a commit? How do we name a specific commit? What does one do for us?
Commits
There is this lump of facts we just need to memorize:
- Each commit is numbered. The numbers aren't simple counting numbers (they don't go #1, #2, #3, and so on), but they are numbers of a sort, and git log shows them. They're actually cryptographic checksums of the contents of each commit, and since each commit is guaranteed to be unique,[1] each checksum is also unique.[2]

  This means no part of any commit can ever be changed. If you take some existing commit out and modify it in some way, then stuff the new commit back into Git, you get a new (and unique) number for that new commit. The existing commit is still there. You haven't changed it; you have just added a new commit.

- Each commit is made up of two parts: a snapshot of all the files that Git knows about, at the time you (or whoever) make the commit, and some metadata such as your name and email address and some date-and-time stamps.

  Because each commit is a full snapshot, Git saves space by keeping the files in the commit in a special, read-only, compressed, Git-ified format in which duplicate files are shared. That is, if some new commit you make has a thousand files, but 999 of them are the same as the files in some previous commit(s), that new commit really only has one new file. It re-uses the other 999. This is quite safe since no part of any existing commit can ever be changed.

- Git includes, in the metadata part, the commit number of the previous commit.
[1] To this end, Git includes the date-and-time stamps, so that even if you commit the same stuff twice, the two commits are different. Two commits can only be identical if you commit the same stuff, on the same parent, with the same metadata, in the same second, in which case, did you really commit it twice, or was that just déjà vu?
[2] The pigeonhole principle tells us that this must eventually fail. Git's trick here is to hope that the universe ends before it does. SHA-1 doesn't quite cut it anymore, and Git is moving to SHA-256 instead.
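The numbering and the file sharing both fall out of how Git computes object IDs. The following is a minimal Python sketch that assumes the classic SHA-1 object format. The "<type> <size>" header plus a NUL byte is Git's real layout, but the helper names, the author string, and the simplified commit text are illustrative. It shows that identical file contents always produce the same blob ID, while two commits that differ only in their time stamp get different IDs.

```python
import hashlib
from typing import Optional


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git's SHA-1 format does: a header made of
    '<type> <size>' plus a NUL byte, followed by the raw body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """File contents are stored as blobs; same bytes, same ID."""
    return object_id("blob", content)


def commit_id(tree: str, parent: Optional[str], who: str, when: int, message: str) -> str:
    """Build a simplified commit body (tree, optional parent, author,
    committer, blank line, message) and hash it. Illustrative only."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {who} {when} +0000")
    lines.append(f"committer {who} {when} +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode())


EMPTY_TREE = object_id("tree", b"")       # Git's well-known empty-tree ID
AUTHOR = "A U Thor <author@example.com>"  # hypothetical author

print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
# The same snapshot committed one second apart yields two different commits:
print(commit_id(EMPTY_TREE, None, AUTHOR, 1_700_000_000, "demo"))
print(commit_id(EMPTY_TREE, None, AUTHOR, 1_700_000_001, "demo"))
```

Changing any byte of a commit (its parent, its time stamp, or its message) changes the ID. That is why "modifying" a commit really means making a new one.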
Commits are chained, and branch names find the ends
Now, whenever we have the commit number (the hash ID) of a commit or other internal Git object,[3] Git can look up that object in its big database. So we say this hash ID, or anything holding this hash ID, points to the commit (or other object). Since each commit holds the hash ID of its predecessor commit, each commit points back to an earlier commit. Git calls this earlier commit the parent of the later (child) commit. We can draw this:
... <-F <-G <-H
Here H stands in for the hash ID of the last commit. Once we actually have commit H, we can have Git look up the hash ID of earlier commit G, which lets Git find G. This provides the hash ID of earlier commit F, and so on.
What we need, then, is the hash ID of the last commit in the chain. We just need to save that somewhere. We could jot it down on a note, or on the office whiteboard. We could memorize it. Or—hey, we have a computer! Computers are good at remembering stuff. Let's have the computer remember the hash ID of the last commit in the chain.
This is what a branch name does for us: it remembers the hash ID of the last commit. From there, Git works backwards.
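A toy Python model makes the "works backwards" step concrete. It is hypothetical and is not how Git stores anything. Each commit records only its parent's ID, and a branch name is just a mapping to the last commit's ID. Listing history is then a loop that follows parent links until it runs out, which is roughly what git log does for a simple chain like this.

```python
from typing import Dict, List, Optional

# Hypothetical toy repository: commit ID -> parent ID (None for the root).
parents: Dict[str, Optional[str]] = {"F": None, "G": "F", "H": "G"}

# A branch name just remembers the ID of the last commit in its chain.
branches: Dict[str, str] = {"master": "H"}


def history(branch: str) -> List[str]:
    """Start at the commit the branch name points to and follow parent
    links backwards, newest first."""
    result: List[str] = []
    commit: Optional[str] = branches[branch]
    while commit is not None:
        result.append(commit)
        commit = parents[commit]
    return result


print(history("master"))  # ['H', 'G', 'F']
```

Note that nothing in `parents` lets us go from F to G. Only the child knows its parent.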
Let's draw it like this:
...--F--G--H <-- master
Because commits literally can't change, we don't really need arrows pointing backwards from H to G, then G to F, and so on. H is always, forever, going to point backwards to G. We just need to remember that it's easy for Git to go from H to G, but not the other way: children point to their parents, but parents don't know their children, because the parents were frozen in time before they had any children. The arrows coming out of the names, on the other hand, can move. To see how, let's add another name such as develop, feature, or topic:
...--F--G--H <-- master, topic
Note that both names point to existing commit H. That's normal in Git: all the commits are now on both branches. But we do, now, need a way to tell which name we're using, and to that end, we'll have Git attach the special name HEAD to one of these other names:
...--F--G--H <-- master (HEAD), topic
This means we're on branch master, as git status would say. If we now use git checkout topic or git switch topic,[4] we get:
...--F--G--H <-- master, topic (HEAD)
Now we're on branch topic. We haven't changed commits (we're still using commit H), but we did change names.
You can also use git checkout <hash>, or git switch --detach <hash>, to select a commit by its hash ID. When you do that, Git uses what it calls detached HEAD mode: the name HEAD isn't attached to any branch name, and instead holds the commit's hash ID directly. We won't go into any detail here, but it's useful to know about later, especially with git rebase, which uses this mode internally.
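In a repository using the traditional file-based refs, the file .git/HEAD holds one of two things. It holds either a line of the form ref: refs/heads/<name> (attached HEAD) or a raw commit hash ID (detached HEAD). Here is a small Python sketch of interpreting that content. The function name and return shape are made up for illustration, and the hash below is a made-up placeholder.

```python
from typing import Tuple


def describe_head(head_file: str) -> Tuple[str, str]:
    """Classify the contents of a .git/HEAD file (hypothetical helper).

    Returns ("attached", branch) when HEAD names a branch, or
    ("detached", hash_id) when HEAD holds a commit hash ID directly.
    """
    text = head_file.strip()
    prefix = "ref: refs/heads/"
    if text.startswith(prefix):
        return ("attached", text[len(prefix):])
    return ("detached", text)


FAKE_HASH = "1234567890abcdef1234567890abcdef12345678"  # made-up commit ID

print(describe_head("ref: refs/heads/topic\n"))  # ('attached', 'topic')
print(describe_head(FAKE_HASH + "\n"))           # ('detached', '1234...')
```

On a real repository you could pass in the text of .git/HEAD, but git symbolic-ref HEAD and git rev-parse HEAD are the supported ways to ask these questions.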
[3] There are actually four types of object, which Git calls blobs, commits, trees, and tags. Each gets a hash ID that is unique to that particular object's type-and-content. File content is stored as a blob object, and the fact that a unique blob content gets a unique hash ID is how Git de-duplicates files. It just falls right out of the storage system: if the file content is identical, the blob hash ID is also identical, and therefore the blob is already in the object database. You don't need to know most of this: you just need to know about commit hash IDs.
[4] The git switch command was new in Git 2.23. It's an attempt to make git checkout safer and simpler: the git checkout command does too many different jobs, so the jobs were split up (roughly halved) and the new commands git restore and git switch each do half. The existing git checkout continues to exist, and probably will for years or decades. The new commands are already growing new abilities that are not being backported into git checkout, so it's probably best to switch over, but you have years to do this.
Viewing a commit
When we ask Git to show us one of these commits, we don't see a whole snapshot. Instead, we see changes. The way Git does this is remarkably simple. If we're looking at commit H, whose parent is G, the git show or git log -p command simply extracts both snapshots and compares them. When two files are the same, it says nothing. When two files are different, it figures out what's different, and shows that.
In short, even though commits hold snapshots, we usually see diffs. Git does this by comparing two snapshots. When Git compares a parent to a child, that shows us what happened in that commit; but we can have Git compare any two commits we like.
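The idea that a diff is just a comparison of two snapshots can be sketched with Python's standard difflib. Here a snapshot is a hypothetical dict mapping path names to file contents. Git's own diff machinery (rename detection, binary files, and so on) is far more elaborate. This only shows the "same file: say nothing; different file: show what differs" logic.

```python
import difflib
from typing import Dict, List

Snapshot = Dict[str, str]  # hypothetical: path -> file contents


def compare(old: Snapshot, new: Snapshot) -> List[str]:
    """Report what changed between two snapshots, in the spirit of git show."""
    out: List[str] = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue  # identical files: say nothing
        if before is None:
            out.append(f"added {path}")
        elif after is None:
            out.append(f"deleted {path}")
        else:
            out.append(f"modified {path}")
            out.extend(
                difflib.unified_diff(
                    before.splitlines(), after.splitlines(),
                    fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="",
                )
            )
    return out


G = {"README": "hello\n", "main.c": "int x = 1;\n"}
H = {"README": "hello\n", "main.c": "int x = 2;\n", "new.txt": "hi\n"}
print("\n".join(compare(G, H)))
```

Because compare accepts any two snapshots, the same function can compare any pair of commits, not just a parent and its child.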
Making new commits: Git's index and your work-tree
We already noted that the files inside each commit are in a Git-ified, de-duplicated, frozen format. I like to call this Git's freeze-dried format for short. Files in this format aren't actually useful for getting any new work done, so when you use git checkout or git switch to select some commit (for example, by selecting a branch name to get on that branch), Git "rehydrates" the files to make useful copies.
The useful files are the ones you can see and work with. They exist in an area that Git calls your working tree or work-tree. Files here have their ordinary everyday form, and all of your computer's regular file-manipulation commands work on them. In fact, these files aren't under Git's control at all, and in an important sense, aren't in Git at all. They are your files, to do with as you will. Whenever you want, you can use git checkout to tell Git to replace these files with ones from some existing commit.
All version control systems (VCSes) have to have something like this, because no matter how they store files,[5] they have the frozen-in-time committed versions, and the usable versions. But Git goes one step further than other VCSes: Git keeps a third copy of each file in something that Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These are three names for the same thing.
Git's index is perhaps best thought of as the proposed next commit. What it actually stores is each file's name, as a long string with embedded slashes (path/to/file.ext is all just one long file name, for instance; there are no folders or directories in the index), along with the file's mode and an internal hash ID for a freeze-dried version of that file. So the "copy" that's in the index is already de-duplicated, and is ready to go into a new commit.
When you use git checkout or git switch to extract some particular commit, Git fills both its index and your work-tree from the commit. The index holds the freeze-dried files, and your work-tree holds ordinary files, which your OS insists on naming with directory (folder) names and file names. The file named path/to/file.ext becomes a directory/folder named path containing a directory/folder named to containing a file named file.ext. Git deals with your OS's peculiarities here, including any conversion from / to \ that might be required, at the time it does the index-to-work-tree conversion. This is also when it rehydrates the file, and when it does any CR-LF conversion if needed.
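The index-to-work-tree step can be sketched with Python's pathlib. An index name is one flat string with forward slashes, and converting it for a particular OS means splitting on / and re-joining with that OS's separator. The optional CR-LF step imitates what a line-ending setting such as core.autocrlf=true does on checkout. This is a simplification under that assumption, not Git's actual conversion code (which, for example, only converts files it considers text).

```python
from pathlib import PurePosixPath, PureWindowsPath


def worktree_path(index_name: str, windows: bool) -> str:
    """Turn a flat index name like 'path/to/file.ext' into an OS path."""
    parts = PurePosixPath(index_name).parts  # ('path', 'to', 'file.ext')
    cls = PureWindowsPath if windows else PurePosixPath
    return str(cls(*parts))


def to_worktree_bytes(content: bytes, crlf: bool) -> bytes:
    """Optionally turn LF line endings into CR-LF when writing a file out."""
    if not crlf:
        return content
    # Normalize first so that existing CR-LF pairs are not doubled.
    return content.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


print(worktree_path("path/to/file.ext", windows=False))  # path/to/file.ext
print(worktree_path("path/to/file.ext", windows=True))   # path\to\file.ext
print(to_worktree_bytes(b"a\nb\n", crlf=True))           # b'a\r\nb\r\n'
```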
What this means is that initially, Git's index matches the current commit, and both match the files in your work-tree, which are now yours. As you modify your files in your work-tree, they stop matching the freeze-dried index copy. This is why Git makes you run git add all the time. The git add command tells Git: make your index copy match my work-tree copy. Git will compress and de-duplicate the file at this time, and update its index copy.
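The three copies of each file can be modelled as three hypothetical dicts. The first is the current commit's snapshot (path to blob ID), the second is the index (path to blob ID), and the third is the work-tree (path to contents). In this sketch, add() plays the role of git add: it hashes the work-tree contents, Git-style, and stores the resulting ID in the index. status() then reports, roughly as git status does, which files are staged and which are not.

```python
import hashlib
from typing import Dict, List, Tuple


def blob_id(content: bytes) -> str:
    """Git-style blob ID: SHA-1 over 'blob <size>' + NUL + content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


head: Dict[str, str] = {"file.ext": blob_id(b"v1\n")}  # frozen commit
index: Dict[str, str] = dict(head)                    # proposed next commit
worktree: Dict[str, bytes] = {"file.ext": b"v1\n"}    # your files


def add(path: str) -> None:
    """Make the index copy match the work-tree copy."""
    index[path] = blob_id(worktree[path])


def status() -> Tuple[List[str], List[str]]:
    """Return (staged, unstaged) paths. Staged: index differs from HEAD.
    Unstaged: a file present in the work-tree differs from the index."""
    staged = sorted(p for p in index.keys() | head.keys() if index.get(p) != head.get(p))
    unstaged = sorted(p for p in worktree if blob_id(worktree[p]) != index.get(p))
    return staged, unstaged


worktree["file.ext"] = b"v2\n"  # edit the file
print(status())                 # ([], ['file.ext'])
add("file.ext")                 # like: git add file.ext
print(status())                 # (['file.ext'], [])
```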
[5] Other systems may store files as deltas, and/or give them numbers (Unix-like inodes or equivalent) to help the VCS identify "same" files. Git does not do any of this, although it does have a level below the Git object level in which Git objects can be "packed" and delta-compressed.
Making new commits: git commit
Now we're ready to see how git commit really works. It:
- gathers the appropriate metadata for the new commit: your name and email address, a log message to put into the commit, the current date-and-time for the time stamps, and the hash ID of the current commit (which becomes the new commit's parent);
- writes out whatever is in Git's index, to become the new snapshot;
- adds the metadata, and writes that out as the new commit, which provides the hash ID for the commit; and
- (here's the tricky part) writes the new commit's hash ID into the current branch name, i.e., the name to which HEAD is attached.
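The four steps of git commit can be sketched on a toy repository. Everything here is hypothetical. Commits are plain dicts rather than hashed objects, and the new ID comes from a counter instead of a checksum. The last step is the one worth watching: only the branch that HEAD is attached to moves.

```python
from itertools import count
from typing import Dict

commits: Dict[str, dict] = {"H": {"parents": [], "snapshot": {"a": "1"}}}
branches: Dict[str, str] = {"master": "H", "topic": "H"}
head = "topic"                      # HEAD is attached to topic
index: Dict[str, str] = {"a": "1"}  # proposed next commit
_ids = (f"C{n}" for n in count(1))  # stand-in for real hash IDs


def commit(message: str) -> str:
    parent = branches[head]                 # 1. metadata: the current commit
    snapshot = dict(index)                  # 2. freeze whatever is in the index
    new_id = next(_ids)                     # 3. write the commit, get its ID
    commits[new_id] = {"parents": [parent], "snapshot": snapshot, "message": message}
    branches[head] = new_id                 # 4. move only the current branch
    return new_id


index["a"] = "2"      # like editing a file and running git add
commit("change a")
print(branches)       # {'master': 'H', 'topic': 'C1'}
```

master still points to H because nothing told Git to move it.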
This last step is what updates the branch name. If we had:
...--F--G--H <-- master, topic (HEAD)
just a moment ago, well, now we have:
...--F--G--H <-- master
            \
             I <-- topic (HEAD)
instead. The current name is still topic, and HEAD is still attached there. New commit I (as usual, I stands in for some big ugly hash ID) points back to existing commit H. The name topic now points to new commit I.
Suppose we now use git checkout master to go back to existing commit H. In the drawing, I'm going to move topic up above the master line, pretend we added one more commit too, and rename the branch topic1:
             I--J <-- topic1
            /
...--F--G--H <-- master (HEAD)
What's in Git's index now, and in our work-tree, matches existing commit H. Let's make a new branch name topic2 now, and switch to it:
             I--J <-- topic1
            /
...--F--G--H <-- master, topic2 (HEAD)
What's in Git's index and our work-tree has not changed at all. No existing commit has changed (none can), and we're still working with commit H, but now any new commits we make will change where the name topic2 points. So if we make two commits now, we will get this:
             I--J <-- topic1
            /
...--F--G--H <-- master
            \
             K--L <-- topic2 (HEAD)
Merging
Now that we have this fairly complicated setup, we can look at how git merge does its job. Let's say, for simplicity, that we'll choose to merge topic2 into topic1, and completely ignore master for a while. So we'll start by doing a git checkout topic1 or git switch topic1, and by not drawing master at all, to get:
             I--J <-- topic1 (HEAD)
            /
...--F--G--H
            \
             K--L <-- topic2
Now we'll run:
git merge topic2
Importantly, this particular case requires a true merge. (I'll show one that doesn't in a moment.) A true merge has not two but three inputs, all of which are commits:
- Git finds one of them the easy way, using the name HEAD: that's our commit, or commit J. This is the --ours commit for various operations later, although internally this is commit #2 (this internal number can leak out in a few places, but --ours lets us not have to remember it).
- Git finds one based on the command we gave it. Since we said git merge topic2, Git uses branch name topic2 to find commit L. This is the --theirs commit for various operations later, although internally this is commit #3.
- Git finds the third commit on its own. Git calls this the merge base commit, and internally, it's #1: if we find ourselves wanting to look at files from this commit during a conflicted merge, we need to use this internal number, as in git show :1:path/to/file.
The merge base is found using a Lowest Common Ancestor algorithm on the commit graph (a directed acyclic graph), but we can think of it as the best shared (common) ancestor on both branches. Here, that means we start at commit J and work backwards to I and then H. We also start at commit L and work backwards, to K and then H. Commits H and earlier are on both branches, but H is pretty clearly a better (or at least newer) ancestor than G, or anything earlier.
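The "best common ancestor" idea can be sketched in a few lines of Python over the graph drawn above (I--J on topic1, K--L on topic2, both forking from H). This is a simple, hypothetical version. It collects every ancestor of each tip, keeps the shared ones, and then discards any shared commit that is itself an ancestor of another shared commit. Git's own implementation is far more efficient, and in criss-cross histories it can find more than one base (git merge-base --all lists them).

```python
from typing import Dict, List, Set

# Child -> list of parents, for ...--F--G--H, then I--J and K--L off H.
parents: Dict[str, List[str]] = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}


def ancestors(commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a: str, b: str) -> Set[str]:
    common = ancestors(a) & ancestors(b)
    # A common ancestor is "best" if no other common ancestor descends from it.
    return {c for c in common
            if not any(c in ancestors(d) for d in common if d != c)}


print(merge_bases("J", "L"))  # {'H'}
```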
What Git does now is compare the snapshot in the merge base to each of the two branch-tip snapshots. That is, Git runs the equivalent of:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed
Merge's job is now to combine the changes, then apply these combined changes to the snapshot in H, the merge base:
- If we changed some file, what exactly did we do to that file? If they changed the same file, what did they do?
- If we changed a file and they didn't change that file, Git needs to make the same changes we made. But that means changing what's in H to match what's in J. That's easy: Git can just take our file from J.
- If they changed a file and we didn't, Git can just take their copy.
- If we deleted a file and they didn't do anything to it, Git can just take the deletion; the same holds if they deleted a file that we didn't do anything to.
If both we and they made some change(s) to some file, and those two changes collide (affect the same lines, for instance), then Git may have to declare a merge conflict. Here, things get a little more complicated:
- Suppose that both we and they fixed the same spelling error on the same line of some file. Then our change and their change match, and Git can just take one copy of that change. So that's not a merge conflict after all.
- Or, maybe we changed the word red to the word green, and they changed the same word to the word yellow. Here, Git doesn't know which change to take, and declares a merge conflict.
- Perhaps we changed a file and they deleted the file. These conflict: Git doesn't know whether to keep our file or delete it entirely, so Git declares a merge conflict.
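The file-level rules in these two lists can be condensed into a short function. This is a hypothetical sketch, not Git's code. Each argument is one file's content in the merge base, in ours, and in theirs, with None meaning the file does not exist on that side. Where both sides changed the file differently, the sketch simply reports a conflict instead of attempting Git's line-by-line merge.

```python
from typing import Optional, Tuple

Content = Optional[str]  # None means "file absent (deleted)"


def merge_file(base: Content, ours: Content, theirs: Content) -> Tuple[str, Content]:
    """Return ("ok", result) or ("conflict", None) for one path."""
    if ours == theirs:
        return ("ok", ours)      # same change on both sides (or no change)
    if ours == base:
        return ("ok", theirs)    # only they changed it: take theirs
    if theirs == base:
        return ("ok", ours)      # only we changed it: take ours
    return ("conflict", None)    # both changed it, differently


print(merge_file("teh", "the", "the"))       # ('ok', 'the'): same fix twice
print(merge_file("red", "green", "red"))     # ('ok', 'green')
print(merge_file("red", "green", "yellow"))  # ('conflict', None)
print(merge_file("x", "x changed", None))    # ('conflict', None): modify/delete
print(merge_file("x", None, "x"))            # ('ok', None): take the deletion
```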
When Git does declare a merge conflict, Git goes ahead and does the remaining merge work wherever it can, but then has git merge itself stop in the middle. Otherwise (if Git thinks everything went well) it will by default go on to the next step. You can add --no-commit to your command line to tell Git to stop anyway.
This process, of finding and combining changes using three input commits, is what I like to call merge as a verb. It is the action of identifying (pairing up) input files and making diffs to see what changed, then combining and applying the combined diffs, using all three input files from the three commits.
If all goes well and you didn't tell Git not to, Git will go on to make a new commit. This new commit will be like any other commit, with one exception: instead of one parent, it will have two parents. We'll come back to this in a moment; for now, let's suppose that there's a conflict and the merge stopped, or you used --no-commit.
Conflicts reveal an important secret (well, not really a secret, but something that is often not explained very well): the merge-as-a-verb process actually takes place in Git's index, because Git builds new commits from its index. Git does use your work-tree too: when the merge stops with a conflict, Git writes, to your work-tree, its best effort at merging the various files. Those that have low-level conflicts[6] will contain conflict markers. Git's index, meanwhile, has been expanded: for each conflicted file, it now holds not one but up to three copies. This is where those numbers mentioned earlier come in:
- Index slot #1 holds the merge base copy of the file.
- Index slot #2 holds the --ours copy of the file. You can use git checkout --ours <file> to get this one out to your work-tree.
- Index slot #3 holds the --theirs copy of the file. You can use git checkout --theirs <file> to get this one out to your work-tree.
In high-level conflicts, one of these slots may be empty. I won't go into detail here as this answer is already quite long.
Note too that you can use git checkout -m <file> to re-create the conflicted version, complete with conflict markers, in your work-tree copy. Be careful with any of these git checkout operations, as they will instantly overwrite any work you did in the work-tree copy to fix the merge conflict!
To resolve a merge conflict, you will in general edit the work-tree copy, or use a merge tool (git mergetool will run your chosen merge tool; Git itself does not come with any, so this is strictly for third-party add-ons). Once you have the conflicts resolved correctly, you will usually run git add to tell Git: make your index copy match my work-tree copy. (The git mergetool command will run git add for you, although sometimes it asks first, depending on what third-party tool you use and what Git knows about it.) This git add wipes out the three numbered slots and puts an entry in slot #0 (whose number you don't normally see), so that there's just a single copy of the merged file in Git's index.
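Here is a hypothetical model of the index during a conflicted merge. Entries are keyed by (path, stage). Stages 1, 2, and 3 hold the base, --ours, and --theirs versions, and stage 0 holds a normal, resolved entry. The toy stores contents directly where the real index stores blob IDs. Resolving is then "remove stages 1 to 3, write stage 0", which is the effect git add has here. On a real repository, git ls-files --stage (or git ls-files -u for just the unmerged entries) shows these stage numbers.

```python
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, int], str]  # (path, stage) -> file contents (toy)

index: Index = {
    ("clean.txt", 0): "merged cleanly\n",
    ("color.txt", 1): "red\n",     # merge base
    ("color.txt", 2): "green\n",   # --ours
    ("color.txt", 3): "yellow\n",  # --theirs
}


def unmerged(idx: Index) -> List[str]:
    """Paths that still have numbered (non-zero) stages."""
    return sorted({path for (path, stage) in idx if stage != 0})


def resolve(idx: Index, path: str, worktree_copy: str) -> None:
    """Mimic the effect of `git add path` on a conflicted entry."""
    for stage in (1, 2, 3):
        idx.pop((path, stage), None)  # some slots may be empty
    idx[(path, 0)] = worktree_copy


print(unmerged(index))  # ['color.txt']
resolve(index, "color.txt", "green\n")
print(unmerged(index))  # []
```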
[6] A low-level conflict is one that takes place within some particular lines of a file. That's the "red became green on one side, but yellow on the other side" example above. A high-level conflict takes place across an entire file, such as when the --ours side makes a change to a file, but the --theirs side removes the file entirely. Both kinds of conflicts result in a paused merge, but a high-level conflict leaves no markers in your work-tree.
Merge as a noun or adjective: a merge or a merge commit
If there are no conflicts, or after you have resolved the conflicts and you run git merge --continue or git commit, Git is ready to make a merge commit. This merge commit has a snapshot, just like any other commit. It has metadata, just like any other commit. The only thing that's special about it is that in this metadata, this new commit lists two parent commits.[7] We can draw the new merge like this:
             I--J
            /    \
...--F--G--H      M <-- topic1 (HEAD)
            \    /
             K--L <-- topic2
Note that as usual, the branch name now points to new commit M. Commit M points back to existing commit J, just like any commit. What's special is that commit M also points back, through a second parent, to commit L: the one we chose by running git merge topic2.
Now that commit L can be found by working backwards from commit M, Git will allow us to delete the name topic2 (for instance, with git branch -d topic2). The result looks like this:
             I--J
            /    \
...--F--G--H      M <-- topic1 (HEAD)
            \    /
             K--L
If there's a branch name master that points to commit H, that branch name still points to commit H: the only things that change here are the ones we tell Git to change.
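Whether a branch name can be deleted safely comes down to reachability: is its commit an ancestor of (reachable from) some commit we are keeping? Below is a sketch over the merged graph above, where M has two parents. The helper name and toy graph are illustrative. On a real repository, git merge-base --is-ancestor <commit> <commit> answers the same question through its exit status.

```python
from typing import Dict, List, Set

# The graph after the merge: M has two parents, J (first) and L (second).
parents: Dict[str, List[str]] = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],
}


def is_ancestor(candidate: str, tip: str) -> bool:
    """True if `candidate` can be reached by walking back from `tip`."""
    stack: List[str] = [tip]
    seen: Set[str] = set()
    while stack:
        c = stack.pop()
        if c == candidate:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


branches = {"topic1": "M", "topic2": "L", "master": "H"}
# topic2's commit L is reachable from M, so dropping the name loses nothing.
print(is_ancestor(branches["topic2"], branches["topic1"]))  # True
# Walking back from H never reaches M: parents don't know their children.
print(is_ancestor("M", "H"))                                # False
```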
[7] Technically, this is two or more. The "or more" part is to accommodate Git's so-called octopus merges. These don't do anything you can't do with ordinary two-parent merges, and we won't cover them here.
git merge commands that don't actually merge
In the above drawings, we had a sort of fork, where our two topic names pointed to commits that required working backwards to find a common ancestor. But suppose we start with master like this:
...--H <-- master (HEAD)
and add a branch name feature and make a commit or two:
...--H <-- master
      \
       I--J <-- feature (HEAD)
and then run:
git checkout master
git merge feature # or git merge --ff-only feature
This git merge command will go through the same initial process as our earlier git merge topic2 did, to find the merge base commit. This time, though, when we start at J and work backwards, we get to commit H, and commit H is our current commit. So when we start at master, we don't actually have to work backwards at all: commit H, the merge base, is the current commit. In this case, git merge says to itself:
Hm, you know, if I compare commit H to itself, I won't find any changes at all. The result of combining nothing with something is always just the something. So I don't actually need to merge anything at all.
If we don't force Git to make a real merge, it just won't. Instead of merging, it will just check out commit J, but drag the name master forward in the process, so that we have:
...--H
      \
       I--J <-- feature, master (HEAD)
(and now we can straighten out the kink in the drawing).
Git calls this a fast-forward operation. A fast-forward in general means moving a branch label forward, against the direction of the internal commit arrows, so that the new position is a descendant (a child, grandchild, and so on) of the current position. When git merge performs a fast-forward instead of a merge, Git calls it a fast-forward merge, even though no actual merging happened.[8]
You can prevent this with git merge --no-ff:
git merge --no-ff feature
will result in:
...--H------K <-- master (HEAD)
      \    /
       I--J <-- feature
where K is a new merge commit. The first parent of K will be H, and the second parent of K will be J. The snapshot in commit K will match the snapshot in commit J.
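The choice git merge makes between "already up to date", a fast-forward, and a true merge depends only on where the merge base falls. Below is a hypothetical sketch of that decision, using the master/feature graph from this section. The helper and the returned strings are made up; only the decision logic follows the description above.

```python
from typing import Dict, List, Set

# ...--H, then I--J on feature.
parents: Dict[str, List[str]] = {"H": [], "I": ["H"], "J": ["I"]}


def reachable(tip: str) -> Set[str]:
    """All commits found by walking back from `tip`, including itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def plan_merge(current: str, other: str, no_ff: bool = False) -> str:
    if other in reachable(current):
        return "already up to date"  # nothing of theirs is new to us
    if current in reachable(other) and not no_ff:
        return "fast-forward"        # the merge base is the current commit
    return "true merge"              # make a merge commit with two parents


print(plan_merge("H", "J"))              # fast-forward
print(plan_merge("H", "J", no_ff=True))  # true merge
print(plan_merge("J", "H"))              # already up to date
```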
[8] The other operations that routinely perform fast-forwards are git fetch and git push: fetch will fast-forward your own remote-tracking names, and push will often only work if the operation is a fast-forward. When fast-forwarding is not possible for these two, they will use the "force flag", if it is enabled, to force the branch name motion.
Your own cases
I have the following branches:
- Master Branch (currently deployed to Prod)
- Enhancement #1 - CleanupUntrackedFiles Branch: This was spawned off of Master. I made updates to the gitignore file to not include particular extensions in my repo and I removed those unnecessary files from the repo.
- Enhancement #2: This was ALSO created off of Master. Code updates are in support of a customer request.
The phrase "created off master" suggests that you did:
git checkout master
git checkout -b CleanupUntrackedFiles
git checkout master
git checkout -b Enhancement2
or:
git branch CleanupUntrackedFiles master
git branch Enhancement2 master
As we've seen above, these will result in:
...--H <-- master, CleanupUntrackedFiles, Enhancement2
It's the process of making new commits that will cause these two names to diverge, so that they no longer point to existing commit H.
We do not, however, know whether you did exactly the above, or whether you created these names at different times. For instance, perhaps you made the name CleanupUntrackedFiles when master pointed to some earlier commit, and the name Enhancement2 when master pointed to some later commit:
          I--J <-- CleanupUntrackedFiles
         /
...--F--G--H <-- master
            \
             K <-- Enhancement2
Since git merge works based on commits, and can do fast-forward operations in some cases, these details matter.
Let's assume for now, though, that you have the former setup, i.e.:
       I--J <-- CleanupUntrackedFiles
      /
...--H
      \
       K-----L <-- Enhancement2
The difference between the snapshot in H and that in J is this: in H, the .gitignore file doesn't list some files, while in J it has some extra lines; and in J, the files that you listed in .gitignore are not in the snapshot, so comparing H vs J will show those files as deleted.
Meanwhile, the difference between snapshots H and L is that some source files are modified. We don't know, because you did not say, whether any of the files that show up as "deleted" in H-vs-J show up as modified in H-vs-L.
Let's say you now run:
git checkout master
git merge --no-ff CleanupUntrackedFiles
where the --no-ff option forces a true merge. You will get:
       I--J <-- CleanupUntrackedFiles
      /    \
...--H------M <-- master (HEAD)
      \
       K-----L <-- Enhancement2
The snapshot in M will match that in J; the first parent of M will be H and the second will be J. If you now run:
git merge Enhancement2
(no --no-ff is required; if you use the option, it won't hurt, but it will not change anything) you will, assuming no conflicts, get:
       I--J <-- CleanupUntrackedFiles
      /    \
...--H------M--N <-- master (HEAD)
      \       /
       K-----L <-- Enhancement2
where N's snapshot is the result of comparing the snapshot in the merge base of M and L to the snapshots in M and L, and combining the two sets of changes.
The hard part here is determining which commit is the best shared commit of M and L. We must work backwards from each, to find some commit that's on both master and Enhancement2. The steps when working back from Enhancement2 are simpler: L, then K, then H, and so on. From master, we look at M, then J and H (through M) and I (through J) and H (through I) in some order, but we do get to H. So H is once again the merge base.
Compare what's in snapshot H vs that in snapshot M. What has changed? The answer is: the same thing that changed in H vs J, because M's snapshot matches J's. So the .gitignore is changed, and some files are deleted.
Now, compare what's in snapshot H vs that in snapshot L. What has changed? Some code has changed, as you said. That's not .gitignore, so there is no conflict there. Those code changes might touch some of the deleted files: if so, you'll have a modify/delete conflict to resolve, which you should do by keeping the delete (for instance, by running git rm on that file). If they don't touch the deleted files, there are no conflicts to resolve: Git will take the deletion, as well as the .gitignore change, from M, and add the other changes from L, to make N.
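Your scenario can be run through the same per-file rules, applied to every path in the three snapshots. All file names and contents below are hypothetical stand-ins for your repository. There is an app.log file that the new .gitignore excludes and the cleanup commit deleted, and a src/app.py that the customer work changed. None means a file is absent in that snapshot, and a conflict is reported rather than resolved.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]  # hypothetical: path -> file contents


def merge_file(base: Optional[str], ours: Optional[str], theirs: Optional[str]) -> Optional[str]:
    """Per-file three-way rule; raises ValueError on a conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise ValueError("conflict")


def merge_trees(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Three-way merge of whole snapshots; returns (result, conflicted paths)."""
    result: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        try:
            merged = merge_file(base.get(path), ours.get(path), theirs.get(path))
        except ValueError:
            conflicts.append(path)
            continue
        if merged is not None:  # None means "deleted": leave it out
            result[path] = merged
    return result, conflicts


H = {".gitignore": "*.tmp\n", "app.log": "noise\n", "src/app.py": "v1\n"}
M = {".gitignore": "*.tmp\n*.log\n", "src/app.py": "v1\n"}  # same as J
L = {".gitignore": "*.tmp\n", "app.log": "noise\n", "src/app.py": "v2\n"}

N, conflicts = merge_trees(H, M, L)
print(N)          # .gitignore from M, app.log gone, src/app.py from L
print(conflicts)  # []

L2 = {**L, "app.log": "edited\n"}  # if L had also modified app.log...
print(merge_trees(H, M, L2)[1])    # ['app.log']: a modify/delete conflict
```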
If you have some other setup, or use other options, work through the examples the same way. That will tell you what git merge will do.