Nothing is erased here, certainly not yet. Adding a commit only ever adds to the repository. Everything you had before, you still have.
Remember what a commit is: it's a complete snapshot of a source tree, along with some additional information: who made the commit and when, a log message, and the identity of some parent commit. Each commit has its own unique name, which is one of those big ugly hash IDs like e0688e9b28f2c5ff711460ee8b62077be5df2360. If we give this hash ID to Git, Git can retrieve that commit, including its complete snapshot of files exactly as they were when we saved the snapshot.
Branch names like master and seotweaks are merely human-readable names that correspond to these big ugly hash IDs. Each name stores one (and only one) such ID. Whenever you make a new commit, whether it is by running git commit or git merge, you tell Git to:
- store a new snapshot (we'll worry about where the new snapshot itself comes from later)
- using your name as the author and committer
- and a log message
- and which remembers the previous commit (by hash ID) as its parent, or, for a merge commit, as the first of two parents.
The new commit gets its own new (and unique) hash ID, and Git then stores the new ID in the branch name.
Hence, the files as they were in master before you did any of this are still in the repository. They're just no longer found by the name master. You need to find (or tell Git to find) the original hash ID, the one master had back when you liked the associated snapshot.
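To see these pieces for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory; the file name file.txt and the author identity are made up for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # call the first branch master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first snapshot"
first=$(git rev-parse master)              # the hash ID master stores right now

echo "more" >> file.txt
git commit -q -a -m "second snapshot"
second=$(git rev-parse master)             # master now stores a different hash ID

# The raw commit object: tree (the snapshot), parent, author, committer, message.
git cat-file -p "$second"

# The earlier snapshot is still there; we just need its hash ID to find it.
old_content=$(git show "$first:file.txt")
echo "file.txt as saved in the first commit: $old_content"
```

The name master only ever holds one hash ID at a time, which is why the script saves the first one in a shell variable before making the second commit.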
How ordinary commits work
For ordinary, non-merge commits, this process is pretty straightforward. Consider a repository with just three commits. Instead of the big ugly hash IDs, I will use one-letter names for them, and call the current (third) commit C. The name master stores the ID for C, so we say that master points to C. Commit C itself stores the ID for earlier commit B, so we say that C points to B:
<--B <--C <--master
Note how these "points-to"s are attached to the commits themselves, so that once we have found C, we can use C to find B. But how do we find C? One way would be to memorize e0688e9b28f2c5ff711460ee8b62077be5df2360 or whatever, but that's a terrible way. Instead, we use the name, master, to remember e0688e9b28f2c5ff711460ee8b62077be5df2360 for us.
B, of course, points to A. But A was the very first commit we ever made: where does it point? It's supposed to point to the earlier commit; but there is no earlier commit. So the answer is, it doesn't point anywhere. We call this a root commit. If we draw A in we have the final, complete picture of the repository's commits:
A <--B <--C <--master
Note again how everything works backwards in Git: we start with the most recent commit, which we find by a branch name like master. That commit finds an earlier parent commit, which finds another parent, and so on. This process stops only when we hit the root commit, which has no parent (though of course we can get tired and just stop looking :-) whenever we want).
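The backwards walk can be sketched like this, assuming a scratch repository with three empty commits whose log messages are simply A, B and C (the identity is made up):

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

# Start at the name master and follow the parent links: newest commit first.
subjects=$(git log --format=%s master | paste -sd ' ' -)
echo "walk from master: $subjects"

# The walk ends at the root commit, the one with no parent at all.
root=$(git rev-list --max-parents=0 master)
root_subject=$(git log -1 --format=%s "$root")
echo "root commit: $root_subject"
```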
To add a new commit D to the repository, Git:
- writes out a source-tree snapshot;
- writes a commit with you as author and committer, "now" as the time of the commit, and your log message as the log message; and this new commit points to the current commit C;
- takes the new hash ID for the new D and stuffs that into the name master.
Note that the only thing that ever changes here is the name. The new commit, D, is new. It's not changed, it's new. C is still exactly as it was before, sitting there pointing to B. B is also unchanged, as is A. Let's draw this new result:
A--B--C--D <-- master
There's no need for the internal arrows: we know that (1) they all point "backwards", and (2) they never change. D's parent is now C, and will be forever; C's parent is still B; and so on.
This is a way to draw commit graphs, i.e., pictures of what commits find what other commits, and how we find the most recent commits in the first place. You need to learn to draw and view commit graphs, to use Git properly.
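Here is a small sketch of "only the name changes", again in a scratch repository with empty commits named by their log messages (all of that is my own setup, not anything from your repository):

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

for name in A B C; do
    git commit -q --allow-empty -m "$name"
done
c=$(git rev-parse master)                  # hash ID of C, saved before adding D
b=$(git rev-parse master~1)                # hash ID of B

git commit -q --allow-empty -m D           # add the new commit D
d=$(git rev-parse master)                  # the name master now holds D's hash ID

parent_of_d=$(git rev-parse "$d^")         # D points back to C
parent_of_c=$(git rev-parse "$c^")         # C still points back to B
echo "master moved from $c to $d"
```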
How merge commits work
This is how commits work, but it's also how merge commits work, just with a tweak. A merge commit has two parents [1]. That is, it "points back" to two earlier commits. Let's draw just part of a more interesting commit graph, with two branches:
...--F--G--H--I <-- master
      \
       J---K---L <-- sidebr
The way we got this graph was, clearly, that we made a bunch of commits on master, so that the name master now points to commit I. At some point, we did a git checkout -b sidebr or something along those lines, so that sidebr pointed to commit F; but then we made a new commit J on it, so J now points back to F. Then we made K and L on sidebr, so that the name sidebr points to L. L points to K, which points to J, which points to F; master points to I, which points to H, which points to G, which points to F.
(Note that we never changed F. This is one of the reasons Git's internal arrows are all backwards: we don't know, when we first make F, how many children it will have in the future. But when we bring forth a new child like G or J, we know it has one parent, which is F.)
Now, let's say you run git checkout master && git merge sidebr to make a new merge commit on master.
The merge process somehow (we'll see more on this in just a moment) combines the two source-tree snapshots for I and L, so that we can have a new snapshot for our new merge commit M. Then it saves that snapshot and makes the new commit as usual. But instead of just pointing back to I, the new merge commit points back to both old commits:
...--F--G--H--I--M <-- master (HEAD)
      \         /
       J---K---L <-- sidebr
and now Git changes the name master so that it points to M. (The name sidebr stays the same here.) I put the word (HEAD) in this time to show which branch is the one we have checked out, since with two branch names in the diagram, it's no longer obvious.
Incidentally, the commit to which some branch name points, like M or L here, is called the tip commit of that branch. M is the new tip of master, and L is the tip of sidebr.
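The two-parent picture can be reproduced in a scratch repository. This is only a sketch: the file names are invented, and each commit's log message is its letter from the drawing.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > base.txt; git add base.txt; git commit -q -m F

git checkout -q -b sidebr                  # sidebr starts out pointing to F
for name in J K L; do
    echo "$name" > "side-$name.txt"; git add .; git commit -q -m "$name"
done

git checkout -q master
for name in G H I; do
    echo "$name" > "main-$name.txt"; git add .; git commit -q -m "$name"
done

i=$(git rev-parse master)                  # tip of master before the merge
l=$(git rev-parse sidebr)                  # tip of sidebr

git merge -q -m M sidebr                   # makes merge commit M on master

first_parent=$(git rev-parse master^1)     # should be I
second_parent=$(git rev-parse master^2)    # should be L
git log --graph --oneline master           # vertical drawing of the same graph
```

The suffixes ^1 and ^2 select the first and second parent of a merge commit; the first parent is always the commit that was the tip of the branch you were on.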
There's another twist to git merge, which we will get to in a moment, but for now let's just mention that git merge sometimes (when it can, and when you let it) doesn't actually bother making a new merge commit.
[1] A merge commit is, technically, any commit with two or more parents. But we do not need to worry about the "or more" case.
Now, let's draw what you did with your first merge
You had two branches named master and seotweaks and you ran:
git checkout seotweaks
git merge -s ours master
The git checkout command set HEAD to seotweaks (and retrieved the seotweaks tip commit's snapshot from the repository, putting that into the index and work-tree). The git merge then combined the current commit's tree with that of master, and now it's time to talk about the "somehow" part of how git merge combines snapshots.
The -s ours is what Git calls a merge strategy. A merge strategy is just "instructions for how to combine snapshots". The ours strategy is the simplest one of all: it says "ignore the other snapshot, just use ours."
In other words, we completely throw away the other commit's work, and just use our own snapshot. "We" are the tip commit of seotweaks:
...--P--Q <-- seotweaks (HEAD)
...--S--T <-- master
so we keep our source tree snapshot, but make a new merge commit U. For no really obvious reason, I'm going to draw U on a line by itself here. U points back to both Q and T, of course, just as before:
...--P--Q
         \
          U <-- seotweaks (HEAD)
         /
...--S--T <-- master
The source tree snapshot for commit U is exactly the same as the snapshot for commit Q. But the two are different commits: they have different big ugly hash IDs. The new commit's hash ID is now in seotweaks.
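You can check the "same snapshot, different commit" claim directly. The sketch below assumes a scratch repository with one shared starting commit; the file names site.txt, seo.txt and main.txt are invented, and the log messages Q, T and U stand in for the letters in the drawing.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > site.txt; git add site.txt; git commit -q -m "base"
git checkout -q -b seotweaks
echo "seo work" > seo.txt; git add seo.txt; git commit -q -m Q
q=$(git rev-parse seotweaks)
git checkout -q master
echo "master work" > main.txt; git add main.txt; git commit -q -m T
t=$(git rev-parse master)

git checkout -q seotweaks
git merge -q -s ours -m U master           # keep our snapshot, ignore theirs
u=$(git rev-parse seotweaks)

tree_q=$(git rev-parse "$q^{tree}")        # snapshot saved in Q
tree_u=$(git rev-parse "$u^{tree}")        # snapshot saved in U
echo "Q tree: $tree_q"
echo "U tree: $tree_u"
```

Note that main.txt, which exists only in T, never shows up in the work-tree: the ours strategy threw that work away while still recording T as a parent.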
Now we get to your second merge, for which you ran:
git checkout master
git merge seotweaks
The first step, git checkout master, swapped out our index and work-tree and current commit from U to T. There is no change at all to the graph; we just move HEAD:
...--P--Q
         \
          U <-- seotweaks
         /
...--S--T <-- master (HEAD)
The second step, though ... well, look at seotweaks: it's already merged with master. Instead of making a new merge commit, Git can just "slide the label master forward" along the backwards arrow going from U to T.
Git calls this a fast forward merge, although it's not actually a merge at all. What Git does is, in effect, to check out commit U and move the name master to point to U:
...--P--Q
         \
          U <-- master (HEAD), seotweaks
         /
...--S--T
We now have two branch names for the same tip commit U. That's quite an ordinary thing to have in Git, and it's one of the many reasons that you need to get good at drawing and interpreting commit graphs.
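A sketch of the fast-forward, using the same invented scratch setup as before (made-up file names, log messages standing in for the letters). I have added --ff-only, which you did not use, purely so that the script fails loudly if Git cannot fast-forward.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > site.txt; git add site.txt; git commit -q -m "base"
git checkout -q -b seotweaks
echo "seo work" > seo.txt; git add seo.txt; git commit -q -m Q
git checkout -q master
echo "master work" > main.txt; git add main.txt; git commit -q -m T
git checkout -q seotweaks
git merge -q -s ours -m U master           # first merge: makes U on seotweaks
u=$(git rev-parse seotweaks)

git checkout -q master                     # HEAD moves; the graph does not change
before=$(git rev-list --count --all)       # number of commits before the "merge"
git merge -q --ff-only seotweaks           # slides master forward to U
after=$(git rev-list --count --all)        # still the same number of commits

echo "master:    $(git rev-parse master)"
echo "seotweaks: $(git rev-parse seotweaks)"
```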
A side note about git log
When you run git log, Git:
- looks up HEAD to find your current branch;
- uses that branch's tip commit to show you the first commit it shows you;
- uses that commit to find its parent or parents;
- shows you one of the parents just as in step 2;
- repeats forever, showing you all of the parents, in some order.
When HEAD names master which names commit U, git log first shows you the merge commit, then shows you one of its parent commits, i.e., Q or T. Let's say it picks T. Which commit does it show next? Should it show Q now? Or should it show S now?
This is a trick question: the right answer is that there's no right answer. :-) What git log normally does is to partly sort by commit time stamps, and partly follow other rules. It gives you flags such as --author-date-order or --topo-order to force it to use some particular order. It also gives you --graph, which tells it to draw a commit graph (vertically, with newer commits towards the top and older ones towards the bottom, rather than horizontally like I have here), and using --graph turns on --topo-order and hence changes the displayed order.
There's never a perfectly right order here because you have to look up both "legs" of the merge at the same time (in parallel), but git log has to show you one commit at a time, serially. This is yet another reason to get used to drawing and reading commit graphs. As a mnemonic, remember to get help from A DOG: use git log --all --decorate --oneline --graph. (Some might prefer to use "a god", "all graph oneline decorate". Just remember the dyslexic agnostic, who is not sure whether to believe in Dog.)
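As a rough illustration, the sketch below builds the same invented two-legged graph in a scratch repository and shows that the ordering flags change only the order in which commits come out, never which commits come out. It uses git rev-list, which walks the graph the same way git log does but prints bare hash IDs.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > site.txt; git add site.txt; git commit -q -m "base"
git checkout -q -b seotweaks
echo "seo work" > seo.txt; git add seo.txt; git commit -q -m Q
git checkout -q master
echo "master work" > main.txt; git add main.txt; git commit -q -m T
git checkout -q seotweaks
git merge -q -s ours -m U master
u=$(git rev-parse seotweaks)

git log --all --decorate --oneline --graph # "A DOG"

# Same set of commits either way; only the sequence may differ.
by_date=$(git rev-list --date-order seotweaks | sort)
by_topo=$(git rev-list --topo-order seotweaks | sort)
first_shown=$(git rev-list -n 1 --topo-order seotweaks)   # always the tip
```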
How to undo your merge
Now, there are two ways to "undo" a merge. One is a sort of potentially-destructive process, using git branch -f or git reset --hard. In this particular case, it doesn't actually destroy anything, which makes this example complicated. To make the example itself less complicated, let's imagine that we somehow first make seotweaks itself move back to point to commit Q (and in fact, this is easy to do, using git branch -f seotweaks seotweaks~1):
...--P--Q <-- seotweaks
         \
          U <-- master (HEAD)
         /
...--S--T
We are still on master (HEAD = master) and master still points to commit U. If we first find the raw hash ID for commit T, and then run:
git reset --hard <that-big-ugly-hash-ID>
this will make Git, in effect, check out commit T and move the branch name master to point to T again, giving:
...--P--Q <-- seotweaks
         \
          U [abandoned]
         /
...--S--T <-- master (HEAD)
What happens to commit U? The answer is in the graph drawing above: we no longer have any names for it, because seotweaks names Q and master names T. Commit U is therefore abandoned: thrown to the wolves. Eventually (but not until at least 30 days later, by default), Git will really throw it away. Until then, git log won't normally show it, because git log starts at HEAD (or, with --all, at all branch-tip commits) and works backwards from there. Starting from Q and T, we will never find U.
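Here is a sketch of the whole undo in a scratch repository. The setup (file names, log messages) is invented, and the script simply saves T's hash ID in a shell variable up front, so it needs no trick for finding it afterwards.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > site.txt; git add site.txt; git commit -q -m "base"
git checkout -q -b seotweaks
echo "seo work" > seo.txt; git add seo.txt; git commit -q -m Q
q=$(git rev-parse seotweaks)
git checkout -q master
echo "master work" > main.txt; git add main.txt; git commit -q -m T
t=$(git rev-parse master)                  # remember T's hash ID
git checkout -q seotweaks
git merge -q -s ours -m U master
u=$(git rev-parse seotweaks)
git checkout -q master
git merge -q --ff-only seotweaks           # master and seotweaks both name U

git branch -f seotweaks seotweaks~1        # seotweaks goes back to Q
git reset -q --hard "$t"                   # master goes back to T

# U is no longer reachable from any name...
case "$(git rev-list --all)" in
    *"$u"*) found=yes ;;
    *)      found=no ;;
esac
# ...but the object itself has not been thrown away yet.
still_there=$(git cat-file -t "$u")
echo "U reachable from a name: $found; object type still stored: $still_there"
```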
Now, that's how to throw out the commit entirely. If we do that without first modifying the name seotweaks, the graph looks like this instead:
...--P--Q
         \
          U <-- seotweaks
         /
...--S--T <-- master (HEAD)
and commit U sticks around, because it has a name by which we can find it. Using git log from seotweaks shows us U first, then Q or T, and then more commits, eventually including whichever one of Q or T it did not show earlier. If you want to keep U but force master back one step so that it forgets U, this is how to do it.
(Note that you need to find the hash ID of T. There is a fast way, but I will leave that for another answer.)
Push and force-push
When you git push, you are connecting your Git, with your repository, to another Git, which has its own, separate repository. You tell your Git to call up this other Git, often over the Internet-phone. You then have your Git give, to the other Git, any new commits you have made, and then have your Git ask their Git to change some of their branch names.
If you add a new commit U and git push the result, your Git calls up their Git. Let's say that their Git has the same seotweaks and master that you had before all this merging. In other words, they have this graph:
...--P--Q <-- seotweaks
...--S--T <-- master
(there's no need for HEAD here as we are not using their current commit in any way; we are just asking them to change some of their branch names). We give them new commit U, which has the hash IDs of Q and T in it. They already have those commits Q and T (by hash ID) so our Git does not have to hand those over, only U itself (by hash ID!).
Then, our Git asks their Git to move their master (and maybe also their seotweaks, depending on what we tell our Git to ask of them, but let's suppose we only ask them to change their master) so that their master points to commit U:
...--P--Q <-- seotweaks
         \
          U <-- master
         /
...--S--T
They now have the merge commit (by hash ID) and their master now remembers it (by hash ID).
If we git reset --hard our own master so that it points back to T, and run git push origin master again, we'll have our Git call up their Git again. Our Git will give them no commits (we have nothing new) and then ask them to set their master to point to commit T (again, by its big ugly hash ID).
But they will say no!
If they were to change their master to point back to T, that would abandon, in their repository, their commit U. They don't know (they don't remember) that we gave them U; all they know and care about is that this would remove U from master.
We can turn our polite request, "move your master", into a command: "move your master now, or else!" We do this with --force or -f. It's still up to them whether to obey the command, but if they do, they will lose their U.
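The polite request and the command can both be tried safely against a local stand-in for "their" Git. This sketch assumes a bare repository in a temporary directory playing the role of the other Git; the directory names theirs.git and ours are invented, and the commits are empty ones named T and U.

```shell
set -e
work=$(mktemp -d)                          # scratch area (hypothetical)
git init -q --bare "$work/theirs.git"      # stands in for the other Git
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"
git remote add origin "$work/theirs.git"

git commit -q --allow-empty -m T
t=$(git rev-parse master)
git commit -q --allow-empty -m U
git push -q origin master                  # their master now names U

git reset -q --hard "$t"                   # our master is back at T

# Polite request: refused, because it would drop U from their master.
if git push -q origin master 2>/dev/null; then
    polite=accepted
else
    polite=rejected
fi

# Command: this particular stand-in obeys, so their master now names T.
git push -q --force origin master
theirs=$(git --git-dir="$work/theirs.git" rev-parse master)
echo "polite push was $polite; their master is now $theirs"
```

A real server may be configured to refuse forced updates as well, so the second push succeeding here says nothing about what your actual remote will do.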
That might be what we want. But what if someone else, with access to that other Git, has added on a new commit V:
...--P--Q <-- seotweaks
         \
          U--V <-- master
         /
...--S--T
We now command them: "set your master to T!" If they obey, this is the result:
...--P--Q <-- seotweaks
         \
          U--V [abandoned]
         /
...--S--T <-- master
We have not only wiped out our U commit, but also that V commit someone else added. So this is the first danger of a force-push.
Another is that someone else (some third person with her own Git and her own repository) may have picked up this commit U. Even if you convince your other (second) Git to forget commit U, she (the third person with the third Git) can now re-introduce U. This is the second danger of a force-push.
The way to handle both of these is to communicate with your team about any force-pushes. Make sure they all know that you are going to do this, that they all agree that it is safe to do so; then do it; then let them all know that it is now done, and they can go back to work as normal.
There is another alternative
We don't have to remove our mistake. We added this new commit U, but now we decide: Oh, that's bad! We want the snapshot in the commit at the tip of master to match the snapshot in commit T. Instead of trying to remove commit U, we can add our own new commit V:
...--P--Q
         \
          U <-- seotweaks
         / \
...--S--T   V <-- master (HEAD)
This graph is, obviously, more complicated—but it has the advantage that we're only adding new commits, not trying to rewind history in some way and discard existing commits.
It's a little tricky to make a new master commit whose saved snapshot exactly matches that for T, but only a little: there are a bunch of ways that will do it. The most straightforward (albeit inefficient) is to combine git rm -r -- . with a subsequent git checkout <hash> -- ., and then to commit, while on master and in the top level directory of the repository. The git rm removes everything from index and work-tree, then the git checkout re-fills everything from the specified hash.
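Sketched out in a scratch repository, the remove-and-refill method looks like this. As before, the file names and log messages are invented, and T's hash ID is saved in a variable before the merges so that it can be used in place of <hash>.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"      # made-up identity
git config user.email "author@example.com"

echo base > site.txt; git add site.txt; git commit -q -m "base"
git checkout -q -b seotweaks
echo "seo work" > seo.txt; git add seo.txt; git commit -q -m Q
git checkout -q master
echo "master work" > main.txt; git add main.txt; git commit -q -m T
t=$(git rev-parse master)                  # the snapshot we want back
git checkout -q seotweaks
git merge -q -s ours -m U master
u=$(git rev-parse seotweaks)
git checkout -q master
git merge -q --ff-only seotweaks           # master now names U (the mistake)

# On master, at the top level: empty the index and work-tree, refill from T.
git rm -r -q -- .
git checkout "$t" -- .
git commit -q -m "V: restore the snapshot from T"
v=$(git rev-parse master)

tree_t=$(git rev-parse "$t^{tree}")
tree_v=$(git rev-parse "$v^{tree}")
echo "T tree: $tree_t"
echo "V tree: $tree_v"
```

Nothing is rewound here: V is an ordinary new commit whose parent is U, so a plain git push (no --force) is enough to publish it.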
(You can also revert a merge, which is pretty similar.)
Whether you should do this depends on how good you feel about trying to remove/abandon the merge commit. Can you communicate with the rest of your team? Will they understand all of this? If so, removing the bad merge entirely (by force pushing after reset) is pretty nice: it leaves the graph uncluttered, and there are no land mines waiting for a future merge (leaving a bad merge in, with or without a revert of that merge, can make the next merge more difficult). If not, you may have to use the history-preserving, but more-painful, method of backing out the merge manually.