See the otherwise excellent (and duplicate) Q&A at "What's the difference between Git Revert, Checkout and Reset?" But we should start with something even more basic.
Why do we use version control?
The purpose of a version control system is to save everything ever done, for all time. Well, except when it isn't: sometimes the purpose is to save some of the things done for all time, some things for some time, and some things for a very short time.
The way that Git saves these is as complete snapshots, which we call commits, that also carry some extra information about the commit, which we call metadata. The metadata includes the name and email address of the person who made the commit, so that we can ask them why they made it, plus a log message, so that they can tell us why they made it without us having to bother them. The metadata in Git also includes the notion of a previous or parent commit. By comparing the parent snapshot to this particular snapshot, Git can tell us what the person who made the commit changed.
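To make the snapshot-plus-metadata idea concrete, here is a minimal sketch that builds a throwaway repository and reads back what Git stored. The file name notes.txt, the author identity, and the log messages are all made up for illustration.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # call the branch master whatever the local default is
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second line" >> notes.txt
git commit -q -am "Explain the second point"

# Metadata carried by the newest commit: who made it, and why.
author=$(git log -1 --format='%an <%ae>')
subject=$(git log -1 --format='%s')

# The newest commit also records the hash of its parent...
parent=$(git log -1 --format='%P')
# ...which is simply the commit that came before it.
previous=$(git rev-parse HEAD~1)

echo "$author"
echo "$subject"

# Comparing the parent snapshot to this snapshot shows what changed.
git --no-pager diff "$parent" HEAD
changed=$(git diff --name-only "$parent" HEAD)
```

The diff at the end is computed on demand from two full snapshots; nothing in the commit itself stores "the change".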
With that in mind, we can look at these three Git verbs (I'm going to throw in git checkout too):
git checkout is to obtain something done at some time
We use git checkout to get one particular commit. A commit is a snapshot someone made at some time. Presumably that snapshot was good for some purpose. We use git checkout to get that snapshot, exactly as it was made at that time, regardless of what our next purpose might be.
In Git, as a side effect of using git checkout with a branch name, we are now prepared to do new work. But we can also use git checkout with a raw commit hash, after which new commits are ... well, a bit tricky. (They're still possible, but Git calls this detached HEAD mode and you may not want to use it until you know a lot more about Git.)
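The difference between the two kinds of checkout can be seen in a scratch repository. This is only a sketch: the file name and commit messages are invented, and it relies on git symbolic-ref -q HEAD exiting non-zero when HEAD is detached.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"
first=$(git rev-parse HEAD~1)

# Checking out a raw hash gets that snapshot exactly as it was made...
git checkout -q "$first"
old_content=$(cat file.txt)
# ...but HEAD is now detached: it names a commit, not a branch.
if git symbolic-ref -q HEAD >/dev/null; then
    state_by_hash=attached
else
    state_by_hash=detached
fi

# Checking out a branch name gets the latest snapshot and readies us for new work.
git checkout -q master
new_content=$(cat file.txt)
branch=$(git symbolic-ref --short HEAD)
echo "by hash: $old_content ($state_by_hash); by name: $new_content (on $branch)"
```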
The reason that git checkout master, for instance, works to get the latest commit on master is that every time we make a new commit on master, Git automatically updates our name master so that it means the newest such commit. The newest commit remembers its parent, which used to be the newest. That one-back commit remembers its own parent, which was the newest before the one-back commit existed, and so on.
What this means is that the name master really just finds the last commit, from which we find each earlier commit:
... <-F <-G <-H <--master
where each uppercase letter stands in for a commit hash ID. We say each commit points to its parent, and master points to the latest commit.
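We can watch the name move. The sketch below makes three commits whose messages are literally F, G, and H (stand-ins, as in the drawing; real hash IDs are long hexadecimal strings) and checks where master points before and after the last one.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo F > f.txt
git add f.txt
git commit -q -m F
echo G > f.txt
git commit -q -am G
before=$(git rev-parse master)           # master means G right now

echo H > f.txt
git commit -q -am H
after=$(git rev-parse master)            # Git moved master to the new commit H

# H points back to what used to be the newest commit.
parent_of_new=$(git rev-parse master~1)

# Starting from the name, Git walks parent links to find every earlier commit.
count=$(git rev-list --count master)
git --no-pager log --format='%h %s (parent: %p)' master
```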
git revert is to back out a bad commit
Given that each commit records its parent, and that Git can therefore tell us what the person who made that commit changed, we can always have Git undo someone else's change (or even our own). We pick a commit, view it as a change (which is how Git shows it to us when we use git log -p or git show), and discover that, hey, that change was wrong. That change should be backed out, or “reverted”.1
1 The verb revert here is actually a bad choice. The most common English language definition is almost always followed by the auxiliary word to, as in revert to, and it means to return to a former state. But backing out some change doesn't necessarily return us to the old state! We only return to our previous state if we back out the most recent change.
Other version control systems use the verb backout, which is better. In any case, when we use this verb, Git makes a new commit, saving a new snapshot that's just like our previous checkout except that it has someone's change backed out. Well, that is, Git makes this commit unless there's a merge conflict, but we'll ignore that possibility here.
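Here is a small sketch of backing out a change that is not the most recent one. The three files a.txt, b.txt, and c.txt are invented, and each commit touches its own file so that no merge conflict arises.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo good > a.txt
git add a.txt
git commit -q -m "good start"

echo oops > b.txt
git add b.txt
git commit -q -m "bad change"
bad=$(git rev-parse HEAD)

echo later > c.txt
git add c.txt
git commit -q -m "later work"

# Back out the bad change. Git adds a NEW commit; nothing is removed from history.
git revert --no-edit "$bad" >/dev/null

count=$(git rev-list --count master)
if git merge-base --is-ancestor "$bad" master; then
    bad_still_in_history=yes
else
    bad_still_in_history=no
fi
ls
```

The result keeps c.txt from the later commit while dropping b.txt, so the new snapshot matches none of the earlier ones. That is the footnote's point: backing out a change is not the same as returning to an old state.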
git reset is ... well, muddled, but we can use it to throw away commits
Git's reset verb is extraordinarily complicated. In one particular form, it does up to three things. With other forms it does other things. The one you've asked about in particular, git reset --hard HEAD~1, tells Git to:
1. Make the current branch name, whatever that is, point to the parent of the current commit.
2. Erase the current index and fill it in from the commit selected in step 1. (We haven't described the index here, but index, staging area, and even cache are really just three names for the same thing in Git.)
3. Remove all the work-tree files that went with the index before we reset it, and replace them with copies extracted from the commit selected in step 1 and copied into the index during step 2.
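The three effects can be checked one at a time in a scratch repository. In this sketch the commits are labeled G and H to match the drawings, and the staged "uncommitted edit" is there only to show that a hard reset discards it, so be careful with this command on real work.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo g > f.txt
git add f.txt
git commit -q -m G
echo h > f.txt
git commit -q -am H
g=$(git rev-parse HEAD~1)

# An edit that is staged but never committed; --hard will throw it away.
echo "uncommitted edit" > f.txt
git add f.txt

git reset -q --hard HEAD~1

tip=$(git rev-parse master)               # branch name: now points to G
staged=$(git diff --cached --name-only)   # index: refilled from G, so nothing differs
content=$(cat f.txt)                      # work-tree: file replaced with G's copy
echo "master=$tip staged='$staged' content=$content"
```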
So if we had:
... <-F <-G <-H <--master
we've changed the name master to point to G, shoving commit H up out of the way:
            H
           /
... <-F <-G <-- master
The commit whose hash is H is now effectively lost, as if it had never been made. It's still in the repository; it has just become hard to find. In time, if we don't take any other steps to preserve it, commit H will really go away.
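"Hard to find" is not "gone", at least not right away. The sketch below saves H's hash before the reset, then shows that the commit object is still there, that the HEAD reflog still remembers it, and that giving it a name again preserves it. The branch name rescued is made up.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

echo g > f.txt
git add f.txt
git commit -q -m G
echo h > f.txt
git commit -q -am H
h=$(git rev-parse HEAD)

git reset -q --hard HEAD~1

# H can no longer be found by starting at master and walking backwards...
if git merge-base --is-ancestor "$h" master; then
    reachable=yes
else
    reachable=no
fi

# ...but the commit object is still in the repository,
type=$(git cat-file -t "$h")
# and the reflog remembers where HEAD was just before the reset.
from_reflog=$(git rev-parse 'HEAD@{1}')

# One way to preserve H: point a new name at it.
git branch rescued "$h"
echo "reachable=$reachable type=$type"
```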
Remember our purpose for commits
We want commits so that they save everything ever done for all time. But sometimes, what we did (like, maybe, making commit H) was a mistake:
...--F--G--H--I--J--K <-- master
If we made H a while ago and it's all embedded like this, it's hard to remove, because every commit is completely frozen. To remove H, we have to copy I to a new and different commit I' that has G as its parent, then copy J to a new commit J' that has I' as its parent, and so on:
          H--I--J--K
         /
...--F--G--I'-J'-K' <-- master
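The text above doesn't name a command for this copying. One way to do it is git rebase --onto, which this sketch uses as an assumption about tooling, not as the only option. The helper function commit and the one-file-per-commit layout are invented so that dropping H causes no conflicts.

```shell
set -eu
repo=$(mktemp -d)                        # throwaway repository (assumed scratch location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"    # hypothetical identity
git config user.email "author@example.com"

# Hypothetical helper: make a commit named $1 that adds its own file.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}
commit G; commit H; commit I; commit J; commit K

g=$(git rev-parse master~4)
h=$(git rev-parse master~3)
old_i=$(git rev-parse master~2)

# Copy everything after H (that is, I, J, K) onto G, then move master to the copies.
git rebase -q --onto "$g" "$h" master

new_i=$(git rev-parse master~2)          # this is I', a different commit from I
new_base=$(git rev-parse master~3)       # the parent of I' is G
subjects=$(git log --format=%s master | paste -sd ' ' -)
old_type=$(git cat-file -t "$old_i")     # the original I is still in the repository
echo "$subjects"
```

Every copied commit gets a new hash ID, which is why anyone else who already has the old I, J, and K would need to be told about the change.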
Here it's easier to revert H, adding a new commit that undoes whatever we changed in H. Commits I through K remain the same (probably slightly broken, but that's how they really were all along), and now we have a new commit L to undo what we did in H:
...--F--G--H--I--J--K--L <-- master
But if H was pretty recent, we can just remove it entirely using git reset --hard. We'll forget we ever made that mistake. As long as we never pushed or otherwise shared H, there's no need to tell anyone else.