TL;DR answer
Make sure your repository is clean (git status says nothing to commit, etc.), and that you are at the top of your work-tree (where the hidden .git directory lives). Then:
git checkout master
git rm -rf -- . # remove it all: the scary step :-)
git checkout <hash-of-C> -- . # but this puts everything back
git commit
(For the last command, you can optionally add -C <hash-of-C> again, to re-use C's log message.) Use git log or similar to find a suitable hash ID for commit C.
There's a slightly shorter way, and in a lot of cases you don't need the git rm -rf -- . step at all, but I'll leave that for after the explanation.
There are several other ways to do this, but the above is the most straightforward.
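If you'd like to try the recipe before running it on a real repository, here is a minimal sketch that does so in a throwaway repository. It assumes a POSIX shell, git on your PATH and mktemp -d; the file name README, its contents and the commit messages are made up for the demo, and the author/committer variables are only there so it works without any git config. It builds A through E, then runs the four commands with C taken to be master~2.

```shell
#!/bin/sh
# Sketch: recreate A--B--C--D--E in a scratch repo, then apply the recipe
# so master gets a new commit C' whose files match C.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this answer
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Five commits, each rewriting README (made-up content)
for c in A B C D E; do
    echo "README as of $c" > README
    git add README
    git commit -q -m "commit $c"
done
hash_of_C=$(git rev-parse master~2)        # C is two steps behind the tip E

# The recipe itself
git checkout -q master
git rm -rf -q -- .                         # empty the index and work-tree
git checkout "$hash_of_C" -- .             # refill both from commit C
git commit -q -C "$hash_of_C"              # new commit C', reusing C's message

git log --oneline master                   # C' now sits on top of E
```

Nothing is thrown away: E is still there as the parent of the new commit.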
Explanation
Git is all about adding new commits while keeping everything already committed, forever.
So suppose you have a repository with just five commits, which you made by starting with one commit A:
A <--master
and then adding a second commit B that looks back at A:
A <-B <--master
and then a third commit C:
A <-B <-C <--master
and so on to eventually end up with:
A--B--C--D--E <-- master
In each case here, we've used a single-uppercase-letter ID (A, B, ... Z) for our commits, which means we'd run out after just 26. Git uses those incomprehensible hash IDs instead, so in practice it never runs out of unique IDs for its objects; the downside is that they're incomprehensible, and we have to cut-and-paste them or whatever. Moreover, Git assigns them (it computes each one from the object's contents), so we have no control over the hash IDs: we just make commits and suddenly there's a new hash.
Note also that A does not point to any earlier commit, because it can't: there is no earlier commit. All other commits, however, point back to their parent. The name master simply points to the latest commit on branch master. That, in fact, is how Git knows that it's the latest commit, and how Git finds the earlier commits, too!
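You can watch these backward links directly. The sketch below (same assumptions: POSIX shell, git on PATH, a throwaway repository with made-up one-letter contents) builds A through E, reads the single hash stored under the name master, pulls the parent line out of E's raw commit object with git cat-file, and asks git rev-list for commits that have no parent at all.

```shell
#!/bin/sh
# Sketch: the branch name holds one hash (the tip); every commit except
# the root stores its parent's hash inside itself.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

for c in A B C D E; do
    echo "$c" > letter.txt
    git add letter.txt
    git commit -q -m "$c"
done

tip=$(git rev-parse master)                # the one hash the name master holds
# E's parent is recorded inside E itself, on its "parent" line
parent_of_tip=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
root=$(git rev-list --max-parents=0 master) # commits with no parent: only A

git log --format='%s  %h  parent: %p' master   # walks E, D, C, B, A backwards
```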
Again, Git is all about adding new commits. If you have seen the Star Trek episodes with the Borg, you'll see why I like to call Git the Borg of Source Control: when you commit, it adds your technological distinctiveness to its collective. You run git commit, Git saves everything in the new commit, and Git makes the current branch (master, in this case) point to the new commit. The new commit automatically points back to whatever was the tip of the branch before.
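A quick way to confirm that git commit moves the branch and links the new commit to the old tip is to record master before and after a commit. This is a sketch in a throwaway repository (POSIX shell and git assumed; notes.txt and the messages are made up).

```shell
#!/bin/sh
# Sketch: git commit writes a new commit whose parent is the old tip,
# then moves the current branch name to it.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > notes.txt
git add notes.txt
git commit -q -m first
before=$(git rev-parse master)             # tip of master before committing

echo two >> notes.txt
git add notes.txt
git commit -q -m second                    # new commit, then master moves
after=$(git rev-parse master)
```

Afterwards master^ (the parent of the new tip) is exactly the old value of master, and HEAD still names the branch master.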
What you want, then, is to make a new commit that "looks just like C" except for one thing: it points back to E, rather than to B. All the rest then happens automatically:
A--B--C--D--E--C' <-- master
where the name C' means "looks and smells a lot like C, but not quite the same" (because it points back to E, not to B, and it probably has a different date-stamp too).
That's all fine for defining the goal, but how do you make this new commit that "looks and smells a lot like C"? Well, each commit has an attached source tree. When you make a commit, Git turns what Git calls the index into the attached source tree. This is why you have to git add things after you edit them in your work-tree: git add means "copy the updated version I put in my work-tree into the index." This overwrites the old, un-edited index version, so now the updated file is ready to commit.
This is a key fact about Git: The index is where you build the next commit you will make.
In normal usage, you just git checkout a branch, which sets your index and work-tree to match the tip commit of that branch, such as E for master. Then you edit some file(s), git add them to copy them back into this hidden index, and git commit to make the new commit out of them.
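To see that the index, not the work-tree, is what becomes the next commit, the sketch below edits a file, checks whether the index differs from HEAD before and after git add, and uses git write-tree to compute the tree the index would turn into. It assumes a POSIX shell and git in a throwaway repository; README and its contents are made up.

```shell
#!/bin/sh
# Sketch: an edit only reaches the next commit once git add copies it
# into the index; the commit's tree is exactly the index's tree.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > README
git add README
git commit -q -m v1

echo v2 > README                           # edit only the work-tree copy
if git diff --cached --quiet; then staged_before=no; else staged_before=yes; fi

git add README                             # copy the work-tree version into the index
if git diff --cached --quiet; then staged_after=no; else staged_after=yes; fi
next_tree=$(git write-tree)                # the tree the index would become right now

git commit -q -m v2
```

After the commit, HEAD's tree hash equals next_tree, and git show :README (the index copy) prints v2.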
In this case, though, you want to get all your files back the way they were at commit C. The git checkout command can do this, but instead of:
git checkout <hash-or-branch>
we need the longer form (which ideally should have been a different Git command; Git 2.23 finally added one, git restore, but this form of git checkout still works):
git checkout <hash-or-branch> -- <files>
This tells Git: go look in the commit whose hash I gave you (if you give it a branch name, it turns the branch name into a hash ID) and then copy each of its files that match <files> to my work-tree, writing each file "through" the index.
If your commit C has six files (README and five others), each of those files is copied into the index and then on into your work-tree. So now your work-tree README, and the other files, are updated to match commit C. They're also already staged for committing: you don't have to git add them to the index because git checkout copied them through the index.
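Here is a small sketch of that "through the index" behavior, assuming a POSIX shell and git in a throwaway repository with made-up file names. After git checkout HEAD~1 -- README, the old README is in both the work-tree and the index (so it is already staged), while HEAD itself has not moved.

```shell
#!/bin/sh
# Sketch: "git checkout <commit> -- <path>" writes the file into the
# index and the work-tree, so it shows up as staged without git add.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo old > README
echo other > other.txt
git add README other.txt
git commit -q -m C
echo new > README
git add README
git commit -q -m E

git checkout HEAD~1 -- README              # copy C's README through the index
staged=$(git diff --cached --name-only)    # paths where the index now differs from HEAD
```

Here staged is just README, and git diff (work-tree vs. index) reports nothing, because both copies were written at once.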
The reason we might need to git rm -rf -- . first is that commit E might have seven files: maybe in D or E, you git add-ed a file new.ext that didn't exist in commit C. If that's the case, git checkout <hash-of-C> -- . won't remove new.ext at all. Hence, we do a first pass to empty out the index and work-tree entirely, so that git checkout <hash-of-C> -- . re-populates the index and work-tree from commit C without leaving anything behind from E.
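The new.ext scenario is easy to reproduce. This sketch (POSIX shell and git assumed; the repository is throwaway and new.ext is the made-up extra file from the text) first runs the checkout alone and sees new.ext survive, then resets to E and runs git rm -rf followed by the checkout, after which only C's files are left.

```shell
#!/bin/sh
# Sketch: checkout <hash-of-C> -- . only adds/overwrites files; it does
# not delete files that C lacks. Emptying the index first fixes that.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo readme > README
git add README
git commit -q -m C
hash_of_C=$(git rev-parse HEAD)
echo extra > new.ext                       # a file that C never had
git add new.ext
git commit -q -m E

# Checkout alone: README comes back from C, but new.ext is left in place
git checkout "$hash_of_C" -- .
if [ -e new.ext ]; then kept_without_rm=yes; else kept_without_rm=no; fi

# Back to E, then the full two-step version
git reset -q --hard master
git rm -rf -q -- .
git checkout "$hash_of_C" -- .
if [ -e new.ext ]; then kept_with_rm=yes; else kept_with_rm=no; fi
```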
Now you're ready to make the new commit as usual. Adding the -C flag to git commit tells Git to reuse the log message, along with the authorship information, of an existing commit, without opening an editor (lowercase -c does the same but lets you edit the message first). It's an eerie, or maybe sad, coincidence that we're copying a commit we have been calling C and using a -C flag, but it's just coincidence.
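To check what -C carries over, the sketch below makes C under a different author name ("Alice", made up) and then commits C' as the default demo user. It assumes a POSIX shell and git in a throwaway repository. The new commit gets C's message and C's author, while the committer is whoever ran git commit.

```shell
#!/bin/sh
# Sketch: git commit -C <commit> reuses that commit's message and author;
# the committer is still whoever runs the command.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo good > f.txt
git add f.txt
GIT_AUTHOR_NAME=Alice git commit -q -m "C: the good version"   # hypothetical author
hash_of_C=$(git rev-parse HEAD)
echo bad > f.txt
git add f.txt
git commit -q -m "E: the bad version"

git checkout "$hash_of_C" -- .
git commit -q -C "$hash_of_C"              # C': C's message and author, no editor
```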
Note that if you now git show this new commit C', what Git will do is extract commit E, then extract C', and then diff the two. What you will get is Git's instructions for "how to convert the contents of commit E into the contents of commit C" (C' has the same contents as C). This is another key item about Git: a git diff old new produces a set of instructions, Git's way of telling you how to convert commit old into commit new. It's not necessarily how you did it; it's just some way to do it. When you compare adjacent commits, like A-vs-B or D-vs-E, you see an approximation of what you did. When you compare distant ones, like A vs E, you get Git's description of how to go straight from A to E. But each commit itself has a complete snapshot of every file, as it was in the index at the time you ran git commit.
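Both halves of that point can be checked in a sketch, assuming a POSIX shell and git in a throwaway repository where each of A through E just writes its own letter into a made-up letter.txt. After the recipe, diffing C against C' finds nothing (identical snapshots), while diffing E against C' (what git show displays) gives the "turn E into C" instructions.

```shell
#!/bin/sh
# Sketch: C' has exactly C's snapshot, but git show / git diff compare it
# with its parent E, so they print "how to turn E into C".
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

for c in A B C D E; do
    echo "$c" > letter.txt
    git add letter.txt
    git commit -q -m "$c"
done
hash_of_C=$(git rev-parse master~2)
git rm -rf -q -- .
git checkout "$hash_of_C" -- .
git commit -q -C "$hash_of_C"              # this is C'

git show --stat master                     # compares E (the parent) with C'
e_to_cprime=$(git diff master~1 master)    # the same E -> C' instructions
```

In e_to_cprime, letter.txt loses the line E and gains the line C.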
The slightly shorter way
If you've read this far, you might want to know a shortcut that avoids the git rm -rf -- . step. Instead of the four commands above, we can use three:
git checkout master
git read-tree --reset -u <hash-of-C>
git commit
The first and last are the same; it's the middle one that's mysterious. The git read-tree command is an internal Git command (one of what Git calls plumbing commands, meant for use in scripts) that manipulates the index, given particular commit or tree hash IDs (really, anything that Git can convert to a tree hash). The --reset option means "throw out the current index and replace it with the result of reading". The -u flag means "update the work-tree based on what happened in the index."
This means that if there are seven files in the index now, but the commit we read has just six, Git will remove the extra file (from the index and, with -u, the work-tree). So it accomplishes the removal as well as the refilling, all in one step.
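Here is the three-command version run against the same kind of scratch history, as a sketch (POSIX shell and git assumed; README and new.ext are made-up names). E has an extra file new.ext that C lacks; after git read-tree --reset -u, that file is gone from both the index and the work-tree, and the new commit's tree is identical to C's.

```shell
#!/bin/sh
# Sketch: git read-tree --reset -u replaces the index with C's tree and
# updates the work-tree to match, removing files C doesn't have.
set -eu
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo readme > README
git add README
git commit -q -m C
hash_of_C=$(git rev-parse HEAD)
echo changed > README
echo extra > new.ext                       # the extra file that C never had
git add README new.ext
git commit -q -m E

git checkout -q master
git read-tree --reset -u "$hash_of_C"      # index := C's tree, work-tree follows
git commit -q -C "$hash_of_C"
```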