Unless you do what you are about to do, you only ever add commits to a repository. This means it's really easy to recover. phd's method will work in general except for the current branch, as git branch is not allowed to alter the hash ID stored in the current branch name. To modify the current branch, you must use git reset, probably with --hard, as you suggested.
What to know before you start
Git is not really about branches, and definitely not about files. Git is about commits. We (humans) organize our commits by branch names, and call them "branches"—which makes a mess of things because we call multiple different things "branches", as if they were all the same—and each commit stores files, but the things Git "cares about", as it were, are the commits. So, to use Git, you need to know, at a sort of instinctive gut level, what a Git commit is and does for you. If you don't know this, you'll find yourself in trouble.
A Git commit:
Is numbered. Every commit has a unique number, expressed in hexadecimal; Git calls this a commit's hash ID or, more generally, object ID. These things look random, but are not actually random; they are just not predictable.
When I say unique I mean unique: not "sort of unique", or "unique in isolation", but with the absolute meaning of the word unique, exclusive of meaning numbers 3 and 5. (Meaning 5 here is the weak version of "unique" that descriptivists will include.1) When you make a new commit, it gets a new number that has never been used before, and after that, it can never be used again.
(The numbering scheme Git uses internally is doomed to fail someday. The size of the hash IDs puts this off—we hope for long enough after we're dead and nobody cares, e.g., billions of years—but Git is slowly moving from SHA-1 to SHA-256, because SHA-1 turned out to be insufficient, at least in the case of active sabotage.)
Is read-only. The numbering scheme requires this: the hash ID is actually a cryptographic checksum of the contents. If the contents are to be changed, the result is a new object with a different hash ID, and the original object remains. So we can't—quite—remove anything from a repository. We can only add to it. (We can fake removing something by just no longer using it, which is what we're going to do.)
Holds two things: a full snapshot of every file, and some metadata. I won't go into a lot of detail here but some of this is pretty important, so I'll cover something about the metadata now.
The metadata in each commit is called "metadata" because it is information about that commit. This includes your name and email address (copied from your user.name and user.email settings and, once stored in a commit, completely unchangeable like everything about every commit). It includes any log message you'd like to include, to tell yourself and/or others why you made this particular commit. And, for Git's own purposes, every commit holds a list of previous commit hash IDs. This list is usually exactly one element long, and we call that one hash ID the parent of this commit.
It's this parentage information, as stored inside each commit, that forms the history. The result is that the commits are the history: history, in a Git repository, is no more or less than the commits in the repository. But what's crucial at this point is how we find this history. If a commit number looks random—and it does—how will we find the latest commit?
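To make the "snapshot plus metadata" description concrete, here is a minimal sketch that builds a throwaway repository and prints a raw commit object. The temporary directory, the file name file.txt, and the user name and email are placeholders invented for this illustration, not anything from the question.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" regardless of local defaults
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "second" > file.txt
git commit -q -a -m "second commit"

# Print the raw commit object: a tree line (the snapshot), a parent line,
# author and committer lines, and then the log message.
git cat-file -p HEAD

# The parent line stored inside the commit is the hash ID of the earlier commit.
parent_in_commit=$(git cat-file -p HEAD | sed -n 's/^parent //p')
first_commit=$(git rev-parse HEAD~1)
echo "parent recorded in commit: $parent_in_commit"
```

The hash IDs printed will differ on every run, since they depend on the timestamps stored in the commits.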
1I myself am both descriptivist and prescriptivist. As in the Alan Parsons song Turn it Up, sitting on fences [makes me] a pain in the ...
Finding the latest commit
Let's draw a simple chain of ordinary (single-parent2) commits, as found in a typical small repository:
... <-F <-G <-H
Here H stands in for the hash ID of the latest commit. Whatever that actual hash ID is, Git can use it to retrieve the commit, which gets Git both the metadata and the snapshot. Using the metadata, Git can tell you who made the commit and what their log message was. But Git can also use the metadata in H to find the raw hash ID of H's parent commit G.
Using the hash ID for G, Git can retrieve commit G. That gets it a snapshot and log message and so on. But that also gets Git the raw hash ID of earlier commit F. Since F is a commit, Git can now get it too, and that has a snapshot and log message and a parent, and Git can work backwards from F, and so on. Repeat this long enough and Git will eventually come back to the very first commit (which, as in footnote 2, won't have any parents, so that git log can finally stop).
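The backwards walk can be imitated by hand. The following is only a sketch of the idea, not how git log is implemented: it assumes a throwaway repository with three empty commits whose messages ("commit F" and so on) are made up to match the drawing, and it only handles single-parent chains.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
for name in F G H; do
    git commit -q --allow-empty -m "commit $name"
done

# Start at the latest commit and follow the parent links one at a time.
commit=$(git rev-parse HEAD)
visited=""
while [ -n "$commit" ]; do
    subject=$(git log -1 --format=%s "$commit")
    echo "$commit $subject"
    visited="$visited$subject;"
    # %P prints the parent hash ID(s); it is empty for a root commit,
    # which is what ends the loop.
    commit=$(git log -1 --format=%P "$commit")
done
```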
So Git can tell us the entire history of this one-branch repository provided Git can find hash ID H. Where will it get this hash ID? Are we going to be forced to memorize it ourselves? Can we type it in correctly even if we can remember it?
To save our poor human brains from this task, Git will store the latest hash ID for our one branch in a branch name. We'll pick some name, main or master or trunk or whatever, and have Git store the hash ID of the last commit (the "tip of the branch") in that name:
...--F--G--H <-- master
If we want to make a new branch, we just have Git create a new name, also pointing to commit H, like this:
...--F--G--H <-- br1, master
Creating another branch, br2, gives:
...--F--G--H <-- br1, br2, master
Of course, now we need a way to know which name we're using to find commit H, so we'll have Git attach a very special name, HEAD, to just one of these three branch names:
...--F--G--H <-- br1, br2, master (HEAD)
This means we're "on" branch master: git status will say On branch master. We're using commit H to get files, but we're finding commit H through the name master.
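Here is a small sketch of that picture, assuming a throwaway repository with a single empty commit standing in for H. It shows that each branch name simply resolves to a hash ID and that HEAD is attached to one name.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "commit H"

# Creating a branch only creates a new name for the same commit.
git branch br1
git branch br2
tip_master=$(git rev-parse master)
tip_br1=$(git rev-parse br1)
tip_br2=$(git rev-parse br2)
echo "master=$tip_master br1=$tip_br1 br2=$tip_br2"

# HEAD is attached to exactly one of the three names.
current=$(git symbolic-ref --short HEAD)
echo "on branch $current"
```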
2A commit with two or more parents, in Git, is a merge commit. At least one commit in any non-empty repository has no parent: that's the first commit anyone ever made. This kind of commit is a root commit. Most commits in most repositories, though, have one parent, and are therefore "ordinary" commits.
Note that a zero-parent root commit, or a two-parent merge commit, still has just a single snapshot of all files. In the case of an ordinary commit, the difference between the parent's snapshot and this commit's snapshot will show what you—or the author of this commit—changed. In the case of a merge commit, there is one difference from one parent, and a different difference from the other parent, so there's no longer a single obvious way to describe what changed. That's what makes merge commits tricky.
Note that the git log -p
command simply does not bother to show a merge commit as "changes", skipping right over the hard part. This is deceptive; watch out for it.
Growing a branch
Let's now switch to branch br1, with git switch br1 or the old git checkout br1 (both do the same thing). The result is:
...--F--G--H <-- br1 (HEAD), br2, master
We're still using commit H but now we're doing so through the name br1.
We now make a new commit in the usual way (edit files, run git add, run git commit, enter a log message). I'm going to skip over a huge number of important details about this, specifically, about where all the files are: remember that the files stored in commit H are literally read-only and therefore cannot be changed, so the files we're editing must not be in commit H, and in fact they're not in Git at all. But to keep this answer short, we won't go into those details. Instead, we'll just assume that you know all of this, and that when you run git commit, you know where Git gets the new snapshot. (It's not your working tree.)
In any case, when you do run git commit and have provided all the details, Git makes the new commit: a new snapshot and metadata. This new commit gets a new, unique hash ID, never to be used again in any Git repository anywhere. It looks random, and is big and ugly and is too difficult for humans, so we'll call this new commit I, and draw it in:
             I <-- ...
            /
...--F--G--H <-- ...
Here's the sneaky part: since we are on br1, Git now updates the name br1 to point to I. So we can fill in the three dots that appear twice above like so:
             I <-- br1 (HEAD)
            /
...--F--G--H <-- br2, master
The special name HEAD remains attached to the name br1, but the name br1 itself no longer points to H. Now it points to I. Commits up through H are on all three branches, and commit I is only on br1.
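The name update is easy to observe. This sketch assumes a throwaway repository and uses empty commits with made-up messages to stand in for H and I. It uses git checkout only so that it also runs on Git versions older than 2.23; git switch br1 would do the same job.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "commit H"
git branch br1
git branch br2
hash_h=$(git rev-parse master)

git checkout -q br1                        # attach HEAD to br1
git commit -q --allow-empty -m "commit I"  # make the new commit
hash_i=$(git rev-parse HEAD)
parent_of_i=$(git rev-parse HEAD~1)

# Only the current name moved; the other two still select H.
echo "br1    -> $(git rev-parse br1)"
echo "br2    -> $(git rev-parse br2)"
echo "master -> $(git rev-parse master)"
```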
If we make a second commit on br1, we get:
             I--J <-- br1 (HEAD)
            /
...--F--G--H <-- br2, master
We can now git switch br2 (or the same with checkout as usual) and make new commits that are only on br2, and they will extend br2 the way our new commits above extended br1:
             I--J <-- br1
            /
...--F--G--H <-- master
            \
             K--L <-- br2 (HEAD)
Now all three names select three different commits: master continues to select H as before, but br1 selects commit J, and br2 selects commit L.
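A sketch that rebuilds this three-branch shape in a throwaway repository and asks Git to draw it. The commit messages are invented to match the letters in the drawing, and the commits are empty because only the graph matters here.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "commit H"
git branch br1
git branch br2

git checkout -q br1
git commit -q --allow-empty -m "commit I"
git commit -q --allow-empty -m "commit J"
git checkout -q br2
git commit -q --allow-empty -m "commit K"
git commit -q --allow-empty -m "commit L"

# Draw every branch; the layout resembles the hand-drawn graph above.
git --no-pager log --graph --oneline --all

# The commit where br1 and br2 part ways is H, which master still selects.
fork_point=$(git merge-base br1 br2)
only_on_br1=$(git rev-list --count master..br1)
only_on_br2=$(git rev-list --count master..br2)
```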
Here's a capsule review; make sure you understand each of these points:
- HEAD controls which branch name is the current name.
- The current name points to the current commit.
- Switching from one name to another changes which commit's files are checked out into the area where we get to work on / with files.
- The files we work on / with, in our work area, are not actually in Git. The snapshots are in Git, and are extracted by git switch or git checkout, which attaches HEAD to the desired branch name.
- Making a new commit makes a new snapshot-and-metadata, which gets a new, unique hash ID. This new commit points backwards to whichever commit we were using to make the new commit; and as soon as Git has made the new commit, it writes the new hash ID into the current branch name. So the act of making a commit is what grows the branch. The branch that grows is the one we selected earlier, with git switch or git checkout. The parent of the new commit is the commit we selected at that time.
"Removing" a commit
Because of the peculiar way that commits are the history and are read-only, we can't actually remove a commit. But because we find commits by using a branch name to select the latest commit ... well, suppose that instead of having the name "move forward", we force the name itself to "move backward"? That is, suppose we have:
          I <-- br1 (HEAD)
         /
...--G--H <-- master
and we force Git to move br1 back one commit, to where it was just before we made new commit I? We'll get:
          I ???
         /
...--G--H <-- br1 (HEAD), master
Commit I still exists, but if we go look at history, we won't see it. Whether we ask Git to start from name br1 or from name master, we'll see commit H, then commit G, and so on. We'll never move forward to see commit I. It's as if it is gone!
Since commits are guaranteed never to change, it's as though commit I had never occurred.
If we have this:
             I--J <-- br1
            /
...--F--G--H <-- master (HEAD)
            \
             K--L <-- br2
and we force br1 and br2 both to go back one step, we'll get this:
             I <-- br1
            /
...--F--G--H <-- master (HEAD)
            \
             K <-- br2
(where I simply have not bothered to draw in the commits we can't find any more). The two commits we just "removed" aren't really gone, but they might as well be: we won't see them.
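This two-branch case can be tried safely in a scratch repository. The sketch below assumes made-up empty commits for H through L, saves the old tip hash IDs first, and then forces both non-current names back one step. The ~1 suffix is the gitrevisions way of saying "one parent back".

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "commit H"
git checkout -q -b br1
git commit -q --allow-empty -m "commit I"
git commit -q --allow-empty -m "commit J"
git checkout -q -b br2 master
git commit -q --allow-empty -m "commit K"
git commit -q --allow-empty -m "commit L"
git checkout -q master

hash_j=$(git rev-parse br1)   # remember the commits we are about to "remove"
hash_l=$(git rev-parse br2)

# HEAD is on master, so br1 and br2 are non-current and may be forced.
git branch -f br1 br1~1
git branch -f br2 br2~1

# History read through the names no longer shows J or L ...
git --no-pager log --oneline --all
# ... yet the commits still exist if we hold on to their hash IDs.
git cat-file -t "$hash_j"
git cat-file -t "$hash_l"
```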
Shrinking is not quite the opposite of growing
When we grow a branch, we do that by:
- checking out the branch by name, which selects the latest commit;
- working on the editable not-in-Git files;
- adding the updated files and committing;
and it's the git commit step that grows the branch. It only grows the current branch.
When we go to shrink a branch, we're allowed to forcibly change any non-current branch name, to make that name select any commit we like (including the current commit). We just run:
git branch -f <name> <new-hash-ID>
For the new hash ID, we can substitute in any expression that is described in the gitrevisions documentation. This includes the name@{time} formulation, such as master@{"10 minutes ago"}. If you want to see what commit that would select, use:
git show master@{10.minutes.ago}
or:
git log master@{10.minutes.ago}
(I've used the dot . instead of a space here to avoid needing quotes, although in some command-line interpreters you might still need quotes anyway; the space-vs-dot trick might not help you the way it helps me.)
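A freshly created demo repository has no ten-minute-old reflog entries, so this sketch uses the numbered form name@{n} from the same gitrevisions documentation instead of the time-based one: br1@{1} means "where br1 pointed one change ago". It assumes a throwaway repository with reflogs enabled, which is the default for ordinary non-bare repositories, and the commits and their messages are invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "commit H"
git branch br1
git checkout -q br1
git commit -q --allow-empty -m "commit I"
hash_i=$(git rev-parse br1)
git checkout -q master                     # br1 must not be the current branch

# Shrink: br1@{1} is the value br1 had before commit I was made, i.e. H.
git branch -f br1 'br1@{1}'
after_shrink=$(git log -1 --format=%s br1)

# The forced move was itself recorded, so br1@{1} now names commit I again.
git --no-pager reflog show br1
git branch -f br1 'br1@{1}'
after_restore=$(git rev-parse br1)
```

The second git branch -f undoes the first, which is one way to see why "removed" commits are usually easy to get back while their reflog entries last.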
But this won't work for any branch we have actively checked out. In a standard Git repository, with only one working tree, that's the one particular branch we're "on"; if you use git worktree add, there may be additional branch names that are "locked up" like this. To change the branch we're on, we have to use git reset, and pick one of --soft, --mixed, or --hard. That's because of some of the stuff I skipped over above: we need to tell Git what to do with the usable copies of the files that came out of the commit. Using --hard tells Git: throw out the old usable copies and put in new ones from the new commit I selected, so that's what we'd normally like for this case.
Since the files git reset will throw out here are not in Git in the first place, Git won't be able to help you get them back after you throw them out. So be very careful with git reset --hard!
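Both halves of this can be seen in a scratch repository: git branch -f refuses to move the current branch, while git reset --hard moves it and overwrites the working files. This is a sketch under assumed names (file.txt, the commit messages, and the identity are all made up). Run it only in a throwaway directory like the one it creates, since it deliberately destroys an uncommitted edit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
echo "version H" > file.txt
git add file.txt
git commit -q -m "commit H"
git checkout -q -b br1
echo "version I" > file.txt
git commit -q -a -m "commit I"
hash_i=$(git rev-parse HEAD)

# Git refuses to force-move the branch we are on.
if git branch -f br1 HEAD~1 2>/dev/null; then
    forced="allowed"
else
    forced="refused"
fi
echo "git branch -f on the current branch: $forced"

# An edit that exists only in the working tree, never committed.
echo "uncommitted edit" > file.txt

# Move br1 back to H and replace the working files with H's snapshot.
git reset -q --hard HEAD~1
cat file.txt
```

Commit I is still reachable by its saved hash ID afterwards, but the uncommitted edit is gone for good.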