
I know the reset command has three options:

  1. hard - changes the files in our working directory to match a particular commit ID

  2. mixed (default) - uncommits and un-stages files

  3. soft - only uncommits files

I know "uncommit" moves HEAD and the associated branch pointer and does not actually modify the tree of commits, but I am not sure what that means. What's the point of moving our HEAD?

phd
B Luthra
  • Your point 3, although I understand what you meant, is very oddly formulated. *Undoing commits* rather than *uncommitting files* which in git is nonsensical. (And also, answering your question would entail to rephrase a good chunk of the git-reset man page.) – Romain Valeri Mar 07 '19 at 08:14
  • @RomainValeri https://stackoverflow.com/a/50022436/11154224 i read the answer here and just confused what uncommit means? – B Luthra Mar 07 '19 at 08:18
  • Hard doesn’t _just_ change the working directory. Although it does do that. – evolutionxbox Mar 07 '19 at 08:20
  • @B Luthra "Uncommitting" is a metaphor, but no commit is *undone*. Reset moves pointers around, but commits are here to stay (let's let garbage collection out of the scope for now) – Romain Valeri Mar 07 '19 at 08:22
  • @RomainValeri So what is the point/advantage of just moving pointers around commit ? – B Luthra Mar 07 '19 at 08:26
  • Possible duplicate of [What's the difference between git reset --mixed, --soft, and --hard?](https://stackoverflow.com/questions/3528245/whats-the-difference-between-git-reset-mixed-soft-and-hard) – phd Mar 07 '19 at 08:53

1 Answer


Fundamentally, the git reset command does too many different things and should not even exist. (This is of course just my opinion. And what it does needs to exist, but probably should be made up of several different plumbing commands, plus at least three or four porcelain commands built atop that. There are actually several porcelain commands, such as git merge --abort, that run git reset. There should just be more.)

Unfortunately, git reset does exist and, because it does many different things, it's also very useful. While useful, it is—at least potentially—destructive. It's a Swiss Army Knife command, but it has blades that won't close and are laden with tetanus, and you need to learn how to hold it carefully so that you don't keep stabbing yourself in the hand and later dying of lockjaw.

In its more basic form, what git reset does is to write to one, two, or three items in your Git repository. To understand this you must first understand how commits and branches work, the function of a Git hash ID, and the roles of HEAD, the index, and the work-tree. Let's start with hash IDs.

Hash IDs

In Git, a hash ID looks like this: b5101f929789889c2e536d915698f58d5c5c6b7a. It's a big ugly string of letters and numbers. But it's actually a checksum, specifically a cryptographic one, of some data. This means that:

  • it looks random—you can't guess what it will be;
  • it depends on its input data such that changing anything—one bit anywhere in the data, or the order of the bits or bytes in the data—changes it; and yet
  • everyone in the universe can do the same mathematical computations on the same data and arrive at the same hash ID.

This hashing process is used to take any frozen data, such as file contents, and turn them into a unique-to-that-data hash ID. That ID becomes a short-hand name for the frozen data. If I give you the hash ID, you can check to see if you have the data. If I give you just the data, you can compute the hash ID. And, if I give you the hash ID and the data, you can check for yourself to see whether I lied about the hash ID, or gave you the correct pair.
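As a small illustration of these properties, git hash-object prints the hash ID Git would assign to some content. This is only a sketch: the sample strings are made up, and the variable names id1, id2, and id3 are hypothetical.

```shell
# Same data in, same hash ID out (the sample strings are arbitrary).
id1=$(printf 'hello\n' | git hash-object --stdin)
id2=$(printf 'hello\n' | git hash-object --stdin)

# Changing a single character produces a different, unrelated-looking ID.
id3=$(printf 'hellp\n' | git hash-object --stdin)

echo "first:   $id1"
echo "second:  $id2"
echo "changed: $id3"
```

The first two lines of output should match each other, and the third should not.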

In practice, what this means is that any two Gits can get together and have an abbreviated conversation: Do you have ID X? How about Y and Z? If one Git is missing one of those IDs, the other can give it the data and now it has both the hash ID and the data. If both Gits have all the IDs, they have all the data. So two Gits can very quickly synchronize with each other, with the sender giving to the receiver anything the sender has that the receiver doesn't.

This is already somewhat useful by itself, but when we combine it with commits, it becomes hugely useful.

Commits

In Git, a commit is a read-only entity that saves a snapshot of your files (all of them, as a full snapshot rather than as a set of changes) plus some metadata. The metadata is meant to be useful information about the commit: it has the name and email address of the person who made the commit, for instance, plus a time stamp. It also has the hash ID of each of the commit's parent commits (usually exactly one).

Because this is read-only data—because it can't be changed—we can compute the hash ID of this commit. This commit is now, forevermore, this commit with this hash ID. None of its data can change: we have the hash ID and that uniquely identifies this data, and no other data can use this hash ID for anything. (For the obvious objection to this, see How does the newly found SHA-1 collision affect Git?)

But, because this commit contains the hash ID of its parent commit as part of its data, all we need to do is make sure that we have every commit in this chain. Say, for instance, we have a commit with hash ID H:

                          <-H

One of the things that's in H is the hash ID of its parent commit. Let's call that parent G. So we make sure that we have that commit too, and we say that H points to G:

                      <-G <-H

Well, one of the things in G is the hash ID of its parent, F. So we make sure we have F too, and it has another hash ID E, which has a hash ID D, and so on, all the way back to the very first commit we ever made in the repository, whose hash ID we're calling A:

A <-B <-C <-D <-E <-F <-G <-H

Since A is the first commit, it has no parent, and we get to stop here.

These arrows are baked into the commits: H always points to G, because the hash ID of G is baked into H and can't be changed and of course H itself is frozen too and its hash ID will never change either. So, for drawing purposes, we can just draw them as connecting lines. Just remember that the arrow itself comes out of the child commit, reaching back to the parent. When we made A we had no idea what the hash ID of B would be, so A literally can't point to B; but when we made B we knew what A was, so B points to A.

This means that given:

A--B--C--D--E--F--G--H

all the internal arrows go strictly backwards. We have to start at H and work backwards. If we start at, say, D, we can go backwards to C, then to B and A, but we literally cannot go forwards to E.
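To see these backwards-pointing arrows in a real repository, one option is to print a raw commit object with git cat-file -p. The following is a minimal sketch in a throwaway repository; the empty commits A, B, and C, the identity settings, and the variable names are all assumptions made for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three empty commits standing in for A, B, and C.
git commit -q --allow-empty -m A
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m B
b=$(git rev-parse HEAD)
git commit -q --allow-empty -m C

# The raw commit object for C records B's hash ID on its "parent" line.
parent_of_c=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# A is a root commit, so its object has no "parent" line at all.
parents_of_a=$(git cat-file -p "$a" | grep -c '^parent ' || true)

echo "B:                $b"
echo "C's parent:       $parent_of_c"
echo "A's parent count: $parents_of_a"
```

Nothing stored in A or B mentions C, which is why there is no way to walk forwards.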

Branches, branch names, and HEAD

All a branch name in Git is, is a human-readable name that holds a—one, single—hash ID. The hash ID that the name holds is the hash ID of the last commit in that branch!

Thus, with the graph above—commits A through H—we can have one or more branch names pointing to any of the eight commits. Let's draw some in:

A--B--C--D--E--F   <-- master
                \
                 G--H   <-- develop

Here, the name master represents commit F by holding the actual hash ID of F. We don't need to remember that hash ID ourselves: we just say master. The name develop remembers the hash ID of H.

From H, we can work backwards through every commit. From F, we can work backwards through most commits—but we can't see H and G from F because that requires going forwards, which is impossible. We desperately need the name develop so that we can find H, from which we find G. After that, as long as the name master still exists, we can find F and earlier commits.

Commits A through H are, in Git, on branch develop, while A through F are on master.

We add a new name, also pointing to F, by doing:

git checkout master
git checkout -b feature

Now we have this drawing:

A--B--C--D--E--F   <-- master, feature (HEAD)
                \
                 G--H   <-- develop

Note that none of the commits changed—which is good because they can't. We just added a new label, feature, also pointing to F.

I've added one other thing to this drawing, which is the special name HEAD, in all capital letters like this. HEAD is how Git remembers which label we are using. We have Git attach our HEAD to one branch name like this, and that's the branch we are "on".

If we now make a new commit, it gets a new and unique hash ID. Let's call this I. We'll look at the process by which we make this new commit in just a moment, but for now let's just say "we made it" and it now exists. The graph now looks like this:

                 I   <-- feature (HEAD)
                /
A--B--C--D--E--F   <-- master
                \
                 G--H   <-- develop

That is, our new commit I points back to F, and the name feature now points to new commit I. From I, we can find F; from F we can find E, and so on. We cannot reach G or H this way. Commits G and H are only on develop. Commit I is only on feature. Commits A-F are on all three branches.

Note that the special name HEAD is still attached to the name feature. We have not changed which branch we are on. We have not changed any existing commits at all. We have added one new commit, and we have changed which hash ID is stored in the name feature.

Congratulations! You now understand Git branches. A branch is a series of commits, typically starting from the end and working backwards. You choose how far backwards you wish to go—you can keep going all the way to a root commit, which is one with no parents.

A branch is identified by a branch name that contains the hash ID of the last commit that is part of the branch. Note that when people say the word branch, they often mean branch name rather than string of commits. So the word branch is ambiguous in real life, and you need some context to figure out whether someone means the commits, the name, or both. See also What exactly do we mean by "branch"?

Meanwhile, the special name HEAD, written in all capitals like this, is normally attached to one branch name. Your options are to attach it to a branch name, using git checkout to do that, or to detach it from a branch name, also using git checkout. We won't go into this detached-HEAD mode here, except to say that in this mode, the name HEAD just holds the actual hash ID of some commit. In normal use, though, the role of HEAD is to remember which branch name we are using.

Note that when you use git checkout with a branch name, Git will attach HEAD to the branch name. The commit that the name finds—commit F for master, for instance—is now your current commit. So, to find the current branch name, we ask Git to what name is HEAD attached, and to find the current commit, we ask Git to which commit does HEAD, through some branch name, point? The name HEAD handles both of these operations, depending on which question we ask. We ask "what's the branch name" or "what's the commit hash" and we get the appropriate answer.
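Those two questions map onto two commands: git symbolic-ref asks for the branch name, and git rev-parse asks for the commit hash. Here is a rough sketch in a scratch repository, where empty commits stand in for F and I, and the identity and variable names are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m F
git checkout -q -b feature

# Question 1: to which branch name is HEAD attached?
branch=$(git symbolic-ref --short HEAD)

# Question 2: which commit does HEAD, through that name, point to?
f=$(git rev-parse HEAD)

# A new commit moves the attached name (feature) and nothing else.
git commit -q --allow-empty -m I
master_id=$(git rev-parse master)
feature_id=$(git rev-parse feature)

echo "on branch: $branch"
echo "master:    $master_id"
echo "feature:   $feature_id"
```

After the second commit, master should still hold the hash ID of F while feature holds the new one.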

The index and the work-tree, or how we build and work with commits

It is now time to look at the roles of the index and the work-tree, in working with Git.

In Git, every repository (except a --bare one) has one index and one work-tree to go with all the frozen commits and the branch names. As you now know, the commits hold snapshots of all of your files. But the stuff in the commit—the metadata about the commit itself, plus the copies of the names and contents of all of the files—all of that stuff is frozen forever. None of it can be changed, not one bit. This is great for archival and exploring the past, but it won't allow us to get any work done.

The frozen contents and names are also compressed, sometimes highly compressed. (There is a trick called delta compression or delta encoding that Git employs later, after making the snapshots. The hash IDs don't account for the delta encoding, so it's invisibly done and undone as needed, and you can pretend Git doesn't have it.) Because they're frozen, they can also be shared: you can have two commits, or ten, or a million, that all have the same copy of some really big file, and in that case, they all share the one compressed copy. This means that since you mostly don't change most files in most commits, most commits mostly just share their frozen copies, without taking extra space.

Again, this is great for archiving—but to get work done, we need a way to thaw out a frozen commit. We need to extract the frozen contents of the frozen file-names. We need to thaw and de-compress the frozen, Git-ified files, into a place where we can work on and with them. That place is the work-tree (or work tree or working directory or some combination of such names).

The work-tree is where you can see your files and work on them. It contains a copy of those files, that Git extracted from some frozen commit—from the commit you selected with git checkout. The selected commit is your current commit. Of course, these files work with all your normal, non-Git computer programs, so you can change them here. You can create new files or remove files. In short, you can work in this directory-tree—this set of folders and sub-folders—and that's why it's your work-tree.

Git could stop here, with one frozen copy of your files, in the current commit, and the second usable and changeable copy of your files in the work-tree. Other version control systems do stop here, but Git is different. Git throws in a third copy of your files. This third copy is what Git calls the index.

The index is how Git keeps track of your work-tree. In fact, the presence of a file in the index is what makes the file tracked. If a file like README.md is in the commit, Git copies it into the index, then copies it from the index into the work-tree. So README.md is now in all three places and because it's in the index, it's tracked.

The files that are in the index are in the special, compressed, Git-only form. They're ready to freeze into a new commit. Git requires that this be true all the time. So if you change a file in your work-tree, Git forces you to use git add to copy the file back into the index, to update the index copy. That re-compresses and Git-ifies the file's contents, so that it's ready to freeze.

What this means is that the index is, in effect, the proposed next commit. Copying files into the index means take what's in the work-tree as the proposed contents.

When git checkout first checks out some commit, to make that the current commit, Git copies the file into the index, so that the proposed new commit has the same file with the same contents as the current commit. You then tweak the file as much as you want in the work-tree, then use git add to copy it back, to update the proposed commit.

That copy-back step is staging the file. If the proposed commit doesn't have the updated file yet, the file is not staged, because if you run git commit right now, that will use the old copy of the file, the one that came out of the commit. Once you've copied the file from the work-tree into the index / staging-area, the proposed commit now has the new copy.

There are therefore three active copies of every file! There's the frozen one in the current commit; there's a second copy in the index; and there's the third copy in the work-tree. You have to use something like git show or some other Git command to see the Git-only copies:

git show HEAD:README.md     # view the frozen current-commit copy
git show :README.md         # view the index copy

Of course you can just use ordinary computer commands to see the ordinary work-tree copy, README.md. But since there are three copies, it's possible for all three copies to be different. Usually, at least two of them are the same, because the index starts out the same as the frozen commit, and then git add makes it the same as the work-tree. But they can all be different.
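A quick way to convince yourself that all three copies can differ is to make them differ on purpose. This sketch uses a scratch repository; the file contents, identity, and variable names are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'committed\n' > README.md
git add README.md
git commit -q -m "add README"

printf 'staged\n' > README.md
git add README.md                  # index copy now differs from the commit
printf 'work-tree\n' > README.md   # work-tree copy now differs from both

head_copy=$(git show HEAD:README.md)   # frozen current-commit copy
index_copy=$(git show :README.md)      # index copy
work_copy=$(cat README.md)             # ordinary work-tree copy

echo "HEAD:      $head_copy"
echo "index:     $index_copy"
echo "work-tree: $work_copy"
```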

In any case, when we made new commit I on feature, the way we did it was:

git checkout master
git checkout -b feature
... do some work ...
... run `git add` on our changed files ...
git commit

The git checkout master step attached our HEAD to the name master and extracted commit F into our index and work-tree. The git checkout -b feature step created the new name feature, pointing to commit F, and attached HEAD to this new name. Our index and work-tree still matched the current commit F.

Then, we did some work. This changed files in our work-tree. Then we ran git add on them, to copy them back into the index. Last, we ran git commit. The commit command took in our details—our name and email address, the current time, our log message explaining why we made this commit, and so on. It added the hash ID of the current commit as the new commit's parent. It froze the index to save all of the files. Then it stored all of this data—all the metadata about the commit, plus the frozen files via a tree object with yet another hash ID—into the Git repository as a new commit.

The new commit, with its frozen tree of files and frozen metadata, points back to existing commit F as usual. It has a new hash ID—it's our commit I now—and Git wrote this new hash ID into the current name, using the "which branch name does HEAD have" question. So now the name feature points to new commit I.
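The claim that git commit freezes the index can be checked with the plumbing command git write-tree, which turns the current index into a tree object and prints its hash ID. The sketch below assumes a scratch repository with one hypothetical file, file.txt; the variable names are made up too.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'v1\n' > file.txt
git add file.txt

# The tree Git would freeze if we committed the index right now.
proposed=$(git write-tree)

git commit -q -m "first"

# The tree the new commit actually recorded.
committed=$(git rev-parse 'HEAD^{tree}')

echo "proposed:  $proposed"
echo "committed: $committed"
```

The two IDs should be identical, since the commit's snapshot is the tree built from the index.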

Now that we understand all of the above, we can explain git reset

As I mentioned above, the git reset command, in its three primary forms—git reset --soft, git reset --mixed, and git reset --hard—writes to one, two, or three things.

The three things that git reset writes to—or can write to—are:

  1. the hash ID stored in the current branch name;
  2. the index; and
  3. the work-tree.

The softest kind of reset, git reset --soft, writes to the first part and then stops. That is, it writes a hash ID into a branch name. The index and work-tree are undisturbed.

The mixed kind of reset writes to #1 and #2, and then stops. That is, it writes a hash ID into a branch name, then writes some things to the index. The work-tree doesn't change, though.

The hard kind of reset writes to #1, #2, and #3. That is, it writes a hash ID into a branch name, then writes some things to the index, then writes all over the work-tree too.

So, what we're doing with the hard, mixed, or soft flags is telling git reset when to stop. It will always write to the current branch, using the name HEAD to figure out which branch name that is. It might write to the index, and if it does that, it might write stuff all over the work-tree.

But we also need to look at what git reset writes to each of these, and that's where things get complicated and useful.

You tell git reset which hash ID to store in the branch name

The SYNOPSIS section of the git reset documentation says, in part:

git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]

We're going to ignore --merge and --keep here, and the -N flag. The -q or "quiet" flag just shuts up git reset. The key thing here is the last part, the optional commit. It's optional, but if we leave it out, Git assumes we mean HEAD.

What Git does with this last argument is find a commit hash ID. So if we say, for instance:

git reset --hard b5101f929789889c2e536d915698f58d5c5c6b7a

we've given Git a hash ID, and Git just checks to make sure that it's valid and names a commit. Or we can say:

git reset --hard master

Here, Git takes the name master and translates it to a hash ID. In our example repository above, that would be whatever hash ID commit F has. We can also say:

git reset --hard HEAD~1

which counts back one commit from HEAD; in our example repository, that might start at I and step back one to F, and hence have the same meaning as master here.

In fact, you can use anything that Git can translate to a hash ID. The translation process is described in the gitrevisions documentation. Whatever you use, Git will translate it to a hash ID, and make sure that the hash ID is that of an existing commit.
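You can watch this translation happen with git rev-parse, which prints the hash ID that a name or expression resolves to. In this sketch the repository layout mimics the F and I commits above; the empty commits, identity, and variable names are assumptions.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m F
git checkout -q -b feature
git commit -q --allow-empty -m I

# Three spellings that should all resolve to commit F here.
by_name=$(git rev-parse master)
by_relative=$(git rev-parse 'HEAD~1')
# --verify with ^{commit} fails unless the result really is a commit.
by_verify=$(git rev-parse --verify 'master^{commit}')

echo "master:          $by_name"
echo "HEAD~1:          $by_relative"
echo "master^{commit}: $by_verify"
```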

If you leave it out, Git uses HEAD. That, of course, means whichever commit we have checked out right now. To translate HEAD to a hash ID, Git first finds out which branch name HEAD is attached-to, then finds out what hash ID that branch-name holds.

Now that we have the hash ID, Git writes it into the current branch

Once git reset has the hash ID, it updates the current branch—the one to which HEAD is attached—by writing the hash ID you gave it, into the current name. So if we're on branch feature in our example here, and you run:

git reset --soft master

Git will write the hash ID of commit F into the name feature:

                 I   ???
                /
A--B--C--D--E--F   <-- master,  feature (HEAD)
                \
                 G--H   <-- develop

Note that commit I is still in there. It's just that we can't find it any more. Without a name, we have to find I by going forward from F. We can't go forward; Git doesn't do that. (If we did go forward, wouldn't we arrive at G? Well, maybe. That's a question for another exercise, another day perhaps.) If you've saved the hash ID of I somewhere, you can now run:

git reset --soft <hash-ID-of-I>

to put things back. If not, well, there are some tricks for finding I's hash ID. Some of them are even pretty easy. One is this:

git reset --soft feature@{1}

because Git automatically saves the old IDs of a branch, in the branch's reflog. (The rest of the ways to find I, we'll leave for other questions.)
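Here is the whole round trip as a sketch in a scratch repository: lose I with a soft reset, then get it back through the reflog. The file name file.txt, the identity, and the variable names are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\n' > file.txt
git add file.txt
git commit -q -m F
git checkout -q -b feature
printf 'two\n' > file.txt
git add file.txt
git commit -q -m I
i=$(git rev-parse HEAD)

# Soft reset: only the branch name changes; feature now names F.
git reset -q --soft master
after_reset=$(git rev-parse feature)

# The index is undisturbed, so I's version of file.txt shows up as staged.
staged=$(git diff --cached --name-only)

# The reflog still remembers the previous value of feature.
git reset -q --soft 'feature@{1}'
restored=$(git rev-parse feature)

echo "after reset: $after_reset"
echo "staged:      $staged"
echo "restored:    $restored"
```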

More-powerful resets

Now, suppose instead of git reset --soft master, you just run:

git reset --soft

with feature still pointing to I? Then Git will read through HEAD to find the hash ID of I, and as its reset operation, write this commit hash ID back into feature. We'll end up exactly where we started:

                 I   <-- feature (HEAD)
                /
A--B--C--D--E--F   <-- master
                \
                 G--H   <-- develop

Git read I's hash ID out of feature, then wrote that back into feature, leaving feature unchanged. So without a hash ID specifier, git reset --soft has no effect.

What about --mixed? Here, Git will go on to write to the index. Remember, the index is our proposed next commit: it has a copy of each file that we've maybe updated. What git reset --mixed does is that, after it re-sets the current branch using the hash ID, it copies the files from the commit we selected, into the index.

So if we're on commit I like this, and we use:

git reset --mixed HEAD

we're telling Git: Copy the hash ID of feature into feature, leaving feature unchanged; then, after you do that, copy the contents of commit I into the index. The first step doesn't actually change anything, but the second step has the effect of "undoing" any git add we ran.

In other words, by leaving feature unchanged, but re-setting the index, we undid our git adds.
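A minimal sketch of that "undo my git add" use, again in a scratch repository with a hypothetical file.txt and made-up variable names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "first"

printf 'v2\n' > file.txt
git add file.txt
staged_before=$(git diff --cached --name-only)   # file.txt is staged

# Branch name is rewritten with its own value; the index is re-set from HEAD.
git reset -q --mixed HEAD
staged_after=$(git diff --cached --name-only)    # nothing is staged now

# The work-tree was not touched, so the edit itself survives.
work_copy=$(cat file.txt)

echo "staged before: $staged_before"
echo "staged after:  $staged_after"
echo "work-tree:     $work_copy"
```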

Note that if we had used git reset --mixed master, we'd make feature point to F (not I) and update the index to match commit F (not I)! So git reset is pretty powerful. Whatever we had staged is gone, replaced with what's in the commit. Of course, if we staged by copying from the work-tree, we still have it in the work-tree.

Last, we can use git reset --hard. This does the same first two steps as before. That is, it first writes a new hash ID into the current branch name, which if we didn't say which one to use, is the old hash ID and leaves the branch name unchanged. Then it uses whatever commit the branch name now identifies to re-set the index. Whatever we had staged is gone. Last, it writes to the work-tree.[1]

This is the most powerful reset, of course, because it clobbers whatever we've changed in the work-tree ... and that's not saved in Git anywhere. The work-tree has our work in it. If we lose that, it's just gone. So in a sense, git reset --hard is the most dangerous case.
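The following sketch shows the clobbering in a scratch repository. The file names tracked.txt and untracked.txt, the identity, and the variable name are invented for the example; the untracked file is there to show the common case in which untracked files are left alone.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'v1\n' > tracked.txt
git add tracked.txt
git commit -q -m "first"

printf 'edited\n' > tracked.txt    # uncommitted, unstaged work
printf 'notes\n' > untracked.txt   # never added, so Git does not track it

# Re-set the branch name (to itself), the index, and the work-tree.
git reset -q --hard HEAD

tracked_copy=$(cat tracked.txt)
echo "tracked.txt now contains: $tracked_copy"
ls
```

The edit to tracked.txt is gone with no copy anywhere in Git, while untracked.txt is still listed.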

But remember how we lost commit I when we carelessly reset, even with --soft, and didn't save the hash ID somewhere first? So even a soft reset is dangerous. Fortunately Git tends to save hash IDs in a lot of places—but this turns out to be almost as much of a problem as not saving them at all. If you need to find a lost hash ID, you might look in the reflogs. But there are often hundreds or thousands of hash IDs in there and they all look totally random. Finding the right one, the needle in the haystack, can be terribly tedious.

In the end, then, git reset is powerful and useful, but also pretty dangerous. Always be careful with it. Be particularly careful with --hard, or anything that changes which commit hash ID is stored in a branch name. The default commit is HEAD, so Git writes the hash ID taken from (the name given by) HEAD to (the name given by) HEAD, which leaves the branch name unchanged and makes that part safe. The default mode, --mixed, still re-sets the index, though.


[1] I'm leaving out here what git reset --hard writes to the work-tree. That's because this part is a bit complicated. For the technically curious, see the git read-tree documentation, but to summarize: while Git is updating the index, the index tells Git which files it should assume are in the work-tree because they're in the index. Those are the files Git can and should clobber. In other words, untracked files mostly go untouched. Tracked files get replaced with their new tracked counterparts from the commit to which you're resetting. But there are some corner cases here: Suppose file X is tracked in the old commit, and isn't in the new commit at all. Then Git should remove X from the work-tree. Likewise, suppose file Y isn't tracked in the old commit, but is in the new one. Then Git should create Y in the work-tree, but there might already be an untracked file Y; should Git clobber it? (Git usually won't, but sometimes a .gitignore entry gives Git permission to clobber file Y.)

torek