
I committed some changes which contained a change that I didn't want to commit, so I wanted to remove that commit but keep the staged and unstaged changes that were committed so that I could delete the unwanted change before committing. I used git reset --hard <hash> but it reverted to the commit at HEAD - 1 which not only removed the commit but also removed all staged and unstaged changes before the commit.

Is there any way to reset to a commit but to dump all committed changes (back) to the working tree instead of deleting every change recorded in that commit? In other words, how can I return all committed changes to the working tree?

bit
  • Don't `--hard` reset, the default (`--mixed`) leaves the working tree alone and only resets the index. See https://git-scm.com/docs/git-reset. – jonrsharpe Jan 18 '21 at 10:47
  • @jonrsharpe but does `--mixed` remove the commit? – bit Jan 18 '21 at 10:48
  • None of them really *remove* the commit, it's still in the reflog, just move the HEAD to the specified commit and (depending on hard/mixed/soft) maybe reset the working tree and/or index. – jonrsharpe Jan 18 '21 at 10:50
  • @jonrsharpe thanks. I used `git reset --mixed ` and it deleted the commit but left unstaged changes before the commit alone. – bit Jan 18 '21 at 10:59
  • @jonrsharpe what happens to staged changes, are they also left alone when you do `git reset --mixed `? I'm not sure if working tree refers to just unstaged and untracked files or whether working tree refers to unstaged, staged, and untracked files? – bit Jan 18 '21 at 11:01
  • Please read e.g. https://stackoverflow.com/q/3528245/3001761, https://stackoverflow.com/q/3689838/3001761 – jonrsharpe Jan 18 '21 at 11:02

2 Answers


First, note that the terms index and staging area mean the same thing. There is also a third term, cache, that now mostly appears in flags (git rm --cached for instance). These all refer to the same underlying entity.

Next, while it's often convenient to think in terms of changes, this will eventually mislead you unless you keep one thing firmly in mind: Git does not store changes, but rather snapshots. We only see changes when we compare two snapshots. We put them side by side, as if we're playing a game of Spot the Difference; more precisely, we have Git place them side by side, compare them, and tell us what's different. So now we see what's changed between these two snapshots. But Git doesn't have those changes. It has the two snapshots, and is merely comparing them.
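
As a rough illustration of the snapshot idea, the sketch below builds a throwaway repository and checks that a commit lists every file, while the "change" only shows up when two commits are compared. The file names, the temporary directory, and the user identity are made up for the example.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "first snapshot"

echo "a2" > a.txt                       # modify only a.txt
git commit -q -am "second snapshot"

# The second commit still records every file, not just the one that changed.
files=$(git ls-tree -r --name-only HEAD)
# The difference appears only when Git compares the two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)

printf 'files in the second snapshot:\n%s\n' "$files"
printf 'files that differ from the parent:\n%s\n' "$changed"
```

The unchanged b.txt is not stored twice: both snapshots refer to the same stored file contents, which is why keeping full snapshots is cheap.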

Now we get to the really tricky part. We know that:

  • each commit has a unique hash ID, which is how Git finds that particular commit;

  • each commit stores two things:

    • it has a complete snapshot of every file Git knew about as of the time you, or whoever, made the snapshot; and
    • it has some metadata, including the name and email address of whoever made the commit, some date-and-time-stamps, and so on—and importantly for Git, it has the raw hash ID of some earlier commit(s), so that Git can move back in time, from each commit to its parent;
  • and all parts of any commit are frozen in time forever.

So commits store snapshots, which Git can extract for us to work on. But Git doesn't just extract the commit into a working area. Other version control systems do: they have the commits and the working tree, and that's all there is, and all you need to know about. The committed version is frozen for all time, and the usable version is usable, and changeable. That's two "active" versions and gives us a way to see what we've changed: just compare the active but frozen snapshot to the working one.

But for whatever reason, Git doesn't do that. Instead, Git has three active versions. One active version is frozen for all time, just like always. One active version is in your working tree, just like always. But stuffed in between these two versions, there's a third snapshot. It's changeable, but it's otherwise more like the frozen copy than it is like the useful copy.

This third copy of each file, sitting between the frozen commit and the usable copy, is Git's index, or at least, the part of Git's index you get to worry about.[1] You need to know about Git's index, because it acts as your proposed next commit.

That is, when you run:

git commit

what Git will do is:

  1. gather the appropriate metadata, including the hash ID of the current commit;
  2. make a new (though not necessarily unique[2]) snapshot;
  3. use the snapshot and metadata to make a new, unique commit;[3]
  4. write the new commit's hash ID into the current branch name.

The last step here adds the new commit to the current branch. The snapshot, in step 2 above, is whatever is in Git's index at this time. So before you run git commit, you have to update Git's index. This is why Git makes you run git add, even for files that Git already knows about: you're not exactly adding the file. Instead, you're overwriting the index copy.
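
To see the index acting as the proposed next commit, here is a small sketch in a throwaway repository (file name, directory, and identity are invented for the example). The file is edited again after git add, and the commit records the added version rather than the later work-tree version.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "first"

echo "v2" > file.txt
git add file.txt                        # the index copy now holds v2
echo "v3" > file.txt                    # the work-tree copy moves on; the index still holds v2
git commit -q -m "second"               # commits what is in the index

committed=$(git show HEAD:file.txt)
working=$(cat file.txt)
echo "committed: $committed"            # v2
echo "work-tree: $working"              # v3
```

After this, git status reports file.txt as modified but not staged, because the work-tree copy (v3) no longer matches the index copy (v2).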


[1] The rest of it is Git's cache, which normally doesn't get all up in your face. You can use Git without knowing about the cache aspect. It's difficult, maybe impossible, to use Git well without knowing about the index.

[2] If you make a commit, then revert it, the revert commit re-uses the snapshot that you had before you made the first commit, for instance. It's not at all abnormal to wind up re-using old snapshots.

[3] Unlike source snapshots, each commit is always unique. One way to see why this is the case is that each commit gets a date-and-time. You'd have to make multiple commits in a single second to risk any of them getting the same timestamp. Even then, those commits would presumably have different snapshots and/or different parent commit hash IDs, which would keep them different. The only way to get the same hash ID is to commit the same source, by the same person, after the same previous commit, at the same time.[4]

[4] Or, you could get a hash ID collision, but that never actually happens. See also "How does the newly found SHA-1 collision affect Git?"
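
The point about re-used snapshots can be checked directly. The sketch below (throwaway repository, invented file name and identity) makes a commit, reverts it, and compares the snapshot (tree) hash IDs; the commit hash IDs all differ, but the revert's snapshot should match the one from before.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
tree_first=$(git rev-parse 'HEAD^{tree}')    # hash ID of the snapshot, not of the commit

echo "two" > file.txt
git commit -q -am "second"
tree_second=$(git rev-parse 'HEAD^{tree}')

git revert --no-edit HEAD >/dev/null         # new commit that undoes "second"
tree_revert=$(git rev-parse 'HEAD^{tree}')

echo "first:  $tree_first"
echo "second: $tree_second"
echo "revert: $tree_revert"
```

There are three distinct commits here, yet only two distinct snapshots.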


A picture

Let's draw a picture of some commits. Instead of hash IDs, let's use uppercase letters. We'll have a simple chain of commits along the main-line branch, with no other branches yet:

... <-F <-G <-H

Here, H stands in for the hash ID of the last commit in the chain. Commit H has both snapshot (saved from Git's index whenever you, or whoever, made commit H) and metadata (name of person who made H, etc). In the metadata, commit H stores earlier commit G's raw hash ID. So we say that H points to G.

Commit G, of course, also has both a snapshot and metadata. That metadata makes earlier commit G point back to still-earlier commit F. Commit F in turn points back still further.

This repeats all the way to the very first commit ever. Being first, it doesn't point back, because it can't; so Git can stop here. Git just needs to be able to find the last commit. Git needs its hash ID. You could type it in yourself, but that would be painful. You could store it in a file somewhere, but that would be annoying. You could have Git store it for you, and that would be convenient—and that's just what a branch name is and does for you:

...--F--G--H   <-- main

The name main simply holds the one hash ID, of the last commit in the chain.

This is true no matter how many names and commits we have: each name holds the hash ID of some actual, valid commit. Let's make a new name, feature, that also points to H, like this:

...--F--G--H   <-- feature, main

Now we need a way to know which name we're using. Git attaches the special name HEAD to one of the branch names, like this:

...--F--G--H   <-- feature, main (HEAD)

We're now "on" main, and using commit H. Let's use git switch or git checkout to switch to the name feature:

...--F--G--H   <-- feature (HEAD), main

Nothing else has changed: we're still using commit H. But we're using it because of the name feature.

If we make a new commit—let's call it commit I—commit I will point back to commit H, and Git will write commit I's hash ID into the current name. This will produce:

...--F--G--H   <-- main
            \
             I   <-- feature (HEAD)

Now if we git checkout main, Git has to swap out our working tree contents and our proposed-next-commit contents. So git checkout main will flip both Git's index and our working-tree contents around so that they match commit H. After that, git checkout feature will flip them back so that they both match commit I.

If we make a new commit J on feature, we get:

...--F--G--H   <-- main
            \
             I--J   <-- feature (HEAD)
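
The drawings above can be reproduced with git rev-parse, which prints the hash ID a name holds. This is only a sketch: the repository, file name, and identity are made up, and git symbolic-ref is used here just to make sure the first branch is called main whatever the local defaults are.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git branch feature                      # feature and main now both hold H's hash ID
git checkout -q feature                 # attach HEAD to feature; still using commit H

echo "more" >> file.txt
git commit -q -am "I"                   # writes I's hash ID into feature only
i=$(git rev-parse HEAD)

echo "main    -> $(git rev-parse main)"
echo "feature -> $(git rev-parse feature)"
echo "parent of feature -> $(git rev-parse 'feature^')"
```

main still holds H's hash ID, feature holds I's, and I's parent is H, matching the picture.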

The reset command: it's complicated!

The git reset command is complicated.[5] We'll only look at "whole commit" reset varieties of the command here (the ones that take --hard, --soft, and --mixed options) and not the ones that mostly do things that we can now do with git restore in Git 2.23 and later.

These "whole commit" reset operations take a general form:

git reset [<mode-flag>] [<commit>]

The mode-flag is one of --soft, --mixed, or --hard.[6] The commit specifier, which can be a raw hash ID directly or anything else that can be converted to a commit hash ID by feeding it to git rev-parse, tells us which commit we'll move to.

The command does three things, except that you can have it stop early:

  1. First, it moves the branch name to which HEAD is attached.[7] It does this by just writing a new hash ID into the branch name.

  2. Second, it replaces what's in Git's index with what's in the commit you selected.

  3. Third and last, it replaces what's in your work-tree with the same files it just put into Git's index.

The first part (moving HEAD) always happens, but if you pick the current commit as the new hash ID, the "move" is from where you are, to where you are: kind of pointless. This only makes sense if you're having the command go on to steps 2 and 3, or at least to step 2. But it does always happen.

The default for the commit is the current commit. That is, if you don't pick a new commit, git reset will pick the current commit as the place to move HEAD. So if you don't pick a new commit, you're making step 1 do the "stay in place" kind of move. That's fine, as long as you don't make it stop there: if you make git reset stop after step 1, and make it stay in place, you're doing a lot of work to accomplish nothing at all. That's not really wrong, but it is a waste of time.

So, now let's look at the flags:

  • --soft tells git reset: do the move, but then stop there. Whatever is in Git's index before the move is still in Git's index afterward. Whatever is in your working tree remains untouched.

  • --mixed tells git reset: do the move and then overwrite your index, but leave my working tree alone.

  • --hard tells git reset: do the move, then overwrite both your index and my working tree.
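
The three modes can be compared side by side. The sketch below is an illustration in a throwaway repository (the file name, identity, and the state helper are all invented for the example): commit I stores the text I, commit J stores the text J, and after each kind of reset from J to I it prints what file.txt contains in the current commit, in Git's index, and in the work-tree.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "I" > file.txt
git add file.txt
git commit -q -m "I"
i=$(git rev-parse HEAD)
echo "J" > file.txt
git commit -q -am "J"
j=$(git rev-parse HEAD)

# Hypothetical helper: file.txt as seen in HEAD, in the index, and in the work-tree.
state() {
  echo "$(git show HEAD:file.txt) $(git show :file.txt) $(cat file.txt)"
}

git reset -q --soft "$i"                # step 1 only
soft=$(state)

git reset -q --hard "$j"                # put everything back to J before the next try
git reset -q --mixed "$i"               # steps 1 and 2
mixed=$(state)

git reset -q --hard "$j"
git reset -q --hard "$i"                # steps 1, 2 and 3
hard=$(state)

echo "         HEAD index work-tree"
echo "--soft : $soft"                   # I J J
echo "--mixed: $mixed"                  # I I J
echo "--hard : $hard"                   # I I I
```

Only the --hard row loses the J version of the work-tree file, which is what happened in the question.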

So, let's say we start with this:

...--F--G--H   <-- main
            \
             I--J   <-- feature (HEAD)

and pick commit I as the place that git reset should move feature, so that we end up with:

...--F--G--H   <-- main
            \
             I   <-- feature (HEAD)
              \
               J

Note how commit J still exists, but we can't find it unless we've saved the hash ID somewhere. We could save J's hash ID on paper, on a whiteboard, in a file, in another branch name, in a tag name, or whatever. Anything that lets us type it in or cut-and-paste it or whatever will do. We can then make a new name that finds J. We could do this before we do the git reset, e.g.:

git branch save
git reset --mixed <hash-of-I>

would get us:

...--F--G--H   <-- main
            \
             I   <-- feature (HEAD)
              \
               J   <-- save

where the name save retains J's hash ID.

The --mixed, if we use it here, tells Git: don't touch my work-tree files at all! This doesn't mean you'll have, in your work-tree, the exact same files that are in commit J, because maybe you were fiddling with those work-tree files just before you did the git reset. The --mixed means that Git will overwrite its files, in Git's index, with the files from I. But Git won't touch your files here. Only with --hard will git reset touch your files.
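
Here is a sketch of the save-then-reset sequence in a throwaway repository (file name and identity are invented). It also checks the reflog entry HEAD@{1}, which is another place where J's hash ID is still recorded after the reset, as mentioned in the comments under the question.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "I" > file.txt
git add file.txt
git commit -q -m "I"
i=$(git rev-parse HEAD)
echo "J" > file.txt
git commit -q -am "J"
j=$(git rev-parse HEAD)

git branch save                         # the name save now holds J's hash ID
git reset -q --mixed "$i"               # move the current branch back to I, reset the index

echo "current commit:   $(git rev-parse HEAD)"
echo "save points to:   $(git rev-parse save)"
echo "previous HEAD:    $(git rev-parse 'HEAD@{1}')"
echo "work-tree file:   $(cat file.txt)"
git status --short                      # file.txt shows as modified, not staged
```

The commit is no longer on the current branch, but its files are still in the work-tree, ready to be edited and re-added.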

(Of course, if you run git checkout or git switch: well, those commands are supposed to touch your files too, so that gets more complicated again. But don't worry about that right now, as we're concentrating on git reset.)


[5] I personally think that git reset is too complicated, the way git checkout was. Git 2.23 split the old git checkout into git switch and git restore. I think git reset should be similarly split up. But it isn't yet, so there is not much point complaining, other than to write this footnote.

[6] There are also --merge and --keep modes, but they're just further complications that I intend to ignore as well.

[7] In detached HEAD mode, which I'm ignoring here, it just writes a new hash ID into HEAD directly.


Summary

The default for git reset is to leave your files alone (--mixed). You can also tell Git to leave its own index alone, with --soft: this is sometimes useful when you want to make a new commit that uses what's in Git's index. Suppose you have:

...--G--H   <-- main
         \
          I--J--K--L--M--N--O--P--Q--R   <-- feature (HEAD)

where commits I through Q are all just various experiments, and your last commit—commit R—has everything in its final shape.

Suppose, then, that you wish to make a new commit that uses the snapshot from R, but comes after commit H, and you want to call that the last commit on your (updated) feature. You could do this with:

git checkout feature      # if necessary - if you're not already there
git status                # make sure commit R is healthy, etc

git reset --soft main     # move the branch name but leave everything else

git commit

Right after the git reset, we have this picture:

...--G--H   <-- feature (HEAD), main
         \
          I--J--K--L--M--N--O--P--Q--R   ???

It's now hard to find commits I through R at all. But the right files are in Git's index now, ready to be committed, so the git commit makes a new commit that we can call S (for "squash"):

          S   <-- feature (HEAD)
         /
...--G--H   <-- main
         \
          I--J--K--L--M--N--O--P--Q--R   ???

If you were to compare the snapshot in R to that in S, they would be the same. (Here's another case where Git would just re-use the existing snapshot.) But since we can't see commits I-J-...-R, it now seems as though we've magically squashed all the commits together into one:

          S   <-- feature (HEAD)
         /
...--G--H   <-- main

Comparing S to its parent H, we see all the same changes as we'd see if we compared H vs R. If we never see I-J-...-R again, that's probably just fine!

So git reset --soft is convenient because we get to move a branch name and preserve everything in both Git's index and our work-tree.
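
The squash can be tried out as follows. This is a sketch in a throwaway repository with only three experiment commits instead of ten; the file name, commit messages, and identity are invented. It records the snapshot hash ID of the last experiment commit and compares it with the snapshot of the new commit S.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"

git checkout -q -b feature
for n in 1 2 3; do                      # stand-ins for the experiment commits
  echo "step $n" >> file.txt
  git commit -q -am "experiment $n"
done
tree_r=$(git rev-parse 'HEAD^{tree}')   # snapshot of the final commit (R in the drawing)

git reset -q --soft main                # move feature to H; index and work-tree untouched
git commit -q -m "S: squashed feature work"
tree_s=$(git rev-parse 'HEAD^{tree}')

echo "snapshot of R: $tree_r"
echo "snapshot of S: $tree_s"
echo "commits on feature after main: $(git rev-list --count main..feature)"
```

The two snapshot hash IDs should match, and only one commit now sits between main and feature.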

In some other cases, we might want to make, say, two commits out of the files that were in R. Here we could let --mixed reset Git's index:

git reset main
git add <subset-of-files>
git commit
git add <rest-of-files>
git commit

This would give us:

          S--T   <-- feature (HEAD)
         /
...--G--H   <-- main

where the snapshot in T matches that in R, and the snapshot in S has just a few changed files. Here, we use the --mixed mode of reset to keep all files intact in our work-tree but reset Git's index. Then we use git add to update Git's index to match part of our work-tree, commit once to make S, use git add again to copy the rest of our work-tree into Git's index, and commit again to make T.
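
A sketch of that split, again in a throwaway repository: the two file names a.txt and b.txt stand in for the subset and the rest of the files, and the identity and commit messages are invented.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "H"

git checkout -q -b feature
echo "a2" > a.txt
echo "b2" > b.txt
git commit -q -am "R: everything at once"
tree_r=$(git rev-parse 'HEAD^{tree}')

git reset -q main                       # --mixed is the default: index reset, work-tree kept
git add a.txt
git commit -q -m "S: change a"
git add b.txt
git commit -q -m "T: change b"

echo "S changed: $(git diff --name-only main HEAD~1)"
echo "T changed: $(git diff --name-only HEAD~1 HEAD)"
```

The snapshot in T should be identical to the one that was in R, and the work-tree ends up clean.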

So all of these modes have their uses, but to understand those uses, you need to understand what Git is doing with Git's index and your work-tree.

torek

Short answer: I would use git stash.


Long answer: Running git stash will reset the working directory and the index to the current head, by undoing whatever changes you've made to them. It stores a record of these changes in the stash, in a form that's quite similar to a commit.

If you run git status at this point, it ought to show that there are no changes. (Untracked files will still show up. By default, git stash has no effect on untracked files.)

Then you can make whatever changes to the commit history you want, perhaps using git reset, or git rebase. When you're done, run git stash pop. The changes will be retrieved from the stash and reapplied to the index and to the working directory.
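
A minimal sketch of the stash round trip, in a throwaway repository with an invented file name, untracked file, and identity. No history rewriting happens between the stash and the pop here; the point is only to show what each command does to the tracked and untracked files.

```shell
set -e
repo=$(mktemp -d)                       # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example"          # placeholder identity for the demo
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

echo "edit" >> file.txt                 # uncommitted change to a tracked file
echo "scratch" > notes.txt              # untracked file

git stash >/dev/null                    # save the change and reset index and work-tree
after_stash=$(git status --porcelain)   # only the untracked file is still listed
stashed_content=$(cat file.txt)

git stash pop >/dev/null                # reapply the change and drop the stash entry
restored_content=$(cat file.txt)

echo "status after stash: $after_stash"
echo "file after stash:   $stashed_content"
echo "file after pop:"
echo "$restored_content"
```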

You can even run git stash on one branch then switch to another branch before running git stash pop. That's pretty useful if you realize you've been working on the wrong branch.

The previous answer points out that git stores snapshots of files rather than storing changes. But a lot of the time it behaves as if the opposite were true, as if it stored changes rather than snapshots. That's how git stash pop behaves: it tries to merge changes, rather than simply overwriting one version of a file with another. Note that this means you can get merge conflicts when running git stash pop, just like when you run git cherry-pick or git rebase.