
I made some mistakes in my amended commits and would like to revert to a previous version of the commit. I usually do this by manually recreating the previous state, but I would like an easier way.

This is the commit I am working with: (https://github.com/pytorch/pytorch/pull/73956/commits). There is a SHA hash for each amended version of the commit, so how can I revert to one of these previous versions using its hash?

jinkins
  • `git reset --hard that_SHA` is the usual way to do that. Note that this erases all changes that you have not committed yet. – j6t Mar 17 '22 at 14:52
  • @j6t Thanks, I will try that. I am also a little confused about these commit hashes. I have the commit checked out locally, and the hash that `git log` shows for it doesn't match any of the ones in that link. Should it match the last commit hash in the link above? – jinkins Mar 17 '22 at 14:56
  • I have no idea. I'm not fluent in GitHub. Superficially, it looks like a regular commit history, not something that is called "amended" in Git lingo. Still, `git reset --hard` should warp you back to an earlier commit, even though these are not "amended" commits in the Git sense. – j6t Mar 17 '22 at 14:59
  • @j6t Hmm, `git reset --hard` didn't work. Actually, I think I tried this before and it didn't work then either. This might be because we use a specific tool for this codebase that modifies the way pull requests and commits are handled. – jinkins Mar 17 '22 at 15:01
  • `git reflog -n 10` will get you the hashes of the previous commits. Then you can `git reset`, `git checkout` or `git switch` as you need, or restore one or more files from that SHA using `git restore --source sha path/file`. You may want to create a (temp) branch on your current commit, or on the commit you want to restore. – Martin Mar 17 '22 at 15:57

1 Answer


TL;DR

Martin's comment about using git reflog is spot on: find hash IDs in the reflog, use git log and other Git tools with those hash IDs to see if these are the commits you want, and if so, create a branch there or reset the current branch there or whatever.

Long

Let's see if I can untangle several different issues you may have here. You mention "amended commits", and in Git (but not GitHub), you "amend" a commit with git commit --amend. GitHub have their own command-line gh program, which doesn't do this at all, and also some web-browser-based methods, about which I know nothing, so let's assume you mean command-line git commit --amend.

There's one crucial bit of background information we should start with: nothing, not even Git itself, can ever change any commit. This means that git commit --amend is a lie: a useful lie, and one you normally need not be particularly aware of, but a lie nonetheless.

A commit, in Git:

  • Is numbered. Each commit has a unique—globally or universally unique—hash ID such as d1fbd59a1d35c1863346b61f5c07671716ebf017 (one of the hash IDs from your link, expanded out to its full gory glory). This hash ID never occurs in any Git repository anywhere unless that repository has this particular commit. So, if your own clone on your laptop has this hash ID in it, it's for this commit. If your GitHub repository has this hash ID in it, it's for this commit. The hash ID is the commit, in a very important sense (though in fact it's just a unique key in a database, and any Git repository is only allowed to use this hash ID to index the corresponding commit: i.e., you have a database of Git objects, and if you have this commit, it has this hash ID, and if you don't have this commit, nothing has this hash ID).

  • Stores two things: metadata and a snapshot. The metadata include things like the name and email address of the person who made it (you), a date-and-time stamp, and—crucially for Git's internal operation—a list of previous commit hash IDs. Most commits have just one entry in this list, and d1fbd59a1d35c1863346b61f5c07671716ebf017 is like that: its previous commit hash ID, stored in the metadata, is 41728d02f906b2845b604d4acd8b1cf1b7b9740f.
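The reason nothing can change a commit is that Git derives each hash ID from the object's content. Any change, including a different parent in the metadata, produces a different hash ID and therefore a different commit. The following is a minimal Python sketch of that scheme, assuming a classic SHA-1 repository (Git can also create SHA-256 repositories, which hash differently). The author identity, timestamp, and message are made-up placeholders. Only the empty-blob and empty-tree IDs are well-known real values.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID as computed in a classic SHA-1 Git repository.

    Git hashes the header "<type> <size-in-bytes>\\0" followed by the body.
    """
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], message: str) -> bytes:
    """Build a raw commit object body: tree, parent list, author, committer, message."""
    ident = "A U Thor <author@example.com> 1647528000 +0000"  # hypothetical placeholder
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # the metadata's list of previous commits
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Same tree, same message, different parent list => a different commit hash ID.
with_parent = git_object_id(
    "commit",
    commit_body(EMPTY_TREE, ["41728d02f906b2845b604d4acd8b1cf1b7b9740f"], "msg"),
)
without_parent = git_object_id("commit", commit_body(EMPTY_TREE, [], "msg"))
print(with_parent != without_parent)  # True
```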

The previous commit hash ID links commits together, backwards. That is, suppose the last commit in a chain of commits has some hash ID that we'll just call H. H holds a snapshot (a copy of all files) plus metadata, and in the metadata for H, there is another unique hash ID, which we'll call G. We say that commit H points to commit G:

        <-G <-H

But commit G is a commit, so it also has a list (with one entry) of previous commit hash IDs. So commit G points to an earlier commit, which we'll call F:

... <-F <-G <-H

F is a commit too, so it points to some still-earlier commit, which points to another still-earlier commit, and so on down the line.

This backwards-looking chain is the history in the repository, as found by starting at the end (wherever the last commit in the chain is) and working backwards. But how do we find the actual hash ID of that last commit? (Git needs that hash ID.) Well, we (humans) usually don't bother: we leave that to the computer. Git finds it for us: we give Git a name, such as a branch name, and Git looks up the branch name in a second database, of names-to-hash-IDs. The hash ID stored in the branch name is that of the last commit in the branch.

This is not an accident. It's a literal definition. Whatever hash ID is stored in some branch name, that commit is the last commit in that branch. Its history—its backwards-looking pointers, stored in its metadata—determines which earlier commit(s) are next, and their history—their backwards pointer—determines which earlier commit(s) are next after that, and so on.

In the case of a simple linear chain, where the last commit H just points to a single previous commit G, which just points to a single previous commit F, and so on, we have things pretty easy, so let's assume that for the moment. We run:

git log

while "on" some branch, and Git uses the branch name to find commit H and displays it, then uses the metadata for H to move to G, displays that commit, moves back one more hop to F, displays F, moves back again, and keeps that up until we get tired and quit (usually), or it gets all the way back to the very first commit ever. That commit has no previous commit, so Git must stop here, and does.
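Below is a toy Python model of the two databases and the backwards walk. It is not Git's real code. Single letters stand in for hash IDs, the commit messages are invented, and F is treated as the very first commit so that the walk has somewhere to stop. A real git log also sorts by date, but that makes no difference on a linear chain like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: like a real commit, it can't be modified
class Commit:
    message: str
    parents: tuple[str, ...] = ()


# Objects database: hash ID -> commit. Letters stand in for real hash IDs.
objects = {
    "F": Commit("first commit"),  # root commit: no previous commit
    "G": Commit("second commit", ("F",)),
    "H": Commit("third commit", ("G",)),
}
# Names database: branch name -> hash ID of the *last* commit in the branch.
refs = {"refs/heads/some-branch": "H"}


def log(branch: str) -> list[str]:
    """Walk backwards from the branch tip, one parent hop at a time."""
    shown = []
    oid = refs[f"refs/heads/{branch}"]
    while True:
        shown.append(oid)
        parents = objects[oid].parents
        if not parents:  # the very first commit: Git must stop here
            return shown
        oid = parents[0]


print(log("some-branch"))  # ['H', 'G', 'F']
```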

We can draw this situation like this:

...--F--G--H   <-- some-branch (HEAD)

That is, you're "on" some branch—internally in Git, this means that the magic file HEAD contains the name of the branch; colloquially, we say that HEAD is attached to the branch name—and the branch name, looked up in the names database, finds hash H for Git, which lets Git look up the commit in the Git-objects database. (A repository is thus mostly these two databases. Cloning a repository copies the objects database one-for-one, but does a funny thing with the names database: you get your own branch names and their branch names become your origin/* remote-tracking names. If they have remote-tracking names of their own, your Git software normally discards those.)
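The "funny thing" a clone does with the names database can be sketched as a simple renaming function. This is a simplified Python illustration, not Git's implementation. The branch names and hash IDs are hypothetical. Tags are deliberately left out, although a real clone normally fetches tags too. The checkout step that creates your one local branch is not modeled either.

```python
def clone_names(theirs: dict[str, str], remote: str = "origin") -> dict[str, str]:
    """Map the other repository's names into the new clone's names database."""
    ours = {}
    for name, oid in theirs.items():
        if name.startswith("refs/heads/"):
            branch = name[len("refs/heads/"):]
            # Their branch becomes our remote-tracking name, same hash ID.
            ours[f"refs/remotes/{remote}/{branch}"] = oid
        # Anything else (their own refs/remotes/* names, tags, ...) is skipped here.
    return ours


their_names = {
    "refs/heads/main": "H",
    "refs/remotes/upstream/main": "G",  # their remote-tracking name: discarded
}
print(clone_names(their_names))  # {'refs/remotes/origin/main': 'H'}
```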

Normally, when we're in this state:

...--G--H   <-- some-branch (HEAD)

and we make some new commit—which we'll call I—Git writes out the new commit so that it points backwards to H:

...--G--H
         \
          I

and then writes the new commit's hash ID into the name some-branch:

...--G--H
         \
          I   <-- some-branch (HEAD)

which we can just draw as a straight line after all. But suppose we somehow con Git into writing out our new commit (let's call it H' this time instead of I) such that its parent is not H, but rather is G? We'll get this:

...--G--H
      \
       H'   <-- some-branch (HEAD)

which we can re-draw as:

       H
      /
...--G--H'  <-- some-branch (HEAD)

Commit H is still in the repository, it just no longer has a name. The branch name some-branch now locates new commit H', and when Git steps back one hop, it moves to commit G, not commit H. So commit H seems to vanish.
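The "con" can be modeled in a few lines of Python. This toy sketch does not use Git's real data structures. Letters stand in for hash IDs, and only the parent links are tracked. The new commit H' copies H's parent list instead of pointing at H, and then the branch name moves to H'.

```python
# parents: hash ID -> tuple of parent hash IDs (letters stand in for hash IDs).
parents = {"F": (), "G": ("F",), "H": ("G",)}
branches = {"some-branch": "H"}


def history(tip: str) -> list[str]:
    """Walk backwards from tip, as in the linear drawings."""
    out = []
    while True:
        out.append(tip)
        if not parents[tip]:
            return out
        tip = parents[tip][0]


# Write a new commit H' whose parent is G (H's own parent), not H...
parents["H'"] = parents["H"]
# ...then store the new commit's hash ID in the branch name.
branches["some-branch"] = "H'"

print(history(branches["some-branch"]))  # ["H'", 'G', 'F']
print("H" in parents)  # True: H still exists, it just has no name
```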

If we have memorized its hash ID, though, we'll find that it is still in the repository. Git also secretly (well, not really secretly) records H's hash ID in two reflogs: little side databases with "reflog entries", that remember which hash IDs HEAD used to resolve to—that's the HEAD reflog—and which hash IDs the name some-branch used to contain: that's the some-branch reflog. So if we look in these reflogs, we can find H's hash ID.

This is just what git commit --amend does. It writes a new commit, but instead of setting the new commit's parent to the current commit so that we add to the chain, it sets the new commit's parent(s) to the current commit's parent(s). We'll see the reason for the "(s)" optional plural in a moment.

The commit that was at the end of the branch just a moment ago is no longer visible normally, but git reflog will spill out the HEAD reflog, and git reflog some-branch will spill out the some-branch reflog, and we can use those to find H even though "normal" Git operations will only find H' instead. These reflog entries have a creation time and a lifetime. By default, an entry lasts 90 days if its commit can still be reached from the name's current value, but only 30 days if it can't (the gc.reflogExpire and gc.reflogExpireUnreachable settings). Amended-away commits like H fall into the second group, so that gives us a month or so to get old commits back, if we want. We just have to find their hash IDs in the reflogs. (Once the reflog entries expire and are removed, any commit that can't be found is eligible for a true death: git gc, the garbage collector, will eventually clear it out for real.)
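Here is a small Python model of a reflog: every time a name moves, an entry records the old and new hash IDs. The sketch is simplified and hypothetical; it is not Git's storage format. The expiry check only looks at each entry's new value, using the default 90-day and 30-day limits. Letters stand in for hash IDs, and timestamps are counted in seconds from an arbitrary zero.

```python
DAY = 24 * 60 * 60
parents = {"G": (), "H": ("G",), "H'": ("G",)}  # H' is the "amended" H
branches = {}
reflog = {}  # name -> list of (timestamp, old_id, new_id, message), oldest first


def update_ref(name, new, message, now):
    """Move a branch name and record the move in that name's reflog."""
    reflog.setdefault(name, []).append((now, branches.get(name), new, message))
    branches[name] = new


def at(name, n):
    """Value of name@{n}: n=0 is the current value, n=1 the one before, ..."""
    return reflog[name][-1 - n][2]


def reachable(tip):
    """Every commit found by following parent links back from tip."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def expired(entry, name, now, expire=90 * DAY, expire_unreachable=30 * DAY):
    """90 days normally; 30 days if the entry's commit is no longer
    reachable from the name's current value."""
    ts, _old, new, _msg = entry
    limit = expire if new in reachable(branches[name]) else expire_unreachable
    return now - ts > limit


update_ref("some-branch", "H", "commit: third commit", now=0)
update_ref("some-branch", "H'", "commit (amend): third commit", now=1 * DAY)
print(at("some-branch", 1))  # H  (what some-branch@{1} would name)
old_entry = reflog["some-branch"][0]
print(expired(old_entry, "some-branch", now=31 * DAY))  # True: H is unreachable
```

With a real reflog, once you have spotted the right entry you could, for example, run git branch rescue some-branch@{1} (rescue is a hypothetical branch name).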

Your case is slightly more complicated

I cloned the repository in question and grabbed the head commit of the PR (#73956):

git clone https://github.com/pytorch/pytorch
cd pytorch
git fetch origin refs/pull/73956/head:pr73956

This lets me see, without GitHub's web-browser distortion field,[1] what's actually in that repository. Running git log --decorate --oneline --graph pr73956 shows me this:

*   9148dfde3e (origin/gh/dzdang/49/head, pr73956) Update on "[Quant][core][refactorization] Refactored qlinear_unpack.cpp into an implementation file and higher level call registration and definition file"
|\  
| * 57dde4d56a (origin/gh/dzdang/49/base) Update base for Update on "[Quant][core][refactorization] Refactored qlinear_unpack.cpp into an implementation file and higher level call registration and definition file"
* | 6c6e039839 Update on "[Quant][core][refactorization] Refactored qlinear_unpack.cpp into an implementation file and higher level call registration and definition file"
|\| 
| *   db736dfe5e Update base for Update on "[Quant][core][refactorization] Refactored qlinear_unpack.cpp into an implementation file and higher level call registration and definition file"
| |\  
| | * 7ddf212f33 [quant][fx] Fully align convert with the reference model design and simplify the implementation (#73863)
| | * 7070fe4d15 (origin/gh/navahgar/28/base) Automated submodule update: FBGEMM (#74088)
... [snipped]

This tells me that the hash ID of the tip-most commit in the pull request is 9148dfde3e (abbreviated), and that this commit is a merge commit, with two parents instead of just one.

The fact that it's a merge commit does not really change much. We just change our left-to-right drawings so that instead of:

...--F--G--H   <-- some-branch (HEAD)

we have:

...--I--J
         \
          M   <-- some-branch (HEAD)
         /
...--K--L

As you can see, git log --oneline --graph draws this same picture with merge commit M at the top, putting each commit on its own line and connecting the commits with vertical rather than horizontal lines in a crude ASCII graph. There are lots of ways of viewing the graph (see Pretty Git branch graphs), and it's the graph that's essential.[2]
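A merge commit simply has two entries in its parent list, so "history" becomes everything reachable by following every parent link. The following toy Python sketch models the drawing above. Letters stand in for hash IDs, and I and K are treated as root commits for brevity. The depth-first order is only this sketch's order; git log sorts by date unless told otherwise.

```python
# Merge commit M has two parents: J (first parent) and L (second parent).
parents = {
    "I": (), "J": ("I",),
    "K": (), "L": ("K",),
    "M": ("J", "L"),
}


def reachable(tip: str) -> list[str]:
    """All commits found from tip by following *every* parent link."""
    seen, order, stack = set(), [], [tip]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        order.append(c)
        stack.extend(reversed(parents[c]))  # so the first parent is visited first
    return order


def first_parent(tip: str) -> list[str]:
    """Follow only the first parent of each commit, like git log --first-parent."""
    out = [tip]
    while parents[out[-1]]:
        out.append(parents[out[-1]][0])
    return out


print(reachable("M"))     # ['M', 'J', 'I', 'L', 'K']
print(first_parent("M"))  # ['M', 'J', 'I']
```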

When you amend a merge commit—assuming this is what you did—you get a new commit with a new snapshot, and the same parents. For instance, if we amend commit M here, we get a new M', with M shoved up out of the way, and its hash ID more or less forgotten:

          --M   [abandoned, except for reflogs]
         / /
...--I--J /
         X
        / M'  <-- some-branch (HEAD)
        |/
...--K--L

You can put anything you like into the snapshot in M', but note that if the snapshot in M' is not the one git merge produced (perhaps with conflicts resolved), people call that an evil merge, with the word evil serving as a sort of warning. So it's usually best not to amend merges (except perhaps to put in a better commit message): instead, you might add post-merge fixing commits, for instance.


[1] GitHub have their opinion on what the right way to show commits is, and it's wrong.

[2] GitHub hide the graph. That's why their way is wrong. It works OK, or could, if the graph is linear (except they also generally sort by date, which gives you a false picture when there are rebases or clocks are wrong).


Rebase works by copying commits

You can also use git rebase, with or without --interactive and with or without --rebase-merges (-i and -r for short), to copy and replace old commits. This works much like git commit --amend: we can't actually change the old commit, but we can extract it, use it to produce a new-and-improved commit, and copy (and perhaps improve) all the subsequent commits as well. Then we make the branch name point to the last such copy:

...--G--H--I--J   <-- some-branch (HEAD)

becomes:

          I--J   [abandoned]
         /
...--G--H--I'-J'  <-- some-branch (HEAD)

because we used git rebase -i to "modify" commit I to make I' (and had Git "copy" J to J', which Git had to do because the name some-branch can't lead to I' without working backwards through J': existing commit J is stuck forever pointing back to existing commit I; the only "improvement" in J' might be that it points to I', but that still counts).
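The copying step of a rebase can be sketched in Python. This is a toy model, not git rebase itself. The "improved" snapshot is not modeled, and the naming convention of adding a prime to get the new ID is hypothetical. Real copies get unrelated-looking hash IDs. Each copy points to the previous copy, while the originals keep their old parent links.

```python
# parents: hash ID -> tuple of parent IDs; letters stand in for hash IDs.
parents = {"G": (), "H": ("G",), "I": ("H",), "J": ("I",)}
branches = {"some-branch": "J"}


def copy_chain(to_copy: list[str], new_base: str) -> str:
    """Copy commits (oldest first) onto new_base; return the last copy's ID.

    Each copy gets a new ID (here: old ID + "'") whose parent is the previous
    copy. The originals are left untouched; they can't be changed anyway.
    """
    base = new_base
    for old in to_copy:
        new = old + "'"
        parents[new] = (base,)
        base = new
    return base


branches["some-branch"] = copy_chain(["I", "J"], "H")
print(branches["some-branch"], parents["J'"], parents["J"])  # J' ("I'",) ('I',)
```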

Again, this even works for branches whose tip commit (or even some interior commit(s)) is a merge commit. Git can't actually copy a merge commit (the git cherry-pick command that copies normal non-merge commits doesn't work here), but it can re-perform the merge, and that's what git rebase -r does. The original commits remain, with their hash IDs find-able in reflogs. But since git cherry-pick (including the one done in a rebase) copies the message of a commit, an original and its copy look alike, so sometimes only the raw hash IDs tell them apart ... and humans are bad at raw hash IDs.

In all of these cases, with GitHub PRs, git push --force comes into play

When you make a GitHub "pull request", you:

  • pick a repository on GitHub to which you have push access: this may be your own fork, or the original repository itself, depending on what kind of access you have;
  • run git push to create a new branch in that GitHub repository (usually—there are some other workflows here); and
  • use the web interface, or the gh CLI, to make a pull request where you ask someone with access to some GitHub repository—perhaps the original, perhaps your own fork, perhaps another fork: it doesn't really matter, just some connected repository over on GitHub—to use GitHub's interfaces (either the web one, or the gh CLI, or whatever) to do something with the commits you put into whichever repository you have write access to.

GitHub then make sure that those commits—found, as always, by their hash IDs—are available to the person who might accept the PR, and sends them email or otherwise alerts them to the presence of the PR. To do this, GitHub create a refs/pull/number/head ref in the target repository (that's the one you saw in my magic git fetch command near the top). (They may also create a test merge, if they can, which gets the name refs/pull/number/merge, but it's the /head one that really matters here.)

If you then use git commit --amend or git rebase in your laptop clone, you must get those commits to your GitHub fork, or wherever it is that you used git push to send the original sequence of commits. But that sequence of commits is found, in your GitHub fork (or wherever), by the branch name you created. That branch name points to the last commit in the chain of commits that you included in your PR. GitHub won't let you make this branch name point to the new-and-improved commit unless you use --force or --force-with-lease.
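The rule behind that refusal can be modeled in Python. An ordinary push may only move a name forward, which means the old tip must be an ancestor of the new one. --force skips that check. --force-with-lease instead requires the name to still hold the value you expect. This is a hypothetical sketch of that decision, not Git's or GitHub's code, and the branch name my-pr-branch is made up.

```python
parents = {"F": (), "G": ("F",), "H": ("G",), "H'": ("G",)}  # H' = amended H
remote = {"refs/heads/my-pr-branch": "H"}  # hypothetical branch in your fork


def is_ancestor(old: str, new: str) -> bool:
    """True if old can be reached by walking parent links back from new."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def push(name, new, *, force=False, lease=None):
    old = remote.get(name)
    if lease is not None:  # like --force-with-lease=<name>:<expected value>
        if old != lease:
            return "rejected (stale info)"
    elif not force and old is not None and not is_ancestor(old, new):
        return "rejected (non-fast-forward)"
    remote[name] = new
    return "updated"


print(push("refs/heads/my-pr-branch", "H'"))             # rejected (non-fast-forward)
print(push("refs/heads/my-pr-branch", "H'", lease="G"))  # rejected (stale info)
print(push("refs/heads/my-pr-branch", "H'", lease="H"))  # updated
```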

Until you do such a git push, the GitHub branch in your fork (or wherever) still points to your original PR commits, not the updated ones. Once you do run such a git push, GitHub will automatically update the refs/pull/number/head name in the GitHub repository in which the PR is still open. Anyone viewing the PR now sees your new commits, and your GitHub fork (or whatever) has a branch name that finds the new commits.

Note that GitHub do not give you access to their reflogs (if they even have reflogs, which we don't get to know), so you cannot find your old commit hash IDs that way. The only place to find them is in your laptop (or whatever) repository, where you have access to your own reflogs. These let you get at your own earlier commits, before you made any updates.

Except for garbage collection of un-find-able (i.e., no longer in any reflogs due to entries having expired) commits, your Git objects database is append-only, so any commits you ever made or had are still there, as long as you can find their hash IDs. The hash ID is the commit, or at least, is the "true name" of the commit, by which you (and Git) will find it. If you memorize every hash ID—a foolish undertaking for most humans—you can get them back. If you don't do that, which most don't, you use your reflogs to find the hash IDs.
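Garbage collection can be modeled as a reachability sweep: everything reachable from branch tips and reflog values survives, and everything else may go. The following Python toy uses letters for hash IDs and tracks commits only. It ignores the extra grace period a real git gc gives loose objects (gc.pruneExpire, two weeks by default).

```python
parents = {"F": (), "G": ("F",), "H": ("G",), "H'": ("G",)}  # H' = amended H


def gc(objects: dict, roots: list[str]) -> dict:
    """Keep only commits reachable from roots (branch tips + reflog values)."""
    keep, stack = set(), list(roots)
    while stack:
        c = stack.pop()
        if c not in keep:
            keep.add(c)
            stack.extend(objects[c])
    return {c: p for c, p in objects.items() if c in keep}


branch_tip = "H'"
reflog_values = ["H", "H'"]  # some-branch@{1}, some-branch@{0}

print(sorted(gc(parents, [branch_tip] + reflog_values)))  # ['F', 'G', 'H', "H'"]
print(sorted(gc(parents, [branch_tip])))  # ['F', 'G', "H'"]: reflog expired, H gone
```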

(Note also that git reflog is really short for git log --walk-reflogs or git log -g for short. This means you can use various git log options by running git log -g instead of git reflog. See the documentation for details.)
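The difference between the two walks is easy to show in a toy Python model. git log follows parent links, while git log -g follows reflog entries newest first, so it still lists the amended-away H. The entries and their messages are invented for illustration, though they are patterned on the kind Git writes. Letters stand in for hash IDs.

```python
# Reflog for some-branch, oldest first: (new hash ID, message).
reflog = [
    ("G", "commit: second commit"),
    ("H", "commit: third commit"),
    ("H'", "commit (amend): third commit"),
]
parents = {"F": (), "G": ("F",), "H": ("G",), "H'": ("G",)}


def walk_reflog(name: str) -> list[str]:
    """Like git log -g: newest entry first, following the reflog, not parents."""
    return [f"{oid} {name}@{{{i}}}: {msg}"
            for i, (oid, msg) in enumerate(reversed(reflog))]


def walk_parents(tip: str) -> list[str]:
    """Like a plain git log on a linear chain: follow first parents."""
    out = [tip]
    while parents[out[-1]]:
        out.append(parents[out[-1]][0])
    return out


print("\n".join(walk_reflog("some-branch")))
print(walk_parents("H'"))  # ["H'", 'G', 'F']: H never shows up
```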

torek