What is the best way to revert a commit in this example?

Question

So, perhaps I am misunderstanding how a revert works in this situation, but let's say we have:

BranchA ....... BranchB .......... BranchC

ALL get merged into releases/05_12_2021

.... THEN

releases/05_12_2021 ---> merged into Development

... THEN Development ------> merge into master

Now, let's say master has a catastrophic error introduced by BranchB. We can easily roll back master to the previous release. Great. But let's say we want to get just that commit (from BranchB) out of master, or, if it's easier, out of Development (and then merge that into master for another release).

Would "git revert -m 1 {hash}" work here?

What is the strategy, and how do we target the BranchB commit that went through the release branch and found its way into master?

The question is hard to follow. Do you actually mean branches or commits with A, B and C? In addition, are the merges true merges or fast forwards? `git revert` will apply a single patch in reverse which is logically a new commit that reverts all the changes made by a given commit. It will not fix any other branches automatically. Each branch will need to either cherry-pick or merge the fix (reversion of the problematic patch/commit). You may want to run `git revert {hash-of-the-buggy-commit}` in the Development branch and then merge the change to master via Development. — Mikko Rantalainen, Apr 21 '21 at 14:03

score 1 · Accepted Answer · answered Apr 21 '21 at 15:46

As Mikko Rantalainen notes in a comment, the question itself is a bit ill-defined. However, it makes for a nice springboard for one of my long-form answers. Beware: this is long. If you make it all the way through, though, you'll probably have a lot better understanding of Git.

Would "git revert -m 1 {hash}" work here?

There are many hashes involved, and you don't say which one, so: maybe. Even if you did say which one, the answer would still be "maybe", but at least we'd have a better way to talk about it. :-)

Whenever we deal with Git—or with anything, really—it's important to remember that all abstractions, such as "branches", ultimately have concrete implementations (sometimes just one, sometimes many). Forgetting this leads to what Joel Spolsky called Architecture Astronauts back in 2001 (and here is an updated version, still valid). It's also important to realize that not all concrete implementations work with each other. Once you know what you have—at the actual implementation level—and, crucially, how that works, you can start to tell what will happen. So, let's look at what git revert really does, as well as the abstraction.

The abstraction for revert is "back out some change". That is, given some particular commit—and note how we have already moved from "branch" to "commit" without ever bothering to define either one—we need to somehow view that commit as a change. But once we do bother to check how commits work, well, they aren't changes at all. They contain snapshots: pictures of software, or at least, of files-with-content, frozen in time, in a stream of such pictures. They are like frames on film. This is what gives us the ability to turn a commit into a change: instead of taking just one snapshot, we take two of them, and place them side by side and play a game of Spot the Difference.

Suppose we take two commits that are adjacent in a branch. We still haven't defined branch but we can maybe skate by for now. These two commits, well, Git calls them parent and child. You git checkout some commit (via some branch name), do some work on it, git add, and git commit the result. The git commit step makes a new commit—a new snapshot—that is the child. The old commit is now the parent. The child commit contains the same files as were in the parent, except for those files where you ran git add. For those files, it contains the files you updated. So the parent and child snapshots are the same, except where you made changes and remembered to git add them.

If we compare the parent commit and the child commit—and if we assume you remembered to add all your changes—then our Spot-the-Difference game should, we hope, come up with the changes we made. Certainly, if we did this manually, and didn't ever make any mistakes, and we were the person who made the child commit, we could do it ourselves, and get the right result. But doing all of this is a lot of work, and we're human, and maybe we didn't even make that child commit in the first place.
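To make the Spot-the-Difference game concrete, here is a minimal sketch in a throwaway repository. The file name, file contents, and commit messages are made up for illustration; only the `git diff` comparison of parent against child is the point.

```shell
set -eu
cd "$(mktemp -d)"                      # scratch directory, not a real project
git init -q
git symbolic-ref HEAD refs/heads/main  # pick a branch name regardless of defaults
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Parent snapshot
printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m "parent snapshot"

# Child snapshot: same files, except the one we changed and added
printf 'one\n2\nthree\n' > file.txt
git add file.txt
git commit -q -m "child snapshot"

# Compare the two snapshots: parent on the left, child on the right
changes=$(git diff HEAD~1 HEAD)
printf '%s\n' "$changes"
```

The diff is computed on demand from the two snapshots; neither commit stores it.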

Fortunately, we can let Git do the work for us. Unfortunately, when we do that ... well, Git isn't smart. Git treats these changes as simple textual differences. Git contains a diff engine (a modified version of the xdiff library) that breaks up a file into individual lines, which it can feed to a string-to-string edit algorithm to come up with a set of changes to apply: make these changes to the left-side file, and you'll get the right-side file. The algorithm itself tries to come up with a minimal change, and this works so often it ought to be surprising. But we get used to it. We do need to remember, though, that it doesn't always work.

This is part of the reason that the answer is still maybe. Your chances are pretty good, but they depend on the diff engine. They also depend on other changes, and whether those interfere.

`git revert` involves three commits

Now that we know how we can compare two commits, let's look more closely at revert, which compares three commits. What three commits are these, and how does that work?

Let's start with the parent-and-child part. In fact, let's approach a definition of branch here. (There is more than one definition, which is one of the problems we have in talking about commits in Git. See also What exactly do we mean by "branch"?) We need one more fact about commits first: while they contain snapshots, they also contain some metadata, by which we mean information about this commit itself. The metadata include things like the name of the author of the commit, and some date-and-time-stamps. For finding the parent commit of some commit, though, the metadata include exactly that: the identity of the parent commit.¹

What this all means is that for ordinary commits—ordinary being an adjective we haven't defined properly yet—there's a single parent associated with the commit. These commits, then, become a sort of string-of-pearls, with each commit pointing backwards to its parent:

... <-F <-G <-H

where H is our latest commit, the newest child (great-great-...-great-grand-child perhaps, depending on where we start in this rather parthenogenic family tree). The letter H stands in for some actual commit hash ID, which is too big and ugly and unwieldy to use here. Commit H has a parent commit, G. Commit G is somebody's child though: G has a parent as well, namely F. This repeats as we go backwards through history, until at last we come upon our Adam, or whatever you want to call commit A: one with no parent. (Git calls it an orphan, sometimes, but a root commit at other times.) That's where we stop, since we have to stop: there's no earlier commit.
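You can look at this metadata yourself. The following sketch builds a three-commit chain in a scratch repository (the commit messages A, B, C are placeholders) and prints the raw commit object plus the stored parent IDs.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Three commits in a row; --allow-empty lets us skip creating files
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"

# The raw commit object: a tree (the snapshot), a parent line,
# author and committer with timestamps, then the message
git cat-file -p HEAD

# %P prints the parent hash IDs stored in the metadata
tip_parent=$(git log -1 --format=%P HEAD)     # hash of B
root_parent=$(git log -1 --format=%P HEAD~2)  # empty: A is a root commit
length=$(git rev-list --count HEAD)           # walk backwards from the tip
echo "chain length: $length"
```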

Anyway, with all that in mind, if we pick out some pair of commits—say F and G for instance—that have this parent/child relationship, and invoke Git's diff engine on their two snapshots, we will get a diff listing. With any luck, that will be the change that whoever made G had in mind. That's a set of instructions: add some line here, remove some other line there, to file F1, and make corresponding changes to files F2 and F3.

When we run git revert, what we'd like Git to do is to back out these changes. But: back out from what set of files?

If we chose, as our set of files, the files that are in snapshot G, and backed out those changes from that set of files, we'd literally get the same snapshot that's in commit F. Sometimes this is fine and is what we want. That is, suppose we don't have commit H yet—we have made a series of commits that ends at G—and we discover that commit G is bad. We now have two options:

strip G out entirely, leaving us at F; or
add a revert commit that undoes G, giving us a new commit H that, in terms of snapshot anyway, matches commit F.

We can choose the latter, and if we've given commit G out to other people, we often might want to choose the latter because Git is much better at adding commits to a repository than it is at stripping them out.² (If we haven't given out G, stripping G out is fine, of course. It's also fine if we have some mechanism by which we can make sure that all distributed clone copies of this repository have taken it out and won't accidentally restore it.)
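As a sketch of the second option, the script below reverts the tip commit and then checks that the new commit's snapshot matches F's. The file name and contents are invented for the example. The first option would be `git reset --hard HEAD~1`, which is left as a comment because it discards G.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

echo "good" > f.txt
git add f.txt
git commit -q -m "F"

echo "bad" >> f.txt
git add f.txt
git commit -q -m "G"

# Option 1 (strip G out entirely) would be:
#   git reset --hard HEAD~1
# Option 2: add a new commit H that undoes G
git revert --no-edit HEAD >/dev/null

tree_F=$(git rev-parse 'HEAD~2^{tree}')  # snapshot of F
tree_H=$(git rev-parse 'HEAD^{tree}')    # snapshot of the revert commit
count=$(git rev-list --count HEAD)       # F, G, and H are all still in history
echo "F tree: $tree_F"
echo "H tree: $tree_H"
```

The two tree IDs are equal even though the history now has three commits rather than one.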

Often, though, the point at which we discover that commit G is bad is ... later:

...--F--G--H--...--W   <-- you-are-here

Here we are, on commit W, somewhere far down the line from G, and only now have we discovered some sort of fatal flaw in G. So we'd like to back it out, with git revert. We run git revert hash-of-G, and Git will compare G to its parent F to see what changed. If this is the same G as last time, that's files F1, F2, and F3 (some lines within those files).

That's all fine and good, but here in commit W, those lines might have moved around. Maybe we added code above them, or deleted code above them. They might even have moved into different files.³ Git has a good trick up its sleeve here though: we can compare commit G to commit W. That way, we can find out where any set-of-lines that got added to G is, in W. If some set-of-lines vanished between F and G, we can find where the resulting (post-deletion) set of lines is in W.

What this means is that Git needs to compare the child commit to the current commit, as well as comparing the child commit to its parent commit. This process, in Git, of comparing some particular chosen commit—commit G, the child, in our case—to two other commits, and then combining changes, is what Git calls a merge operation—what I like to call merge as a verb.
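Here is a small sketch of that situation, with invented file contents: G inserts a bad line, a later commit W adds lines above it so that everything shifts down, and only then do we revert G.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > f.txt
git add f.txt
git commit -q -m "F"

# G inserts a bad line between 8 and 9
printf '%s\n' 1 2 3 4 5 6 7 8 BUG 9 10 > f.txt
git add f.txt
git commit -q -m "G"
bad=$(git rev-parse HEAD)

# W adds two lines at the top, so the bad line is no longer where G put it
printf '%s\n' header-a header-b 1 2 3 4 5 6 7 8 BUG 9 10 > f.txt
git add f.txt
git commit -q -m "W"

# Back out G while sitting on W
git revert --no-edit "$bad" >/dev/null
cat f.txt
```

The header lines from W survive and the BUG line is gone, even though it had moved two lines down from where G originally added it.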

¹This leads us to a new problem, which is: what exactly is the identity of a commit? In Git, the answer is: the commit's identity is its hash ID. Sometimes that's not a good identity—it's great for finding the commit, but not useful in some other way—and Git will use another one, which Git calls a patch ID, but that's not the case for git revert. Gerrit, which is not Git but adds on to Git, adds its own Change-ID entity, and other version control systems have other methods to deal with these problems, but Git doesn't do any of that.

²This is something of a general rule in distributed systems. Stripping things out is hard because it's never clear when this is done locally, but adding new things is easy because we attach identities to each "thing" and can tell whether we have all the identities or not. That's why, for instance, distributed key systems have key revocation that consists of adding a "revocation record" rather than (or in addition to) simply removing the key.

³Note that Git does not handle this case very well on its own today: it would need to be considerably smarter. Linus Torvalds argues that Git's design, of using snapshots, makes this easier in the future: once we have compute power and/or algorithms that can figure it out, the snapshots provide exactly what we need to figure it out. There is a good chance that this is true (and even if not, it's almost certainly closer to true than some fancier system we might come up with today). Right now, though, Git can only detect that something switched to another file as a whole-file-rename; if you move part of a file, Git just doesn't get it.

This means a revert is a merge

Viewing git revert as a merge is a bit mind-blowing. It's bad enough that git cherry-pick is a merge (and it is). To make sense out of all of this, we should really illustrate a normal merge. But I don't want to take the time and space to do that here.

Still, the end result is that Git uses its merge engine to achieve git revert. Git compares the child—the commit you pass to git revert, by its hash ID or other means of locating it—to its parent, to see what changed in the commit. Technically this produces a reversed diff, because Git is comparing G on the left to F on the right, in our example, but that's perfect, because applying a reversed diff undoes what the diff did. Rather than blindly trying to apply it to perhaps the wrong places, though, Git also compares the child to the current commit. The merge engine then takes care of transplanting lines as needed, at least within files or across renames (see footnote 3).
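The same three-way combination can be approximated by hand on a single file with `git merge-file`. This is only an illustration of the idea: it is not what `git revert` literally runs, and the three file names are stand-ins for the snapshots in F, G, and the current commit. The child (G) plays the role of the merge base, the current commit is "ours", and the parent (F) is "theirs".

```shell
set -eu
cd "$(mktemp -d)"

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > parent.txt               # file as of F
printf '%s\n' 1 2 3 4 5 6 7 8 BUG 9 10 > child.txt            # file as of G
printf '%s\n' header 1 2 3 4 5 6 7 8 BUG 9 10 > current.txt   # file as of W

# Usage: git merge-file -p <current> <base> <other>
# base -> other is the reversed diff (G to F: drop the BUG line);
# base -> current is what happened since G (a header was added).
merged=$(git merge-file -p current.txt child.txt parent.txt)
printf '%s\n' "$merged"
```

Swapping the roles of child.txt and parent.txt would give the cherry-pick direction instead: parent as base, child as the side whose change gets copied.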

This does, however, mean that revert, like cherry-pick, can encounter merge conflicts. If the changes made along the way to the current commit (from the child in a revert, or from the parent in a cherry-pick) clash with the changes needed to undo (revert) or copy (cherry-pick) some commit, you get a merge conflict. You must handle this the same way you handle any merge conflict: by providing Git with the Right Answer.™ That is, for each conflicted file, you must resolve the conflict; if that means changing additional files, so be it; whatever files you give to Git in the end, Git believes that You Have Fixed It and This Must Be ~~Belgium~~ Correct.

In all cases—whether there are merge conflicts, or not—the final commit that git revert makes is an ordinary commit, not a merge commit.
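A sketch of the conflicting case, using a made-up one-line file: G changes a value, a later commit changes the same line again, and the revert of G cannot decide what the line should be. The resolution chosen here (going back to the original value) is just one possible Right Answer.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

echo "value = 1" > f.txt; git add f.txt; git commit -q -m "F"
echo "value = 2" > f.txt; git add f.txt; git commit -q -m "G"
bad=$(git rev-parse HEAD)
echo "value = 3" > f.txt; git add f.txt; git commit -q -m "W"

# The revert stops with a conflict: undoing G wants "value = 1",
# but the current commit has already changed that line to "value = 3"
status=0
git revert --no-edit "$bad" >/dev/null 2>&1 || status=$?
conflicted=$(git diff --name-only --diff-filter=U)
echo "revert exit status: $status, conflicted: $conflicted"

# Resolve by hand, tell Git, and finish the revert
echo "value = 1" > f.txt
git add f.txt
GIT_EDITOR=true git revert --continue >/dev/null

# The result is an ordinary commit: exactly one parent
parents=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
echo "parent count: $parents"
```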

Ordinary commits, root commits, and merge commits

I just used the adjective ordinary in front of the noun commit again, so it's time to define three classes of commits:

A root commit is a commit with no parent. The very first commit you make in a totally-empty repository is, necessarily, a root commit. It's rare to make any more (not impossible, but I won't show how here as there's no real point).
Most commits, in most repositories, are ordinary commits, with one parent. This is in fact the definition of ordinary commit. Any commit with one parent is an ordinary commit.
With those two used up, any commit with two or more parents is, by definition, a merge commit. The git merge command is the usual means by which we make a merge commit. Confusingly, some git merge commands don't make merge commits, but we will conveniently ignore these cases entirely.
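These three classes can be counted directly with `git rev-list`, which can filter by number of parents. The sketch below builds a tiny history containing one commit of each kind; `snap` is a hypothetical helper defined only for this example, and the branch names are borrowed from the drawings in this answer.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H                                      # root commit: no parent
git checkout -q -b other-branch
snap K                                      # ordinary commit: one parent
git checkout -q some-branch
snap J                                      # ordinary commit: one parent
git merge -q --no-ff -m "M" other-branch    # merge commit: two parents

roots=$(git rev-list --count --max-parents=0 HEAD)
ordinary=$(git rev-list --count --min-parents=1 --max-parents=1 HEAD)
merges=$(git rev-list --count --merges HEAD)
echo "root=$roots ordinary=$ordinary merge=$merges"   # root=1 ordinary=2 merge=1
```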

Without talking more about how git merge combines changes, let's look at a series of commits where we end up making a merge:

          I--J
         /    \
...--G--H      M   <-- some-branch
         \    /
          K--L   <-- other-branch

Here, commit M is our merge commit: the latest commit on branch some-branch, which we made with git checkout some-branch; git merge other-branch or similar.

Like any commit, M contains a snapshot of all of its files, and some metadata. The metadata tell us who made M (us), when, and so on. The metadata also give the parent hash IDs of two commits, J and L. The hash of J is the first parent, because just before we made M, commit J was the tip of some-branch. The git checkout some-branch is how we decided to use commit J as our current commit, and then git merge other-branch is how we decided to use commit L as the other commit. So the merge M has J first, then L.
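The parent ordering is easy to check in a scratch repository. This sketch rebuilds the drawing above (`snap` is a hypothetical helper for the example) and compares the merge's `^1` and `^2` parents against the two branch tips recorded just before merging.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H
git checkout -q -b other-branch
snap K
snap L
git checkout -q some-branch
snap I
snap J

j=$(git rev-parse some-branch)    # current commit just before the merge
l=$(git rev-parse other-branch)   # the commit we are merging in
git merge -q --no-ff -m "M" other-branch

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
git log -1 --format='parents of %s: %P' HEAD
```

Had we checked out other-branch and merged some-branch instead, the same two commits would appear in the opposite order.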

Later, we make more commits:

          I--J
         /    \
...--G--H      M--N--O--P   <-- some-branch (HEAD)
         \    /
          K--L   <-- other-branch

We're now using commit P, via the name some-branch.

We've more or less finished defining one version of branch now: a branch, in this definition, is a series of commits with some specific ending point commit. That ending point commit's hash ID is stored in some branch name. The name some-branch identifies commit P, so the series ends at P. The name other-branch identifies (or points to, hence the arrow in the drawing) commit L, still, so there's a branch with commit L at its end. Note that commit L is part of some-branch too: that's a Thing in Git, that commits are often on many branches.
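To see a commit being on more than one branch, `git branch --contains` lists every branch name from which a given commit is reachable. A minimal sketch with a shortened version of the graph (again, `snap` is a hypothetical helper):

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H
git checkout -q -b other-branch
snap L
git checkout -q some-branch
snap J
git merge -q --no-ff -m "M" other-branch
snap N                                  # a later commit, only on some-branch

# Which branches contain the tip of other-branch (commit L)?
on_many=$(git branch --format='%(refname:short)' --contains other-branch)
# Which branches contain the tip of some-branch (commit N)?
on_one=$(git branch --format='%(refname:short)' --contains some-branch)

echo "L is on:"; printf '  %s\n' $on_many
echo "N is on:"; printf '  %s\n' $on_one
```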

Reverting a merge commit

We can, now, use git revert on commit M, our merge commit from the past. But there is a problem: git revert works by finding the parent of the commit we're going to revert. Commit M has two parents. Which parent should git revert use? That's where your -m 1 comes in:

Would git revert -m 1 hash work here?

What -m 1 does is tell git revert how to deal with the fact that M is a merge commit. Because M is a merge commit, we have to pick which parent the revert should use, in the three-commits-as-input "merge" that achieves the revert operation. So -m 1 here, with the hash ID of commit M, means: use commit J (not L) as the parent.

If we go through the details of merging—which we haven't—we will find that the effect of this is to back out the changes introduced via commit L. Because of the nature of merges, that includes all changes from the bottom row of commits, K and L here. Importantly, though, it won't back out any of the changes that got into M via the top-row of commits (I and J), even if there is a bit of duplication-of-changes in the K-L line.⁴
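A sketch of this, using one file per commit so that it is obvious which changes survive (`snap` is a hypothetical helper, and the file-per-commit layout is an assumption made to keep the example conflict-free):

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H
git checkout -q -b other-branch
snap K
snap L
git checkout -q some-branch
snap I
snap J
git merge -q --no-ff -m "M" other-branch
m=$(git rev-parse HEAD)
snap N                                   # work done after the merge

# Revert the merge relative to its first parent (J)
git revert --no-edit -m 1 "$m" >/dev/null

ls                                       # K.txt and L.txt are gone
merges=$(git rev-list --count --merges HEAD)
echo "merge commits still in history: $merges"
```

The files from the top row (I and J) and the later commit N are untouched; only what came in through the second parent is backed out.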

Reverting a merge has consequences: the merge itself is still in the history. History, in Git, is simply the set of commits you find as you walk backwards through a branch (using the "string of commits ending at some designated commit" definition of branch, here). Because of the way Git stores snapshots and finds merge bases for future merges, this means that the reversion seems like the "right thing" to Git; Git will keep it, and you cannot re-merge the old commits. If you ever find that you want the merge back, the usual way to do that is to revert the revert.
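The following sketch shows both halves of that: after reverting the merge, merging the same branch again does nothing, and reverting the revert brings the changes back. `snap` is a hypothetical helper and the graph is cut down to the minimum.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H
git checkout -q -b other-branch
snap K
git checkout -q some-branch
snap J
git merge -q --no-ff -m "M" other-branch

# Back out the merge
git revert --no-edit -m 1 HEAD >/dev/null
undo=$(git rev-parse HEAD)

# Try to merge again: K is already in our history, so nothing happens
git merge other-branch
head_after_remerge=$(git rev-parse HEAD)
k_after_remerge=$(ls K.txt 2>/dev/null || echo missing)

# Revert the revert to get K's changes back
git revert --no-edit "$undo" >/dev/null
k_after_rerevert=$(ls K.txt 2>/dev/null || echo missing)

echo "after re-merge: $k_after_remerge, after reverting the revert: $k_after_rerevert"
```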

All of this means, though, that the end result is sometimes undesirable. Sometimes it makes more sense to revert, not the merge, but the single bad commit that produced the bad merge. When and whether to do which is a matter of judgment, that Git itself cannot perform. Humans must still choose when to revert the entire merge, or just one bad commit.
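For comparison, here is a sketch of reverting just one commit that arrived through a merge. The bad commit is assumed to be K, an ordinary commit, so no -m option is needed. `snap` is a hypothetical helper, and whether K really is the culprit is exactly the human judgment call described above.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/some-branch
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: create NAME.txt and commit it with message NAME
snap() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

snap H
git checkout -q -b other-branch
snap K                                   # assume this is the bad commit
snap L                                   # a good commit on the same branch
k=$(git rev-parse 'other-branch~1')
git checkout -q some-branch
snap J
git merge -q --no-ff -m "M" other-branch

# Revert only K; the merge and commit L stay in effect
git revert --no-edit "$k" >/dev/null
ls
```

In the question's terms, this would be run on Development (or on master directly), and the resulting revert commit then travels to the other branches by ordinary merges.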

⁴Git's merge engine notices duplicate changes and keeps just one copy of them. In this case, that means that when we do the reversed diff from M to J (the first parent, which is what -m 1 selects), we don't "see" any dropped duplicate on the L "leg" of the merge, so it does not get backed out. Had we used -m 2 to revert the top line commits, we'd still not "see" the dropped duplicate, so this works both ways.
