It's clearly impossible in general, but it may be easy enough to hit enough cases to make it worthwhile. The following is basically pure theory: there's nothing built in to Git to do this.
Think of commits as snapshots that can be converted to deltas (because they are). Your task is, given a sequence of commits:
...--A--B--C--D--...
(we'll somewhat arbitrarily limit this to linear commits; the modifications to the algebra for handling merges are obvious, albeit messy) to compute ∆B, ∆C, and ∆D. ∆B is just B-A, ∆C is C-B, and so on. Of course the result of this "subtraction" is a diff-set (or changeset) rather than a simple number. These deltas are what you see when you run `git show` (or `git diff`). (You may also want to disable rename detection here if you try to do this for real and make it robust.)
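As a rough illustration of the delta step, here is a minimal Python sketch that shells out to `git diff` to get ∆X for a commit X relative to its first parent. The helper names `delta_command` and `delta` are made up for this example; the sketch assumes `git` is on the PATH and that the commit has a parent (it does not handle root commits or merges).

```python
import subprocess

def delta_command(commit):
    """Build the argv that asks git for the delta of `commit` against its first parent."""
    # --no-renames turns rename detection off, so a renamed file shows up as
    # a plain delete plus add and cannot confuse later hunk matching.
    return ["git", "diff", "--no-renames", f"{commit}^", commit]

def delta(commit, repo="."):
    """Return the diff text for `commit` (hypothetical helper, linear history only)."""
    result = subprocess.run(
        delta_command(commit),
        cwd=repo,
        check=True,           # raise if git reports an error
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `delta("HEAD")` inside a repository should return the same patch text you would see from `git show` on that commit, minus the commit header.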
Next, we want to see whether ∆C "includes" -∆B, or whether ∆D "includes" -∆B. A normal `git revert` is the negative delta, so we're looking for a changeset that isn't a revert itself, yet still contains a revert.
When we look at the raw form of `git diff` output, we find that, for strictly modified files, the diff is just a lot of boilerplate-y `@@` and hunk-context lines around the change, which is expressed exclusively as `-` and `+` lines: remove old, insert new.
Reversion consists of making the same change in reverse: add the removed lines, while removing the added lines, with the same context, in the same order.
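To make "the same change in reverse" concrete, here is a small Python sketch that negates a single unified-diff hunk. The function name `reverse_hunk` is invented for this example, and it assumes the hunk arrives as a list of lines beginning with its `@@` header. It does not handle `\ No newline at end of file` markers properly. Within each run of changed lines it emits the removals before the additions, which is the order diffs are normally written in.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")

def reverse_hunk(hunk_lines):
    """Return the hunk that undoes `hunk_lines` (header first, then body lines)."""
    match = HUNK_HEADER.match(hunk_lines[0])
    if match is None:
        raise ValueError(f"not a hunk header: {hunk_lines[0]!r}")
    old_start, old_len, new_start, new_len, tail = match.groups()

    def span(start, length):
        return start if length is None else f"{start},{length}"

    # The old and new ranges trade places in the header.
    out = [f"@@ -{span(new_start, new_len)} +{span(old_start, old_len)} @@{tail}"]
    removed, added = [], []

    def flush():
        # Lines that were added must now be removed, and vice versa.
        out.extend("-" + text for text in added)
        out.extend("+" + text for text in removed)
        removed.clear()
        added.clear()

    for line in hunk_lines[1:]:
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
        else:
            flush()            # a context line ends the current run of changes
            out.append(line)
    flush()
    return out
```

Reversing a hunk twice gives back the original, which is a cheap sanity check on any implementation of this step.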
A change that contains a reversion is, at least for the easily detected cases, simply a change that includes the reversion—i.e., adds back removed lines while removing added lines, with the same context, in the same order—while also making some other separate change.
In other words, if ∆C = -∆B+∂C, or ∆D = -∆B+∂D, then we will suspect one of these deltas to be "incorrect": it should have just been ∂C or just ∂D.
The `git patch-id` program can compute a patch ID for a diff, including a single hunk presented as its own small patch, so to find out whether some big delta ∆X is equal to the negative of some previous ∆B plus some smaller ∂X, you could split the original ∆B into its component hunks, negate each one, and get its patch ID. [Edit: use André Sassi's suggestion and simplify this by computing the reverse diff in the first place, either by giving `B B^` as the two inputs to `git diff`, or by using the `-R` flag, which is available in both `git diff` and `git show`.] Put these IDs, in order, in a master list L.

Then, for any subsequent commit X, start with a working copy l = L and walk through the component hunks of ∆X. Compute each hunk's patch ID and see whether it matches the first remaining value in l. If so, drop that item from l; if it was the last item, you have just found that ∆X includes -∆B. Otherwise, move on to the next hunk in ∆X. If you reach the end of ∆X without dropping every value from l, you conclude that ∆X does not include -∆B.
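The list walk described above can be sketched in Python roughly as follows. Everything here is an assumption made for illustration: `split_hunks`, `hunk_id`, and `includes_reversion` are invented names, the input is assumed to be `git diff` output with `diff --git` file headers, and `hunk_id` is only a stand-in for `git patch-id` (it hashes the hunk body with whitespace and the `@@` line numbers ignored, and will not produce the same values as the real tool).

```python
import hashlib

def split_hunks(diff_text):
    """Split git diff output into hunks, each a list of lines starting at '@@'."""
    hunks, current = [], None
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            current = [line]
            hunks.append(current)
        elif current is not None and line[:1] in (" ", "+", "-", "\\"):
            current.append(line)
        else:
            current = None     # file header or other boilerplate between hunks
    return hunks

def hunk_id(hunk_lines):
    """Stand-in for a per-hunk patch ID; NOT the value git patch-id would print."""
    digest = hashlib.sha1()
    for line in hunk_lines[1:]:            # skip the '@@' header and its line numbers
        if line[:1] in (" ", "+", "-"):
            body = "".join(line[1:].split())   # ignore whitespace differences
            digest.update((line[0] + body + "\n").encode("utf-8"))
    return digest.hexdigest()

def includes_reversion(master_ids, candidate_ids):
    """True if every ID in master_ids appears, in order, among candidate_ids."""
    if not master_ids:
        return False           # nothing to revert, so nothing to flag
    pos = 0                    # index of the first ID not yet dropped
    for hid in candidate_ids:
        if hid == master_ids[pos]:
            pos += 1
            if pos == len(master_ids):
                return True    # dropped the last item: the delta includes the reverse
    return False
```

Here `master_ids` would come from the reversed ∆B (for example the output of `git diff -R` run through `split_hunks` and `hunk_id`), and `candidate_ids` from ∆X computed the same way without `-R`.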
Repeat for all suspect commits and you'll see if you can automatically find these "accidental reversions". Note that a deliberate reversion will have nothing but -∆B, and will also have a commit message that says "revert ...".
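A possible way to separate the deliberate case from the accidental one is sketched below. The function `classify` and its return strings are invented for this example; it takes the ordered hunk IDs of the reversed ∆B, the ordered hunk IDs of ∆X, and the commit message of X, and it assumes a deliberate revert keeps a message that starts with "Revert".

```python
def classify(reverse_ids, candidate_ids, message=""):
    """Label commit X given hunk IDs of the reversed delta and of X's own delta."""
    reverse_ids, candidate_ids = list(reverse_ids), list(candidate_ids)
    if not reverse_ids:
        return "no reversion"
    if candidate_ids == reverse_ids:
        # Nothing but the negative delta: a pure revert.
        if message.lower().startswith("revert"):
            return "deliberate revert"
        return "unlabeled pure revert"
    # In-order containment check: the shared iterator only ever moves forward.
    remaining = iter(candidate_ids)
    if all(any(hid == want for hid in remaining) for want in reverse_ids):
        return "suspected accidental reversion"
    return "no reversion"
```

For example, `classify(["a", "b"], ["z", "a", "q", "b"], "Fix typo")` flags the commit as suspect, because both reversed hunks appear in order alongside unrelated changes.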