Git rebase skipping identical commits

Question

If the upstream branch already contains a change you have made (e.g., because you mailed a patch which was applied upstream), then that commit will be skipped. For example, running git rebase master on the following history (in which A' and A introduce the same set of changes, but have different committer information):
      A---B---C topic
     /
D---E---A'---F master 
will result in:
               B'---C' topic
              /
D---E---A'---F master

What I am not able to understand is what is the need of this optimisation ? If git would have rebased like :

               A''---- B'---C' topic
              /
D---E---A'---F master

what could have gone wrong ? (A = A' = A'')

EDIT : To make myself more clear, Let us say F reverts changes of A'. Now rebase will not apply A patch on F and just B' and C'. In such a case, git rebase behaviour might be unexpected and to me it does not make sense.

torek · Accepted Answer · 2022-07-29T09:09:09.613

7

It is possible to have successive commits that store the same source tree. That's what you would get here, because A'' would introduce no changes to F—those changes are already in F; the diff from E to F includes the same changes due to the diff from E to A'.

In other words, the snapshot in A'' would be the same as the snapshot in F. Git calls this kind of commit pair an empty commit: that is, if the parent matches the child, and the child is not a merge commit, Git says that the child commit is "empty" (even though it still has a full snapshot). The git commit command requires the flag --allow-empty to make such a commit.

Moreover, the commit message in A'' would likely be the same as the commit message in A'. If so, A'' is in all useful ways redundant. Eliminating it is probably the best course of action.

Note that git rebase was originally implemented using git format-patch, which turns each commit into a diff against its parent,¹ with additional text to make the patch email-send-able. A separate program, git am, reads "mailbox format" files (sequences of emailed messages) and applies the patch, using the additional text to reproduce the commit author, author-date, and log message. However, format-patch and git am refuse to work with an empty patch. So the initial version of rebase, which used these two programs to achieve the commit copying, had to drop these "empty" commits.

Modern git rebase has multiple back-ends, not just git am. These can create or copy empty commits. Depending on your particular Git version, you may need to supply the -k flag to git rebase to get rebase to deliberately create such commits. If your version of Git uses git am as a default back-end, git rebase -k forces rebase to use the interactive or merge back-end, rather than the git am back end (and of course also modifies its initial setup to tell it not to drop "empty" commits automatically).

Git version 2.26 and later use a merge-based back end by default, and do not require the -k flag to keep empty commits. Using -k now means keep commits that become empty rather than keep commits that are initially empty.

You can also do your rebase manually, using individual git cherry-pick commands or en-masse cherry-picking using the sequencer. Like rebase, cherry-pick refuses to copy "empty" commits by default, requiring the --allow-empty and/or --keep-redundant-commits flags to keep such commits.

¹git format-patch cannot format merge commits at all, as they have more than one parent.

edited Jul 29 '22 at 09:09

answered Nov 18 '19 at 15:45

torek

448,244
59
642
775

What if F reverts what was in A' ? Then A'' will bring changes ? – Number945 Nov 18 '19 at 16:25
Yes, in that case `A''` would undo the revert. If that's really what you mean, you must force rebase not to drop the commit (this may not be possible), or use cherry-pick directly. – torek Nov 18 '19 at 17:11
So, you are saying that if F reverts A' , still A'' will **not** be applied by git rebase ? – Number945 Nov 18 '19 at 18:19
1

Correct: rebase selects for copying those commits that have no patch-ID equivalent in the symmetric difference, and `A` has a patch-ID equivalent `A'` in the upstream, in your example. This is a mechanism, not a policy, but it meets the goal of an unstated policy: that an upstream acceptance followed by an upstream reversion means your patch is not going to be included. – torek Nov 18 '19 at 18:23
So, torek, then the whole argument of empty commits to justify this git rebase behaviour is not right ? Git developer always had the choice to apply A'' if A'' with respect to its parent would be a non empty commit (as in this case that we discussed in comments ) ? – Number945 Nov 18 '19 at 18:47
1

I think (I wasn't there for this particular history, so I'm guessing) that rebase first just did `git rev-list` to get commits for format-patch (or called format-patch directly), piped the result to `git am -3`, and then someone discovered that upstream cherry picks failed. So then `git rebase` was modified to omit upstream cherry-picks. This was deemed Right and Desirable and has been that way ever since. The fact that it fails (or failed) in implementation has been turned into a design choice. But this is all speculation: it *is* a valid design choice, and meets the policy I described. – torek Nov 18 '19 at 18:49
Forgive me for I might be bugging you little on this torek. I get the idea that there were some implementation constraints. Agreed. Then you state that it meets the policy too. Agreed. But somehow , torek , the policy does not seem to be justified. Just because changes in upstream are reverted , why reject the incoming patch from branch to be rebased just because it has same patch id ? – Number945 Nov 18 '19 at 19:02
2

I didn't make the decision myself, but I find it works pretty well in practice. It's a good idea to be aware of it though, for those particular, peculiar cases where it *doesn't* work well. Always check that the result of a rebase is still good, e.g., passes automated tests. If it fails, remember that you can roll it back with `ORIG_HEAD` or the reflog, and redo it correctly by hand. – torek Nov 18 '19 at 19:21
2

[An earlier take on this conversation](https://stackoverflow.com/questions/44937599/missing-git-commit). – jthill Nov 20 '19 at 15:48
@torek I don't see the `-k` mentioned in the [`git rebase` docs](https://git-scm.com/docs/git-rebase) and supplying it didn't seem to do anything. Is that the correct flag to pass? I do get a hint in my terminal to try `--reapply-cherry-picks`, which resolved the issue for me. Maybe I'm hitting a different scenario than OP, but wanted to update the answer if it was incorrect or had changed. – bdukes Jul 28 '22 at 13:09
1

@bdukes: the old `-k` option applies to the old `git rebase`; the new `git rebase` still has a `-k` but it works a bit differently, as regular rebase already keeps "empty" commits, and `-k` merely forces the rebase to keep a new commit that *becomes* "empty". I updated the answer a bit. – torek Jul 29 '22 at 09:04

score 1 · Answer 2 · answered Nov 18 '19 at 15:24

1

First, it would not be A.... it would be A''.... but the real 'problem' is that the revision would be empty because those changes would be already applied on A'. But you can keep it in history, if so you want.

answered Nov 18 '19 at 15:24

eftshift0

26,375
3
36
60

Romain Valeri · Answer 3 · 2019-11-18T15:51:13.707

1

Your last schema is an impossible state. (Or ar least, it was, but now you edited your question so it's not apparent any more)

A has E for its parent and this cannot change ever. If it did, then A would not be A any more, since a commit is the hashed result of (contents + metadata).

That's the reason A' is used here, by the way.

But to answer the title question, skipping these commits is a way to avoid verbose histories with (potentially, depending on workflows, of course) lots of empty commits.

edited Nov 18 '19 at 15:51

answered Nov 18 '19 at 15:27

Romain Valeri

19,645
3
36
61

why A'' has to be empty ? – Number945 Nov 18 '19 at 15:31
When git rebases commits, it replays its changes on top of the current tree. If it detects the changes as already being here, it doesn't apply anything. Then the resulting commit (`A''`) *is* empty, even if the source (the original commit, `A`) was not. – Romain Valeri Nov 18 '19 at 15:33
`If it detects the changes as already being here, it doesn't apply anything.` I am questioning this behaviour only. Why it has to behave like this ? – Number945 Nov 18 '19 at 15:39
1

Because under normal development conditions what's the point of adding/deleting/modifying code that has already been added/modified/deleted exactly that way before? So... on a line of code you had `i++` and on A you change it to `i+=2`. Great.... you cherry-pick and create A'. That's nice.... now code says `i+=2`. You want to reapply A on top of it.... what do you want it to say? `i+=4`? It will notice that the code already says `i+=2` and git will be happy with it. – eftshift0 Nov 18 '19 at 16:01
@eftshift0 What if F reverts what was in A' ? Now applying A'' makes sense, is not it ? – Number945 Nov 18 '19 at 16:24
1

Right... and **in that case** git will apply the change just as usual and **it won't** complain about the change being empty, – eftshift0 Nov 18 '19 at 16:25
Just for the sake of clarification: git actually doesn't care about the revision being already applied in history... it will take a look at the content of the files to see if what is being requested is already applied. – eftshift0 Nov 18 '19 at 16:26
@eftshift0 So, even if F reverts A' , git rebase wont apply A'' ? If yes, then the whole argument of empty commit to defend this behaviour is not valid anymore. – Number945 Nov 18 '19 at 16:51
1

It _will_ be applied! Git just doesn't know that A' is an analogue to A so it will try to apply A.... under normal conditions, it will be skipped (git will complain about A not being necessary because it is already applied.... not because A' is there but because content would point that way).... _or_, in case you had reverted, the change will be applied expected... but that is a corner case. Don't know what the fuss is about because both things are quite logical. – eftshift0 Nov 18 '19 at 17:06

Git rebase skipping identical commits

3 Answers3

Linked