Why doesn't Git rebase take exponential time like Darcs?

Question

So according to the this Wikipedia page about merges in version control, Darcs uses patch commutation and so does Git when it rebases.

I'm curious to know why doesn't Git rebase take exponential time in some cases the same way Darcs does?

This page lists a bunch cases of when it could happen with Darcs and how to try and tackle them.

torek · Accepted Answer · 2019-06-08T01:13:06.653

I have not used darcs and could be off base here, but scanning through the Wikipedia article, I observe that its claim here is misleading:

Patch commutation is used in Darcs to merge changes, and is also implemented in git (but called "rebasing"). Patch commutation merge means changing the order of patches (i.e. descriptions of changes) so that they form a linear history. In effect, when two patches are made in the context of a common situation, upon merging, one of them is rewritten so that it appears to be done in the context of the other.

Git neither stores patches, nor reorders them.¹ You can reorder patches manually if you like. When you use git rebase to convert commits—which are snapshots—into changesets, the conversion itself is done in a simple manner that is worst-case O(nd) where n is the number of lines and d is the length of the edit script. (Whether this is handled as a patch or as a cherry-pick, which is a merge, depends on which rebase algorithm you choose; the merge case tends to use more CPU time as it also looks for treewide rename operations.)

Hence, if you have C total commits, the worst case for a git rebase -i is roughly O(Cnd).² Git is not going to try any other order for the commits: it's up to you to do any reordering. If your rebase fails, you could try all possible reorderings, which would be a factorial function on the number of commits, but Git won't do that for you.

¹If you do a git rebase on a group of commits that has one or more merges inside itself, Git does "flatten" them and eliminate the merges—but it doesn't try many orders, it just does one topological walk of the whole thing. There is no attempt to be smart about it, and often this will lead to terrible merge conflicts, i.e., it's often a bad idea. If you use the new --rebase-merges flag, Git will try to keep the merges, but does so by re-performing them. It just builds an identical topology, using a mini scripting language, and then lets you edit it manually if you like.

²If rename-detection gets involved, that is O(n²) in the number of files that could be rename-detected. Git tries to limit this in several ways, including having a maximum length for the rename queue, with the current default being 4000 files.

I think I understand what you meant but just to double check, what do you mean by "d is the length of the edit script." what is edit script? Also when say handed as a patch you mean just a normal rebase with no rename resolutions? — Yazeed Sabri, Jun 10 '19 at 13:17
(1) The output of `git diff` (or indeed any solution to the [string-to-string edit problem](https://en.wikipedia.org/wiki/String-to-string_correction_problem)) is an *edit script* consisting of operations like "insert I at location 17" and "delete D items at location 30". The length of this edit script is `d` and the length of the input itself is `n`. (2) Yes, that's exactly right: `git rebase` without `-i` or `-m` or various other options makes Git runs `git format-patch` and feed the resulting patch-set to `git am`. — torek, Jun 10 '19 at 15:15

score 3 · Answer 2 · answered Jun 08 '19 at 20:14

3

You can see what's going on in an example I recall from somewhere: start with six lines A B C D E F, on one branch insert three lines G G G at the front then insert six more lines A B C D E F at the front; on another branch (from that first patch/commit) change C to C4. Now merge the two branches.

Git will produce A B C4 D E F G G G A B C D E F; darcs will change the second C, based on the premise that its identity tracking produces a better result than Git's context-and-location tracking. If the "4" is initialization code for the block, Git's result is wrong; if it's initialization code for the function or file, darcs's is wrong.

It's that identity tracking that eats the time.

answered Jun 08 '19 at 20:14

jthill

55,082
5
77
137

I understand the identity part but I didn't get your last two examples stating from "If the "4" is..." – Yazeed Sabri Jun 10 '19 at 00:05
The question no vcs can hope to answer reliably is whether the change properly belongs with the first abcdef block in the file or with the one that has apparently been progressively been shoved farther down. Darcs spends a lot of rigor and effort and cpu calculating the right result for a bad guess. Git's guess is no better, but it gets to its answer a hell of a lot faster. – jthill Jun 10 '19 at 05:15
Another possibility both vcs's miss is of course that the same change should be made to **both** blocks. – jthill Jun 10 '19 at 05:24
1

Question for my own edification: this means that darcs' "merge" operation examines each commit from the merge base to both branch tips? I think that it might be reasonable for Git to do this specifically to look for file renames (perhaps under operator control since it's pretty cpu-expensive). – torek Jun 10 '19 at 15:20
@torek it does. git imerge (on GitHub) does incremental merging, I think that'd do. But I don't think any automated procedure can be completely trusted here, merging code is always going to require judgment, i.e. awareness of what the automatic merge will produce and a human decision about the right result for this merge. Think about all the possible variations on the "should change both" scenario. – jthill Jun 10 '19 at 16:48
Yeah, I agree that no matter how you do a merge, there are perils. Mostly what I'm thinking of for `git merge` here is a scan for 100%-identical renames, so that you can make rename-only commits during major refactorings so as to tell Git: *when matching up files, these are the right ones*. While git-imerge will achieve this (automatically), imerge is painfully slow in a big repository (I tried it on an appropriate situation once). – torek Jun 10 '19 at 17:49
You can merge a separate rename-only branch wherever it's needed, just as with other conflicts Git's tip-merging can't isolate when they're all mixed together it's pretty easy to isolate then for it--and it makes the history easier for humans to read too @torek – jthill Jun 10 '19 at 18:44

Why doesn't Git rebase take exponential time like Darcs?

2 Answers2