Smarter rebase avoiding redundant work?

Question

One issue I run into with long rebases is having to resolve redundant conflicts. Say I have a branch with a sequence of commits that keeps modifying a function, and the final commit removes the function entirely.

When I run git rebase master, Git naively applies each of the commits in turn. That means I need to resolve conflicts between each of these commits and the tip of master, even though that work is ultimately wasted.

What's a good way to deal with this situation? Perhaps I should just generate a single patch for the whole branch, and apply that against master? If so, is there any way to preserve some history? Thoughts, suggestions, etc.

I think you need to use [git rerere](http://ftp.sunet.se/pub/Linux/kernel.org/software/scm/git/docs/git-rerere.html) but I have no experience with it. — KurzedMetal, May 15 '12 at 13:23
As I understand it, rerere helps if you need to reapply the merges in the future. But I'm trying to avoid even resolving them once. — Steve Bennett, May 15 '12 at 13:34
If you have a feature branch that adds a function that is later removed in the same feature branch, you probably should learn about `git rebase -i` and fix the feature branch before rebasing it after `master`. (Feature branch should contain minimal patches in correct order and interactive rebase helps to acquire that target.) — Mikko Rantalainen, May 22 '19 at 11:11

Mikko Rantalainen · Accepted Answer · 2023-08-14T16:32:04.403


You want to use git rerere combined with teaching the rerere database from historical commits using rerere-train.sh (you may already have it at /usr/share/doc/git/contrib/rerere-train.sh). This allows git to automatically use merge conflict resolutions learned from the history.

Warning: you're basically making git rewrite the source code by blindly reusing historical string replacements to fix the conflicting merge. You should review all conflicting merges after the rebase. I find that gitk works fine for this (for merge commits it shows only the conflict resolution as the patch). I've had only good experiences with rerere, but you might not be that lucky. Basically, if your history contains broken merges (that is, merges that were technically done incorrectly and then fixed in later commits), you do not want to train rerere from the version history, unless you want similarly broken merges to be done automatically for you. You can still enable rerere and be careful with future merges so you don't teach it bad habits.

Long story short, you just run

git config --global rerere.enabled 1
bash /usr/share/doc/git/contrib/rerere-train.sh --all

followed by the rebase you really want to do and it should just magically work.

After you have enabled rerere globally, you no longer need to learn from the history in the future. The training step is needed only so that rerere can reuse conflict resolutions that were made before rerere was enabled.
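To see what rerere actually does, here is a minimal sketch in a throwaway repository (the file f.txt, the branch names and the commit messages are all hypothetical). It records a hand-made resolution on the first merge and replays it when the same merge is redone. The rerere.autoUpdate setting is an extra, not part of the answer above: it stages the replayed resolution. Git still stops and lets you commit, so you can review the result first.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config rerere.enabled true      # record and reuse conflict resolutions
git config rerere.autoUpdate true   # stage files that rerere resolved

printf 'base\n' > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b feature
printf 'feature\n' > f.txt && git commit -q -am "feature change"
git checkout -q master
printf 'master\n' > f.txt && git commit -q -am "master change"

# First merge conflicts: rerere records the conflicted "preimage".
git merge -q feature || true
printf 'resolved\n' > f.txt         # resolve by hand
git add f.txt
git commit -q --no-edit             # rerere records the resolution here

# Undo the merge and redo it: the recorded resolution is replayed.
git reset -q --hard HEAD~1
git merge -q feature || true        # still stops, but f.txt is already resolved
cat f.txt                           # resolved
git commit -q --no-edit
```

rerere-train.sh fills the same .git/rr-cache directory, but it does so by redoing the merges already in your history instead of merges you repeat by hand.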

PS. I found a similar answer to another question: https://stackoverflow.com/a/4155237/334451

edited Aug 14 '23 at 16:32

answered Aug 23 '12 at 07:18

Mikko Rantalainen


@user230137 could you elaborate a bit more. Did you get some error message or was git unable to magically do the merge for you? Are you sure that the merge conflict was identical to historically done merge? – Mikko Rantalainen Apr 20 '16 at 10:20
git wasn't able to do the merge. How do you know whether a merge conflict is identical to a historically done merge? – Hunsu Apr 20 '16 at 13:12
If you're using `rerere` and the merge is not done automatically, then the conflict is not identical. Teaching `rerere` from historical merges only helps if you or somebody else using the same repo has resolved identical conflicts earlier. If the merge conflict is not identical, you may get the best results with `kdiff3`. Be warned, though, that `kdiff3` tries a bit too hard and sometimes "automatically solves" your conflict without asking for guidance. Always review the changes made with `kdiff3`! `kdiff3` will be used automatically if you have it installed and run `git mergetool`. – Mikko Rantalainen Apr 21 '16 at 06:04
I'm still baffled by this feature. I enabled rerere in config, then ran rerere-train.sh. Everything seemed to work. Next, I created a test commit on top of HEAD that simply adds a new file (so it shouldn't cause conflicts during a rebase). After that, I ran git rebase -ir to move the test commit right before a merge commit, while preserving merges. Nevertheless, the rebase stopped at the merge commit with the same conflicts as before. Why wasn't the merge commit re-applied? The test commit didn't touch any of the files with conflicts. – John Colvin Jan 12 '23 at 19:04
@JohnColvin It's hard to tell what's going wrong without an example case, but I'd guess you're seeing a conflict that you think is identical. rerere works by comparing strings, so if those are not actually identical, it doesn't know how to do the merge. In addition, did you use the `--all` flag for `rerere-train.sh`? Without it, the script will only learn from the branch you were on when you executed the command. – Mikko Rantalainen Jan 13 '23 at 09:26

score 8 · Answer 2 · edited Feb 07 '20 at 19:27


You could use the git rerere feature.

You have to enable it using git config --global rerere.enabled 1. After that, every conflict resolution you make is stored for later use, and it is reapplied automatically whenever the same conflict appears again.

While a conflict is being resolved, you can compare your current resolution against the recorded conflict with git rerere diff.
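Here is a minimal sketch of those inspection commands in a throwaway repository with hypothetical names. git rerere status lists the paths whose resolution rerere will record, and git rerere diff shows how your current edit differs from the recorded conflict.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config rerere.enabled true

echo base > f.txt && git add f.txt && git commit -q -m "base"
git checkout -q -b topic
echo topic > f.txt && git commit -q -am "topic change"
git checkout -q master
echo master > f.txt && git commit -q -am "master change"

git merge -q topic || true   # conflict; rerere records the preimage
echo both > f.txt            # resolve by hand (not committed yet)
git rerere status            # f.txt
git rerere diff              # conflicted preimage vs. your resolution
```

If a recorded resolution turns out to be wrong, git rerere forget f.txt (run while that conflict is present) drops it so you can resolve the conflict again.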

Take a look at this tutorial for more information.

edited Feb 07 '20 at 19:27

Mohamed Ziata


answered May 15 '12 at 13:37

KurzedMetal


All useful info, but doesn't really address what I'm asking - which is, how to avoid doing any redundant merging in the first place. – Steve Bennett May 15 '12 at 14:03
Maybe you could use `git rebase -s recursive -X theirs master` to auto-resolve all conflicts in favor of the commits from your branch, or `-X ours` to favor master's version, if you don't care. Note that during a rebase the meaning is swapped: "ours" is the upstream you are rebasing onto and "theirs" is your branch being replayed. Beware: this will be used for all the conflicts, not only in those functions. Sorry, I can't help you without more information about the repo; I guess you should've done more integration merges instead of trying to merge several weeks/months of development in one go. – KurzedMetal May 15 '12 at 14:30
Yep, I'm learning lots of "what not to do's" :) Another major cause of angst is having huge source files and huge functions. It turns out to be much harder work to merge changes in this situation. Whereas with small, modularised functions, I think you'd hit fewer merge conflicts and they'd be simpler to resolve. (Not that we didn't know that long functions are bad, but I wasn't aware of this particular reason.) – Steve Bennett May 16 '12 at 08:16
Merging parallel development lines of hairy, long functions is going to be a hard task regardless of whether it's done by a computer or by a human. Big functions or methods usually have lots and lots of variables, and that is the most problematic part for merging, IMHO. – Mikko Rantalainen Jun 23 '16 at 11:52
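The swapped meaning of ours and theirs during a rebase, mentioned in the comments above, is easy to check in a throwaway repository (hypothetical names). This sketch leaves out -s recursive because -X also works with Git's default merge strategy.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > f.txt && git add f.txt && git commit -q -m "base"
git checkout -q -b feature
echo feature > f.txt && git commit -q -am "feature change"
git checkout -q master
echo master > f.txt && git commit -q -am "master change"

# Replay feature on top of master. In a rebase, "ours" is the new base
# (master plus the commits replayed so far) and "theirs" is the commit
# being replayed, so -X theirs keeps the feature branch's side.
git checkout -q feature
git rebase -q -X theirs master
cat f.txt                    # feature
```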

score 5 · Answer 3 · answered May 15 '12 at 15:42


Why not squash the redundant patches together in an initial interactive rebase (first reorder them so they are adjacent), so that you have cleaned out the 'modify then delete' aspects of the sequence? You can be selective with the hunks within a commit during this stage (e.g. using git gui). This would then give you a better sequence for a final, clean rebase.
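A scripted version of that cleanup step, as a sketch in a throwaway repository: the names are hypothetical and the "function" is reduced to one line of lib.txt. Instead of editing the todo list by hand, a tiny helper script passed as GIT_SEQUENCE_EDITOR turns every pick after the first into squash, and GIT_EDITOR=true accepts the combined commit message unchanged.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'keep\nfunc v1\n' > lib.txt && git add lib.txt && git commit -q -m "base"
git checkout -q -b feature
printf 'keep\nfunc v2\n' > lib.txt && git commit -q -am "Tweak func"
printf 'keep\nfunc v3\n' > lib.txt && git commit -q -am "Tweak func again"
printf 'keep\n' > lib.txt && git commit -q -am "Remove func"

# Helper "editor": Git passes the todo file path as $1. Keep the first
# pick and turn every later pick into a squash.
squash_all=$(mktemp)
cat > "$squash_all" <<'EOF'
sed '2,$s/^pick/squash/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $squash_all" GIT_EDITOR=true git rebase -q -i master
git log --oneline master..feature   # one commit that only removes func
```

After this, rebasing the branch onto a newer master has to deal with a single commit that deletes the function, instead of two edits followed by a deletion.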


Philip Oakley


Yeah, I think I'll try this. It gives me a little bit of history (i.e., the commit still comes from somewhere, and it contains all the missing commit messages) but should streamline the process. – Steve Bennett May 16 '12 at 08:13
It's important to distinguish between a "squash merge" and the "squash" operation of `git rebase -i`. The former squashes the whole feature branch, whereas the latter combines only selected parts. You may need to use `edit`, `squash` and reordering of patches with `git rebase -i` to get really nice results. The learning curve is a bit steep, so it may feel hard at first, but it gets easier once you fully understand the process. – Mikko Rantalainen May 22 '19 at 11:16

Mikko Rantalainen · Answer 4 · 2023-01-13T09:33:08.543

(This is my second answer to this question. On a second reading, I think the original problem might be a bit different from the one I first understood.)

I understand the question as: you have a development branch parallel to master. This kind of branch is usually called a feature branch, and I definitely encourage using them.

One should always try to keep feature branches clean. In practice, you want a feature branch that contains the commits you would have made if you had never made any mistakes. For me, that means committing a lot and later using git rebase -i to fix the mistakes once I learn about them.

By the time your feature branch is ready, it should look like

Add API to do thing X
Fix existing API Y for corner case Z
Add feature B using X and Y (works in case Z, too!)
Improve feature B: do magic stuff E

Instead of

1. WIP
2. WIP2
3. Add API
4. Move API to do X
5. Add feature B
6. On second thought, rename the parameters for X
7. Fix feature B
8. Fix APi for X
9. Fix corner case Z
10. Fix corner case Z for API Y, too
11. do magic stuff E
12. commit missing fILE

If you then rebase your feature branch onto the latest master branch, chances are high that only the commit "Fix existing API Y for corner case Z" causes conflicts. If that commit is a minimal change to the existing API, then fixing the conflict should be easy. In addition, that conflict only arises if some other commit has modified exactly the lines touched by your minimal change.

If you use feature branches and rebase them instead of merging, you definitely want to keep your feature branches clean before rebasing them onto master. (My preferred style is to rebase so that a fast-forward is possible, then run git checkout master && git merge --no-ff feature-branch-x and document the whole thing in the merge commit. That keeps the full history of the branch and allows GUI tools to easily navigate around the feature if needed.) Not only will your rebases be easier, but the history will also stay readable in the long run. (Technically, this results in a new commit that has logically the same contents as a squash merge would have, but its second parent points to the sequence of commits that holds the whole history of the feature branch. Those who prefer squash merges can follow the first parent, and those who prefer full history can follow the second parent. And you can write the commit message for the imaginary squash-merge commit in this new commit.)
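The rebase-then-merge --no-ff workflow described above can be sketched in a throwaway repository. The branch name feature-branch-x comes from the answer; the files and commit messages are hypothetical.

```shell
# Throwaway repository; files and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo init > README && git add README && git commit -q -m "Initial"
git checkout -q -b feature-branch-x
echo x > x.txt && git add x.txt && git commit -q -m "Add API to do thing X"
echo b > b.txt && git add b.txt && git commit -q -m "Add feature B using X"
git checkout -q master
echo m > m.txt && git add m.txt && git commit -q -m "Unrelated work on master"

# 1. Rebase so the feature sits directly on top of the current master.
git rebase -q master feature-branch-x
# 2. Merge with --no-ff: a fast-forward would be possible, but we want a
#    merge commit whose message documents the whole feature.
git checkout -q master
git merge -q --no-ff -m "Add feature B (with API X)" feature-branch-x

git log --oneline --first-parent   # squash-like view of master
git log --oneline HEAD^1..HEAD^2   # full feature history via the second parent
```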

So in the above example, one could run git rebase -i <old-enough-sha1> and then reorder the commits as 3+4+6+8, 10, 1+2+5+7+9, 11+12, where + means squash. Git allows splitting and editing existing commits, too, but it's usually easier to keep commits really small and then squash some of them later. Note that in this example even the original commit number 10 ends up before the original first commit. This is normal and reflects the reality that your implementation was not perfect. That does not need to be stored in version history, though.
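A reordering plan like this can also be given to git rebase -i as a prepared todo list instead of being edited interactively. This sketch uses a hypothetical three-commit branch (1 = "WIP feature B", 2 = "Add API X", 3 = "Fix API X") and the plan "2+3, 1": API X first with its fix squashed in, then feature B. It relies on Git passing the todo file path as the last argument to GIT_SEQUENCE_EDITOR.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo init > README && git add README && git commit -q -m "Initial"
git checkout -q -b feature
echo b1 > b.txt && git add b.txt && git commit -q -m "WIP feature B"
c1=$(git rev-parse HEAD)
echo x1 > x.txt && git add x.txt && git commit -q -m "Add API X"
c2=$(git rev-parse HEAD)
echo x2 > x.txt && git commit -q -am "Fix API X"
c3=$(git rev-parse HEAD)

# Plan "2+3, 1": pick 2, squash 3 into it, then pick 1.
todo=$(mktemp)
printf 'pick %s\nsquash %s\npick %s\n' "$c2" "$c3" "$c1" > "$todo"

# "cp $todo <path>" overwrites the generated todo list with our plan;
# GIT_EDITOR=true keeps the combined squash message as generated.
GIT_SEQUENCE_EDITOR="cp $todo" GIT_EDITOR=true git rebase -q -i master

git log --format=%s master..HEAD   # WIP feature B, then Add API X
```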

In your case, it sounds like you have a feature branch where multiple commits add and remove the same stuff. Squash those commits into a single commit (it may end up as no change at all, which is okay). Rebase your feature branch onto master only when the feature branch looks clean. Definitely learn to use git gui or some other tool that makes it easy to commit individual changed lines instead of whole files. Every commit should be a change that modifies a sane collection of stuff. If you add a new feature X, the same commit must not fix an existing function Y or add missing documentation about Z, not even if those changes were made to the same file. To me, this is the kind of stuff that Linus Torvalds meant when he said "files do not matter".
