How do merge and rebase work?

Question

I am trying to understand what merge and rebase do, in terms of set operations in math.

In the following, "-" means diff (similar to taking set difference in math, but "A-B" means those in A but not in B and minus those in B not in A), and "+" means patch (i.e. taking disjoint union in math. I haven't used patch before, so I am not sure).

From Pro Git by Chacon,

What is the role of the most recent common ancestor C2 in this merge?

Is it correct that C5 = (C4-C2) + (C3-C2) + C2?
For rebase instead of merge:

Is it correct that C4' = (C4-C2) + C3?

So do C5 from merge and C4' from rebase have the same content?
From Version Control with Git by Loeliger

How are W', X', Y' and Z' generated?

Heavily related: http://stackoverflow.com/questions/804115/when-do-you-use-git-rebase-instead-of-git-merge — aug, Jan 03 '16 at 00:09

poke · Answer 1 · 2016-01-03T00:29:33.553

You cannot really properly explain Git using that “diff” and “patch” semantic. The reason for that is that Git does not track changes; it tracks content. When you have a commit A with a parent commit B, then for Git, A is not the the difference between B and A, i.e. the changes necessary to get from B to A, but the actual content of A. You could take the commit on its own, and you have everything to reconstruct the repository at that point.

For that reason, I’ll not follow your “commit arithmetic” but try to explain every case in words instead.

C5 is a merge commit with the two parents C3 and C4 which both have C2 as the parent. So assuming that there are no conflicts, and Git is able to resolve the whole merge on its own, then C5 will contain the content expressed by the following two equivalent expressions:
- C3 with the patch applied that would get you from C2 to C4.
- C4 with the patch applied that would get you from C2 to C3.
So in a way, you could formulate it with your formula.
A rebase will essentially put you in exactly the same situation. The only difference is that it actually rewrites the commits it rebases, reapplying their changes (the patch) to another commit. So the result C4 will have the same content as C5 from (1).

With your “arithmetic”, I would generally say that (C3-C2) + C2 = C3. So both formulas are equivalent here too.
As mentioned above, a rebase is just reapplying a commit patch onto another parent. Git is writing new commits that contain the same change as the original one, but those new commits are applied onto a different parent.

So if you wanted to get formulas for that, it would probably look like this:
```
W' = E  + (W - B)        (the patch from B to W, applied on E)
X' = W' + (X - W)        (the patch from W to X, applied on W')
Y' = X' + (Y - X)        (the patch from X to Y, applied on X')
Z' = Y' + (Z - Y)        (the patch from Y to Z, applied on Y')
```
So the base of W changes to be E instead of B, and all follow-up commits are just updated to follow on that new commit W' instead.

But again, the result Z' has the same content as a merge commit that merged E and Z would have.

That all leaves us with a question: What’s the difference between a merge and a rebase if both result in the same content? Since you don’t lose a commit in both ways (rebase will create new commit objects, but those will retain all the original information), basically it’s all about changing how the history looks like:

Merging creates a merge commit that allows to clearly see where the history diverged (where a branch was created) and where it was united again. That’s nice since you can follow the exact development. But it also can become messy, especially if end up merging multiple times of if you have multiple concurrent branch lines.

Rebasing on the other hand makes the history flat. It “fakes” it so that everything is a linear development, that everything came properly after another. Since you are always rebasing a complete branch, you keep related commits together, so you can still see what belongs to each other, but you lose any branching information. Also, you are creating new commit objects which will break everyone’s repository who already knows about those commits objects (which is why you should never rebase commits that have been published).

There are benefits and disadvantages to both ways. It depends a lot on the repository workflow and personal preference.

I've upvoted this particular answer and also [helmbert's](http://stackoverflow.com/a/34572067/1256452). I'm hoping to avoid a big comment discussion here when I add two items: (1) helmbert is correct, merge finds the (or "a") lowest common ancestor node. If there are multiple LCAs the action depends on the merge strategy; the default "recursive" takes two LCAs, merges them to produce a new base, then iteratees merging that base with each remaining base, one at a time. (2) Rebase can be viewed as (and with `--interactive`, *is*) a series of `git cherry-pick` operations internally. — torek, Jan 03 '16 at 05:25
Rebase *may* be a series of `git-cherry-pick` operations, if you invoked it with `git rebase --merge` or `git rebase --interactive`. By default, though, rebase will create a series of patches and then apply them. — Edward Thomson, Jan 03 '16 at 05:31

score 2 · Answer 2 · answered Jan 03 '16 at 00:11

What is the role of the most recent common ancestor C2 in this merge?

Is it correct that C5 = (C4-C2) + (C3-C2) + C2?

The most recent common ancestor is used to determine which changes were made in branch experiment (in your example) that are not yet present in master (the argument being that all commits that are reachable from the branch tip of the experiment branch after the last common ancestor are not yet part of the master branch.

When merging the experiment branch into master with git merge experiment, Git then uses the delta between the common ancestor and the branch tip of the branch to be merged (experiment, in this case) to figure out which changes to apply in the merge commit.

In the set notation that you've chosen for your question, I'd express this as C5 = C3 + (C4 - C2).

Is it correct that C4' = (C4-C2) + C3?

So do C5 from merge and C4' from rebase have the same content?

Both assumptions are correct. Keep in mind that git rebase keeps the delta of a commit intact, so in your example the expression (C4' - C3) = (C4 - C2) is true.

How are W', X', Y' and Z' generated?

As said, git rebase keeps the delta of the commit. This means that the difference between (for instance) W and its parent commit B (or W - B, which you can also generate with git diff B..W).

A rebase then simply applies the same delta on a new parent commit; so in your example (W' - E) = (W - B) (and accordingly for X, Y and Z).

In the last example, does rebase work like: W' = (W-B) + E, and X' = (X-B) + E or X' = (X-B) +W'? — Tim, Jan 03 '16 at 00:30

score 1 · Answer 3 · answered Jan 03 '16 at 00:13

What is the role of the most recent common ancestor C2 in this merge?

There are 2 parents for c2 c3+c4

The whole point of rebase instead of merge is that you will have a "straight" line instead of all those forks (snake like) on the way.

You take your work and you place it on the top of the branch as signle line where in a merge you will have at least 2 parents (commits) to the merge point.

How do merge and rebase work?

3 Answers3