You cannot properly explain Git using those “diff” and “patch” semantics. The reason is that Git does not track changes; it tracks content. When you have a commit A with a parent commit B, then for Git, A is not the difference between B and A, i.e. the changes necessary to get from B to A, but the actual content of A. You could take the commit on its own, and you would have everything needed to reconstruct the repository at that point.
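To make the "content, not changes" point concrete, here is a minimal toy model in Python. It is only a sketch: the `Commit` class, the `diff` helper, and the file names are made up for illustration and are not how Git stores objects internally. The idea it shows is that the snapshot is what is stored, and a diff is something you compute afterwards from two snapshots.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    # Hypothetical toy model: a commit holds a full snapshot
    # (path -> content) plus a pointer to its parent. No diff is stored.
    snapshot: dict
    parent: Optional["Commit"] = None


def diff(old: dict, new: dict) -> dict:
    """Derive a patch on demand: path -> new content, or None for a deletion."""
    patch = {p: c for p, c in new.items() if old.get(p) != c}
    patch.update({p: None for p in old if p not in new})
    return patch


b = Commit({"README": "hello", "main.py": "print(1)"})
a = Commit({"README": "hello", "main.py": "print(2)", "LICENSE": "MIT"}, parent=b)

# A alone is enough to reconstruct the repository at that point.
restored = dict(a.snapshot)

# The "patch" from B to A is derived from the two snapshots, not read from A.
changes = diff(a.parent.snapshot, a.snapshot)
```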
For that reason, I’ll not follow your “commit arithmetic” but try to explain every case in words instead.
C5 is a merge commit with the two parents C3 and C4 which both have C2 as the parent. So assuming that there are no conflicts, and Git is able to resolve the whole merge on its own, then C5 will contain the content expressed by the following two equivalent expressions:
- C3 with the patch applied that would get you from C2 to C4.
- C4 with the patch applied that would get you from C2 to C3.
So in a way, you could formulate it with your formula.
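The two expressions above can be checked with a small Python sketch. It assumes a simplified world where a snapshot is a dict of path to content and a patch works on whole files; real Git merges at line level inside files and has to deal with conflicts, which this sketch deliberately avoids. The helpers `diff` and `apply` and the file names are hypothetical.

```python
def diff(old: dict, new: dict) -> dict:
    """Patch from old to new: path -> new content, or None for a deletion."""
    patch = {p: c for p, c in new.items() if old.get(p) != c}
    patch.update({p: None for p in old if p not in new})
    return patch


def apply(snapshot: dict, patch: dict) -> dict:
    """Return a new snapshot with the patch applied; the input is not modified."""
    result = dict(snapshot)
    for path, content in patch.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


# C2 is the common parent; C3 and C4 touch different files, so nothing conflicts.
c2 = {"a.txt": "base", "b.txt": "base"}
c3 = {"a.txt": "changed on one branch", "b.txt": "base"}
c4 = {"a.txt": "base", "b.txt": "changed on the other branch", "c.txt": "new"}

merge_via_c3 = apply(c3, diff(c2, c4))  # C3 with the patch C2 -> C4
merge_via_c4 = apply(c4, diff(c2, c3))  # C4 with the patch C2 -> C3
```

Both results describe the same content, which is what C5 would contain in this conflict-free case.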
A rebase will essentially put you in exactly the same situation. The only difference is that it actually rewrites the commits it rebases, reapplying their changes (the patch) to another commit. So the result C4 will have the same content as C5 from (1).
With your “arithmetic”, I would generally say that (C3-C2) + C2 = C3. So both formulas are equivalent here too.
As mentioned above, a rebase is just reapplying a commit patch onto another parent. Git is writing new commits that contain the same change as the original one, but those new commits are applied onto a different parent.
So if you wanted to get formulas for that, it would probably look like this:
W' = E + (W - B) (the patch from B to W, applied on E)
X' = W' + (X - W) (the patch from W to X, applied on W')
Y' = X' + (Y - X) (the patch from X to Y, applied on X')
Z' = Y' + (Z - Y) (the patch from Y to Z, applied on Y')
So the base of W changes to be E instead of B, and all follow-up commits are just updated to follow on that new commit W' instead.
But again, the result Z' has the same content as a merge commit that merged E and Z would have.
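As a rough illustration of those formulas, the following Python sketch replays the patches of W, X, Y and Z onto E and compares Z' with what a conflict-free merge of E and Z would contain. It uses the same simplifying assumptions as before: snapshots are dicts of path to content, patches work on whole files, and nothing conflicts. The `diff`, `apply` and `rebase` helpers and the file names are hypothetical, not Git's implementation.

```python
def diff(old: dict, new: dict) -> dict:
    """Patch from old to new: path -> new content, or None for a deletion."""
    patch = {p: c for p, c in new.items() if old.get(p) != c}
    patch.update({p: None for p in old if p not in new})
    return patch


def apply(snapshot: dict, patch: dict) -> dict:
    """Return a new snapshot with the patch applied; the input is not modified."""
    result = dict(snapshot)
    for path, content in patch.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def rebase(old_base: dict, commits: list, new_base: dict) -> list:
    """Replay each commit's patch (relative to its original parent) onto new_base."""
    rewritten = []
    old_parent, new_parent = old_base, new_base
    for commit in commits:
        new_commit = apply(new_parent, diff(old_parent, commit))
        rewritten.append(new_commit)
        old_parent, new_parent = commit, new_commit
    return rewritten


b = {"core.py": "v1"}
e = {"core.py": "v1", "docs.md": "written on the main line"}
w = {"core.py": "v1", "feature.py": "step 1"}
x = {"core.py": "v1", "feature.py": "step 2"}
y = {"core.py": "v1", "feature.py": "step 2", "test.py": "tests"}
z = {"core.py": "v1", "feature.py": "step 3", "test.py": "tests"}

w2, x2, y2, z2 = rebase(b, [w, x, y, z], e)  # W', X', Y', Z'
merged = apply(e, diff(b, z))                # content of a merge of E and Z
```

In this toy setup `z2` and `merged` hold the same files with the same content; only the path taken to get there differs.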
That all leaves us with a question: What’s the difference between a merge and a rebase if both result in the same content? Since you don’t lose a commit either way (rebase will create new commit objects, but those will retain all the original information), it’s basically all about changing how the history looks:
Merging creates a merge commit that lets you clearly see where the history diverged (where a branch was created) and where it was united again. That’s nice since you can follow the exact development. But it can also become messy, especially if you end up merging multiple times or if you have multiple concurrent branch lines.
Rebasing, on the other hand, makes the history flat. It “fakes” it so that everything looks like a linear development, with each commit coming neatly after the previous one. Since you are always rebasing a complete branch, you keep related commits together, so you can still see what belongs together, but you lose any branching information. Also, you are creating new commit objects, which will break the repository of everyone who already knows about those commit objects (which is why you should never rebase commits that have been published).
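Why a rebased commit is a new object, even when it carries the same change, can be sketched in Python. The `commit_id` function below is a simplified stand-in and an assumption of this example: real Git hashes a commit object that refers to a tree ID and parent IDs and also includes author, committer and message. What the sketch keeps is the relevant property, namely that the parent is part of what gets hashed.

```python
import hashlib
import json


def commit_id(snapshot: dict, parent_id):
    # Simplified stand-in for Git's object hashing (not the real format):
    # the ID depends on the content AND on the parent's ID.
    payload = json.dumps({"snapshot": snapshot, "parent": parent_id}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


base_id = commit_id({"core.py": "v1"}, None)
new_base_id = commit_id({"core.py": "v1", "docs.md": "main line"}, base_id)

# The original feature commit, sitting on the old base.
original = commit_id({"core.py": "v1", "feature.py": "done"}, base_id)

# The same change replayed on the new base: different parent, different ID.
rebased = commit_id(
    {"core.py": "v1", "docs.md": "main line", "feature.py": "done"}, new_base_id
)

# Hashing identical input again gives the identical ID.
same_again = commit_id({"core.py": "v1", "feature.py": "done"}, base_id)
```

Anyone who already fetched the commit known as `original` still has it under that ID, while the rebased branch now points at `rebased`, so the two histories no longer line up.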
There are benefits and disadvantages to both ways. It depends a lot on the repository workflow and personal preference.