I'm trying to better understand the magic behind git-rebase. I was very pleasantly surprised today by the following behavior, which I didn't expect.
TLDR: I rebased a shared branch, causing all commit sha1s to change. Despite this, a derived branch was able to accurately identify that its original commits were "aliased" into new commits with different sha1s. The rebase didn't create any mess at all.
Details
Take a master branch: M1
Branch it off into branch-X, with some additional commits added: M1-A1-B1-C1. Note down the git-log output.
Branch off branch-X into branch-Y, with one additional commit added: M1-A1-B1-C1-D1. Note down the git-log output.
Add a new commit to the tip of the master branch: M1-M2
Rebase branch-X onto the updated master: M1-M2-A2-B2-C2. Note that A2-B2-C2 all have the same message, contents and author date as A1-B1-C1. However, they have completely different sha1 values, as well as different commit dates. According to this writeup, the sha1 is different because the commit's parent has changed.
Rebase branch-Y onto the updated branch-X. Result: M1-M2-A2-B2-C2-D2.
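On the sha1 change in the branch-X rebase step: a commit id is a SHA-1 over the commit object, and that object includes the parent's id. Below is a minimal sketch of that computation. The tree id, identity, timestamps and parent ids are made-up placeholders, and everything except the parent is held constant (in a real rebase the tree and committer date usually change as well).

```python
import hashlib


def commit_sha1(tree, parent, author, committer, message):
    """Return the id Git would assign to a commit object with these fields."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author}\n"
        f"committer {committer}\n"
        f"\n{message}\n"
    ).encode()
    # Git hashes an object as: b"commit <size>\0" followed by the body.
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# All values below are hypothetical placeholders, not from a real repository.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHO = "Example <example@example.com> 1700000000 +0000"
M1 = "1" * 40  # stand-in for the id of M1
M2 = "2" * 40  # stand-in for the id of M2

a1 = commit_sha1(TREE, M1, WHO, WHO, "A")  # A1, whose parent is M1
a2 = commit_sha1(TREE, M2, WHO, WHO, "A")  # A2, whose parent is M2
print(a1 != a2)  # prints True
```

Because each child records its parent's id, the change cascades: B2 and C2 get new ids as well, even though their messages and author dates are unchanged.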
Notably only the D1 commit is applied (and becomes D2). The A1-B1-C1 commits in branch-Y are completely ignored by git-rebase. You can see this in the output logs.
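The steps above can be replayed with a small script, sketched here under some assumptions: it drives the git CLI from Python via subprocess in a temporary directory, and the file names, commit subjects (A, B, C, D rather than A1, B1, ...) and committer identity are placeholders chosen for illustration, not taken from the original repository.

```python
import os
import shutil
import subprocess
import tempfile


def git(repo, *args):
    """Run one git command inside repo and return its stdout."""
    env = dict(
        os.environ,
        # Hypothetical identity so commits work without any user config.
        GIT_AUTHOR_NAME="Example", GIT_AUTHOR_EMAIL="example@example.com",
        GIT_COMMITTER_NAME="Example", GIT_COMMITTER_EMAIL="example@example.com",
    )
    result = subprocess.run(["git", *args], cwd=repo, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout


def commit(repo, name):
    """Create a commit that adds one new file named after the commit."""
    with open(os.path.join(repo, name + ".txt"), "w") as f:
        f.write(name + "\n")
    git(repo, "add", ".")
    git(repo, "commit", "-q", "-m", name)


def reproduce():
    """Replay the steps; return (old branch-X ids, new branch-X ids, final subjects)."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "-q")
        git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
        commit(repo, "M1")
        git(repo, "checkout", "-q", "-b", "branch-X")
        for name in ("A", "B", "C"):
            commit(repo, name)
        git(repo, "checkout", "-q", "-b", "branch-Y")
        commit(repo, "D")
        git(repo, "checkout", "-q", "master")
        commit(repo, "M2")
        old_x = git(repo, "rev-list", "master..branch-X").split()
        git(repo, "checkout", "-q", "branch-X")
        git(repo, "rebase", "-q", "master")
        new_x = git(repo, "rev-list", "master..branch-X").split()
        git(repo, "checkout", "-q", "branch-Y")
        git(repo, "rebase", "-q", "branch-X")
        subjects = git(repo, "log", "--format=%s", "branch-Y").split()
        return old_x, new_x, subjects


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git not found on PATH")
    else:
        old_x, new_x, subjects = reproduce()
        print("branch-X ids before rebase:", old_x)
        print("branch-X ids after rebase: ", new_x)
        print("branch-Y history, newest first:", subjects)
```

The two id lists should share no entries, while the final branch-Y history should contain each of A, B and C exactly once, matching the behavior described above.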
This is wonderful, but how does git-rebase know to ignore A1-B1-C1? How does git-rebase know that A2-B2-C2 are the same as A1-B1-C1 and can therefore be safely ignored? I had always assumed that git keeps track of commits using the sha1 identifier, but even though the above commits have different sha1s, git still somehow knows that they are linked together. How does it do that? Given the above behavior, when is it truly dangerous to rebase a shared branch?