OK, deep breath :-)
Git's rebase
copies commits
The fundamental trick that git rebase
uses is the cherry-pick operation, which copies a commit. We'll get to the mechanics of copying a commit in a moment, but consider a simple, ordinary git cherry-pick
, where we git checkout
some branch name—I will create a new one pointing to one particular commit here—and then tell Git to copy some other commit, typically one that is not (yet) on our new branch.
git checkout -b newbranch 23dfc61
This makes 23dfc61
the current commit but gives it a new branch name, newbranch
. Now we can make new commits, which add to the new branch, so now we run, e.g.:
git cherry-pick 9d804c2
to copy commit 9d804c2
.
The result, if this works—if there is no merge conflict, or after you clean up any merge conflict if there is one—is a new commit whose parent is 23dfc61
, and whose source tree is like 23dfc61
but with whatever you changed in 9d804c2
as compared to 6ada3b7
, added to it:
...
* 9d804c2 - (2 days ago) new.txt changed in 16:35 - stav alfi
* 6ada3b7 - (2 days ago) new.txt changed in 16:32 - stav alfi (oldDad)
* f6497fc - (2 days ago) this is the nest commit! - stav alfi (oldDad1)
* b1b3e25 - (2 days ago) omg - stav alfi
* 74656b3 - (2 days ago) new1234 - stav alfi
* e8977d3 - (2 days ago) fast commit - stav alfi
* 114b46c - (3 days ago) good - Stav Alfi
* 8212c78 - (3 days ago) good - Stav Alfi
| * NNNNNNN - (now) new.txt changed in 16:35 - stav alfi (HEAD -> newbranch)
|/
* 23dfc61 - (3 days ago) removed-something - Stav Alfi
* 184178d - (3 days ago) shortcut - Stav Alfi
...
We don't know what the new hash number will be, so I put in NNNNNNN
. But the new commit has the same log message as the old one, and makes the same change as the old one.
Commits contain snapshots, not changes
Each commit has, attached to it, the complete source as of the time of that commit. This is different from many other version control systems: most tend to store each commit as a change from the commit before them, or the commit after them. What this means here is that in order to copy a commit, Git first has to find out what changed.
The way to find out is to compare the commit to its parent commit. The parent commit of 9d804c2
is 6ada3b7
, so Git does:
git diff 6ada3b7 9d804c2
to see what changed. Assuming the log message is accurate, you changed something in new.txt
, so that's what Git will find. That, then, is also what Git will try to do when it tries to modify the snapshot saved for 23dfc61
to come up with a new snapshot for NNNNNNN
.
If that succeeds, Git will commit the result, and will have made a successful cherry-pick.
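To make the "turn a snapshot into a change, then apply the change somewhere else" idea concrete, here is a toy sketch in Python. It is not how Git is implemented: a snapshot is just a dict of file name to contents, the file contents are made up, and where real Git would attempt a three-way merge this toy simply reports a conflict.

```python
# Toy model: a snapshot is a dict mapping file name -> file content.

def diff(parent, child):
    """Return {name: (old, new)} for each file that differs; None means 'absent'."""
    names = set(parent) | set(child)
    return {n: (parent.get(n), child.get(n))
            for n in names if parent.get(n) != child.get(n)}

def cherry_pick(target, parent, commit):
    """Apply the parent->commit change to target and return a new snapshot."""
    result = dict(target)
    for name, (old, new) in diff(parent, commit).items():
        if target.get(name) != old:
            # Real Git would try a three-way merge here; this toy just gives up.
            raise ValueError(f"conflict in {name}")
        if new is None:
            del result[name]
        else:
            result[name] = new
    return result

# Hypothetical contents standing in for the commits named in the text.
snap_6ada3b7 = {"new.txt": "16:32\n", "other.txt": "x\n"}
snap_9d804c2 = {"new.txt": "16:35\n", "other.txt": "x\n"}
snap_23dfc61 = {"new.txt": "16:32\n"}

new_snapshot = cherry_pick(snap_23dfc61, snap_6ada3b7, snap_9d804c2)
print(new_snapshot)
```

The target snapshot itself is never modified: the result is a fresh snapshot, just as the cherry-pick produces a fresh commit. If the target has no new.txt at all, this toy raises a conflict instead of guessing.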
No commit can ever be changed
The unpronounceable hash IDs 23dfc61
and 6ada3b7
and badf00d
and bedface
and so on are constructed by taking the exact contents of each commit. If you try to change anything about any commit, Git builds a new commit; if there's even a single bit different anywhere, you get a new, different hash, so you get a new, different commit.
The parts that go into this include all the source, plus the parent ID, as each commit "points to" (contains the ID of) its parent. (There are also some time stamps, so unless you make the same commit twice in the same second, you still get two different IDs, even if they have the rest of their bits identical.) Hence, to change anything—whether it's the source, or just a parent ID—Git must copy commits.
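Here is a small sketch of why changing only the parent forces a new ID. It hashes a commit body the way Git's SHA-1 object format does (a "commit <size>" header, a NUL byte, then the body). The tree and parent IDs, the author, and the time stamps are all made up for illustration, and real commits can carry extra headers that this sketch ignores.

```python
import hashlib

def commit_id(tree, parent, author, timestamp, message):
    """Hash a simplified commit body: 'commit <size>' + NUL + body, via SHA-1."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author} {timestamp} +0000\n"
        f"committer {author} {timestamp} +0000\n"
        f"\n{message}\n"
    ).encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Made-up 40-character IDs and identity, purely for illustration.
TREE = "a" * 40
OLD_PARENT = "b" * 40
NEW_PARENT = "c" * 40
WHO = "Example Author <author@example.com>"

original = commit_id(TREE, OLD_PARENT, WHO, 1500000000, "same message")
reparented = commit_id(TREE, NEW_PARENT, WHO, 1500000000, "same message")
one_second_later = commit_id(TREE, OLD_PARENT, WHO, 1500000001, "same message")

print(original != reparented)        # same tree and message, different parent
print(original != one_second_later)  # same everything except the time stamp
```

Same source tree, same message, same author: the ID still changes as soon as the parent (or the time stamp) does, which is why "moving" a commit really means making a copy.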
This is why rebase copies commits: it must. You are taking some set of commits, turning each one into a change, and then applying those changes starting at some different commit, which has a different parent ID, even if it has the same source tree. So what you give to git rebase
is, essentially, two chunks of information:
- Which commits should it copy?
- Where should it place those copies?
The place to put the copies is easy if you use --onto
, as that's the place! The set of commits to copy, however, is trickier.
Selecting commits
Git provides a range notation, X..Y
, that looks like it means "commits between X and Y"—and it does, sort of. But not quite! In fact, Git uses something we call reachability, following parent links in commits. We already noted that each commit has a parent ID stored in it. That's how Git can find your commits: you tell it to start at a branch tip, using a branch name like master
, and it finds that particular commit by its hash ID, which Git remembers for you inside the name master
.
That commit has another hash ID in it: this is the commit's parent. Git uses that hash ID to find that commit. The parent has yet another hash ID, and Git keeps finding more and more parents. This goes on as long as it possibly can, all the way back to the very first commit you ever made.
That's too many commits, so we tell Git to stop going back at some point. That's the "X" part of X..Y
: this tells Git to start at Y and work backwards, marking commits "green" temporarily to take them. But, at the same time, start at X and work backwards, marking commits "red" temporarily to avoid taking them.
I like to draw all of this with one-letter names for commits, instead of the big ugly hash IDs, and connecting lines that have older commits at the left and newer commits at the right:
...--D--E--F--G--H <-- branch
Here commit H
is the tip of the branch, G
is H
's parent, F
is G
's parent, and so on. If we write E..H
, that paints E
(and D
and on back) "red": stop, don't take these! Then it paints H
green, and then G
and F
, and then we hit the red-painted E
and stop. So that selects commits F-G-H
. E
is naturally excluded here.
But when we have branches and merges, things get trickier:
          F--G--H
         /       \
...--D--E         K--L
         \       /
          I-----J
Commit K
is a merge commit. A merge commit is one that has two (or more, but let's not go there) parents. If we stick with the red and green paint analogy, E..L
means "paint E
and on back red and paint L
and on back green": when we hit K
, we paint both H
and J
green, and work back on both sides of this branch/merge.
If we say G..L
, look how that works: we paint G
red, then F
, then E
and D
and so on. We never paint I
red at all, because that's not backwards from F
: we can only move back, not forward, during this process. So then we paint L
green, and K
, and then both H
and J
. G
is already red, so we stop that side, but keep going on the other, painting I
green. Then we move back to E
, but it's red so we stop. So this selects I
and J
, and also H
, and K
and L
(in some order).
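The red-and-green paint procedure boils down to a set difference: everything reachable from Y, minus everything reachable from X. Here is a minimal Python sketch of that idea over the drawing above. The PARENTS table is my own transcription of the drawing, and this is only a model of the selection, not Git's actual revision walker.

```python
# Parent links transcribed from the branch-and-merge drawing.
# Whatever comes before D is left out.
PARENTS = {
    "D": [], "E": ["D"],
    "F": ["E"], "G": ["F"], "H": ["G"],
    "I": ["E"], "J": ["I"],
    "K": ["H", "J"],   # the merge commit has two parents
    "L": ["K"],
}

def reachable(start):
    """Every commit we can get to from start by following parent links."""
    seen, todo = set(), [start]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(PARENTS[commit])
    return seen

def select(x, y):
    """X..Y: painted green from Y, minus whatever got painted red from X."""
    return reachable(y) - reachable(x)

print(sorted(select("E", "H")))  # F, G, H
print(sorted(select("E", "L")))  # both legs, plus K and L
print(sorted(select("G", "L")))  # H, I, J, K, L
```

Note that the result is a set: nothing here says which leg comes first, which is the "in some order" caveat.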
What git rebase
copies: merges are a problem
When Git goes to select commits to copy, it uses your other (not --onto
) argument as the "red paint" part of the stop item, and your current commit as the "green paint" part. If you don't use --onto
, the onto target is the same as the red-paint selector. That's all --onto
does: it lets you choose a different "stop" red-paint selector than the target.
But if there is a merge in here—and in your case, there is—we have a problem, or really, two problems. One is that rebase cannot copy a merge, so it just does not even try. It just removes merges entirely, from the set of commits to copy. The other is that we follow both legs of a branch-and-merge, but we do not get to control the order unless we use an interactive (-i
) rebase.
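As a rough sketch of both problems at once, the following Python toy builds the list of commits a plain rebase would copy from the earlier branch-and-merge drawing: it walks back from the tip, stops at anything reachable from the "red paint" commit, drops merges, and flattens the two legs into one line. The PARENTS table and the function names are mine, and the leg order it produces is an artifact of this sketch, not a statement about the order Git picks.

```python
# Parent links for the earlier branch-and-merge drawing (first parent listed first).
PARENTS = {
    "D": [], "E": ["D"],
    "F": ["E"], "G": ["F"], "H": ["G"],
    "I": ["E"], "J": ["I"],
    "K": ["H", "J"],   # the merge
    "L": ["K"],
}

def ancestors(start):
    """start plus everything reachable from it through parent links."""
    seen, todo = set(), [start]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(PARENTS[commit])
    return seen

def commits_to_copy(stop, tip):
    """Linear list for stop..tip with parents before children and merges dropped."""
    seen = ancestors(stop)          # the "red paint" commits
    order = []

    def visit(commit):
        if commit in seen:
            return
        seen.add(commit)
        for parent in PARENTS[commit]:
            visit(parent)
        if len(PARENTS[commit]) < 2:   # a merge is never copied
            order.append(commit)

    visit(tip)
    return order

print(commits_to_copy("E", "L"))
```

Listing K's parents in the other order would put I and J ahead of F, G and H, which is the "we do not get to control the order" issue.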
You were on master
and ran:
git rebase --onto newDad oldDad1
so this selects:
oldDad1..master
as the commits to copy, but throws out all the merges, and linearizes the remainder of the commits. That means you start with:
* 006f7ab - (2 days ago) Merge branch 'hotfix' idc what will heppen :( - stav alfi (master)
|\
| * 0f028e8 - (2 days ago) good - stav alfi
* | fc040d3 - (2 days ago) good - stav alfi
* | ed29b30 - (2 days ago) good - stav alfi
|/
* a7c5bb3 - (2 days ago) good branch - stav alfi
* 9d804c2 - (2 days ago) new.txt changed in 16:35 - stav alfi
* 6ada3b7 - (2 days ago) new.txt changed in 16:32 - stav alfi (oldDad)
but end up with:
* 0f028e8 - (2 days ago) good - stav alfi
* fc040d3 - (2 days ago) good - stav alfi
* ed29b30 - (2 days ago) good - stav alfi
* a7c5bb3 - (2 days ago) good branch - stav alfi
* 9d804c2 - (2 days ago) new.txt changed in 16:35 - stav alfi
* 6ada3b7 - (2 days ago) new.txt changed in 16:32 - stav alfi (oldDad)
or—since we don't control the order:
* fc040d3 - (2 days ago) good - stav alfi
* ed29b30 - (2 days ago) good - stav alfi
* 0f028e8 - (2 days ago) good - stav alfi
* a7c5bb3 - (2 days ago) good branch - stav alfi
* 9d804c2 - (2 days ago) new.txt changed in 16:35 - stav alfi
* 6ada3b7 - (2 days ago) new.txt changed in 16:32 - stav alfi (oldDad)
(all I did here was swap the two legs around). Git will check out commit db309e9
(newDad, your --onto
) as a temporary branch, and then start cherry-picking each commit, turning 6ada3b7
into a change by comparing it against f6497fc
. But this immediately fails:
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
A new.txt
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): new.txt deleted in HEAD and modified
in new.txt changed in 16:32. Version new.txt changed in 16:32
of new.txt left in tree.
The problem here is that new.txt
does not exist in commit db309e9
. Git does not know how to combine "make a slight change to new.txt
" with "don't have a new.txt
at all".
It's now your job to fix this conflict, by deciding how to have new.txt
appear in the final snapshot. Edit or remove the file in the work-tree and when you are done, git add
the result and run git rebase --continue
and Git will go on to attempt to cherry-pick the next commit.
This repeats until git rebase
has copied all the to-be-copied commits. Once that finishes, git rebase
tells Git to "peel off" the original branch label (master
) and paste it onto the last commit it just made. So now the master
branch will name the newest commit, which will point back to its parent, and so on. The original commits—the ones you copied—are still in the repository, for a while, but they are now "abandoned" from this branch: they do not have the name master
available to find them.
But existing branch names can still find the existing commits
The names oldDad
and oldDad1
still point to some of the original (not-copied) commits here. Those names will still find those original commits. If there were more names that remembered some of the copied commits, those names would still remember the originals too. So the copied commits are not only not gone, sometimes they are still visible, depending on branch names.
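The copy-then-move-the-label sequence can be sketched with two small tables, one for parent links and one for names. This is a toy with a made-up linear history (commits A through D plus N); only the names master, oldDad1 and newDad come from your repository. It assumes the upstream commit really is an ancestor of the branch tip, and it marks each copy with a trailing quote instead of computing a real hash.

```python
# Toy linear history: commit -> parent (None for the root). Letters are made up.
parent = {"A": None, "B": "A", "C": "B", "D": "C", "N": "A"}
refs = {"master": "D", "oldDad1": "B", "newDad": "N"}

def rebase(branch, upstream, onto):
    """Copy upstream..branch on top of onto, then move the branch name."""
    # Collect the commits to copy by walking back from the branch tip.
    picks, commit = [], refs[branch]
    while commit != refs[upstream]:
        picks.append(commit)
        commit = parent[commit]
    picks.reverse()                 # oldest first

    tip = refs[onto]
    for old in picks:
        new = old + "'"             # a copy always gets a new ID
        parent[new] = tip           # the copy's parent is the previous copy
        tip = new
    refs[branch] = tip              # peel the label off, paste it on the last copy

rebase("master", "oldDad1", "newDad")
print(refs)
print(parent["D"])                  # the original D is still there, with its old parent
```

After the rebase, master names the last copy, while oldDad1 still names its original commit, and the originals C and D remain in the table even though no name in this toy leads to them any more.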
Note that your final merge is just gone
Because git rebase
does not even try to copy the merge, your merge commit will simply be omitted entirely. However, since both "legs" of the merge get applied (in some order), the final source tree will match, provided you resolve any conflicts appropriately. How hard or easy that will be depends on which leg gets done first and whether the two legs affect each other.
There is a --preserve-merges
flag
There is a way to get git rebase
to attempt to preserve merges. But it cannot actually preserve them. Instead, what it does is to copy each leg of a fork as before, but this time, by forking the two legs; and then when it reaches the merge commit, it runs a new git merge
to make a new merge that—Git hopes—is "just as good" as the original.
In this particular case, --preserve-merges
won't help with the immediate problem, because that happens before the branch-and-re-merge sequence. The new.txt
file is modified in the first commit you are cherry-picking, but does not exist in your starting point, so the conflict arises well before the branch-and-merge sequence. Whether --preserve-merges
is any use to you, I do not know. (Newer versions of Git supersede this flag with --rebase-merges.)