Git conflicts in rebase vs merge

Question

Is there any difference between the number of conflicts when doing merge to a branch as opposed to rebase a branch? Why is that?
When doing a merge, the merging changes are stored in the merge commit itself (the commit with the two parents). But when doing a rebase, where is the merge being stored?

Thanks, Omer

score 7 · Answer 1 · edited Oct 18 '21 at 03:20

After looking at torek's answer, and then also re-reading the question, I'm updating to clarify a couple of points...

Is there any difference between the number of conflicts when doing merge to a branch as opposed to rebase a branch? Why is that?

Potentially, yes, for many reasons. The simplest is that the process of merging looks at only three commits - "ours", "theirs", and the merge base. All intermediate states are ignored. By contrast, in a rebase each commit is converted into a patch and applied separately, one at a time. So if the 3rd commit creates a conflict but the 4th commit undoes it, then rebase will see the conflict while merge will not.
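Here is a toy Python model of that difference. It is an illustration under simplifying assumptions, not Git's real merge code: a "file" is a fixed-length list of lines, commits only ever replace lines, and whenever the toy hits a conflict it keeps "our" line, standing in for whatever resolution a human would choose. The names merge3 and rebase are made up for this sketch.

```python
# Toy model, not Git's real algorithm: a "file" is a fixed-length list of lines,
# commits only replace lines, and conflicts are "resolved" by keeping our line.

def merge3(base, ours, theirs):
    """Line-by-line three-way merge; returns (result, conflicting line indices)."""
    result, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            result.append(o)          # same change on both sides, or only ours changed
        elif o == b:
            result.append(t)          # only theirs changed
        else:
            result.append(o)          # both changed differently: conflict, keep ours
            conflicts.append(i)
    return result, conflicts

def rebase(onto, fork_point, commits):
    """Replay commits one at a time; each step is its own three-way merge."""
    current, parent, conflicted_steps = onto, fork_point, 0
    for commit in commits:
        current, conflicts = merge3(parent, current, commit)
        conflicted_steps += bool(conflicts)
        parent = commit               # the next step's base is that commit's parent
    return current, conflicted_steps

O = ["a", "b", "c"]                   # merge base
C = ["A", "b", "c"]                   # master tip: changed line 0
feature = [
    ["a", "x", "c"],                  # 1st: changes line 1
    ["D", "x", "c"],                  # 2nd: changes line 0, colliding with master
    ["a", "x", "c"],                  # 3rd: reverts the 2nd
    ["a", "x", "z"],                  # 4th: changes line 2
]

merged, merge_conflicts = merge3(O, C, feature[-1])
rebased, rebase_conflicts = rebase(C, O, feature)
print(merged, len(merge_conflicts))   # ['A', 'x', 'z'] 0
print(rebased, rebase_conflicts)      # ['A', 'x', 'z'] 2
```

With this particular resolution choice, both the conflicting commit and its revert stop the toy rebase, while the merge never sees either of them; both paths still end at the same content.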

Another difference arises when commits have been cherry-picked or otherwise duplicated on both sides of the merge. In this case, rebase will generally skip over them, while they might cause conflicts in a merge.
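The git rebase documentation describes this skipping: commits that introduce the same textual change as a commit already upstream are omitted. A rough Python sketch of the idea, in the spirit of git patch-id, is to hash only the added and removed lines of a diff, so the same change made at a different position in a file still matches. The helper patch_id below is hypothetical and much cruder than Git's.

```python
import difflib
import hashlib

def patch_id(old_lines, new_lines):
    """Hypothetical helper: hash a diff's +/- lines, ignoring context and positions."""
    changed = [
        line
        for line in difflib.unified_diff(old_lines, new_lines, lineterm="")
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return hashlib.sha1("\n".join(changed).encode()).hexdigest()

# B on master and B' on feature make the same edit, in files with different context.
b_on_master = patch_id(["x", "y"], ["x", "Y"])
b_prime_on_feature = patch_id(["p", "q", "x", "y"], ["p", "q", "x", "Y"])
other_change = patch_id(["x", "y"], ["x", "Z"])

print(b_on_master == b_prime_on_feature)   # True: a rebase could skip B'
print(b_on_master == other_change)         # False
```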

There are other reasons; ultimately they're just different processes, even though they're expected to usually produce the same combined content.

When doing a merge, the merging changes are stored in the merge commit itself (the commit with the two parents). But when doing a rebase, where is the merge being stored?

The results of the merge are stored in the new commits that rebase creates. By default rebase writes one new commit for every commit being rebased.

As torek explains in his answer, the question may indicate a misconception about what is stored in a merge. The question could be read to assert that the sets of changes ("patches") that led to the merged result are explicitly stored in a merge; they are not. The merge - like any commit - is a snapshot of the content. Using its parent pointers, you could figure out the patches that were applied. In the case of a rebase, git does not explicitly retain anything about the original branch point, about which commits were on which branch, or about where they were reintegrated; so each commit's changes are preserved in that commit's relationship to its parent, but there's no general way after a rebase to reconstruct the two patches that would be associated with the corresponding merge unless you have additional knowledge beyond what's stored in the repo.

So for example, suppose you have

O -- A -- B -- C <--(master)
 \
  D -- ~D -- E -- B' -- F <--(feature)

where D conflicts with changes in master, ~D reverts D, and B' is the result of cherry-picking B into feature.

Now if you merge feature into master, the merge looks only at (1) how F differs from O, and (2) how C differs from O. It doesn't "see" the conflict from D, because ~D reversed the conflicting change. It will see that B and B' both changed the same lines; it might be able to auto-resolve that, since both sides made the same change, but depending on what happened in other commits there's potential for a conflict here.

But once any conflicts are resolved, you end up with

O -- A -- B -- C -------- M <--(master)
 \                       /
  D -- ~D -- E -- B' -- F <--(feature)

and, as you point out, M contains the result of the merge.

Returning to the original picture...

O -- A -- B -- C <--(master)
 \
  D -- ~D -- E -- B' -- F <--(feature)

...if you instead rebase feature onto master, it's almost like progressively merging each feature commit with master one at a time. You can roughly imagine that you started by saying

git checkout master
git merge feature~4

which creates a conflict. You resolve that, and get

O -- A -- B -- C -- M <--(master)
 \                /
  -------------- D -- ~D -- E -- B' -- F <--(feature)

You could then proceed to the next commit with

git merge feature~3

that may or may not conflict, but when you're done you'd get

O -- A -- B -- C -- M -- M2 <--(master)
 \                /     /
  -------------- D -- ~D -- E -- B' -- F <--(feature)

and, if you resolved any conflicts correctly, M2 should have the same content as C. Then you do E.

git merge feature~2

B' is a little different, because rebase would skip it; so you could do

git merge -s ours feature~1

and finally

git merge feature

You would end up with

O -- A -- B -- C -- M -- M2 -- M3 -- M4 -- M5 <--(master)
 \                /     /    /     /    /
  -------------- D -- ~D -- E -- B' -- F <--(feature)

(where M4 was an "ours" merge, so M4 has the same content as M3).

So a rebase is a lot like that, except it doesn't track the "2nd parent" pointers that link the new commits back to the feature branch, and it completely skips B'. (Also it moves the branches differently.) So instead we draw

                   D' -- ~D' -- E' -- F' <--(feature)
                 /
O -- A -- B -- C <--(master)
 \
  D -- ~D -- E -- B' -- F

so we can visually indicate that D' "came from" D even though it isn't a merge commit with a parent pointer showing its relationship to D. Still, that's where the result of merging those changes is stored; and ultimately F' stores the completed integration of the two histories.

As mentioned above, nothing in the final state of the repo (post-rebase) makes it clear what patches would have been associated with the (roughly equivalent) merge. You could git diff O C to see one of them, and git diff C F' to see the other, but you need info that git doesn't retain in order to know that O, C, and F' are the relevant commits.

Note that F is, in this picture, unreachable. It still exists, and you could find it in the reflog, but unless something else points to it, gc could eventually destroy it.

Also note that rebasing feature onto master doesn't advance master. You could

git checkout master
git merge feature

to fast-forward master to feature, completing the integration of the branches.

Only thing I would suggest is that "Note that F is, in this picture, unreachable" be clarified to say that the whole chain from F backwards (until we come to O) is unreachable. Basically we've (deliberately) lost the whole "branch", replacing it with a "copy". Beginners often don't grasp that about rebases. — matt, Feb 08 '21 at 20:06

matt · Accepted Answer · 2021-02-07T22:03:12.603

A rebase is (mostly) just a series of cherry-picks. Both a cherry-pick and a merge use the same logic — what I call "merge logic", and what the docs usually call a "3-way merge" — to create a new commit.

That logic is, given commits X and Y:

1. Start with an earlier commit. This is called the merge base.
2. Make a diff between the earlier commit and X.
3. Make a diff between the earlier commit and Y.
4. Apply both diffs to the earlier commit, and:

   a. If you can do that, make a new commit expressing the result.

   b. If you can't do it, complain that you've got a conflict.
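Those four steps can be sketched in Python under a deliberately simple model, which is an assumption for illustration only: each file is a list of lines of the same length, and a "diff" is just the set of line positions that changed. Real Git diffs handle insertions and deletions and write conflict markers; this sketch does neither.

```python
def diff(base, other):
    """Steps 2 and 3: map each changed line position to its new text (toy diff)."""
    return {i: new for i, (old, new) in enumerate(zip(base, other)) if old != new}

def three_way(base, x, y):
    """Step 4: apply both diffs to the base, or report conflicting positions."""
    dx, dy = diff(base, x), diff(base, y)
    clashes = sorted(i for i in dx.keys() & dy.keys() if dx[i] != dy[i])
    if clashes:
        return None, clashes            # 4b: complain about a conflict
    result = list(base)
    for i, line in {**dx, **dy}.items():
        result[i] = line                # 4a: both sets of changes applied
    return result, []

base = ["a", "b", "c"]                  # step 1: the merge base
print(three_way(base, ["A", "b", "c"], ["a", "b", "C"]))   # (['A', 'b', 'C'], [])
print(three_way(base, ["A", "b", "c"], ["Z", "b", "c"]))   # (None, [0])
```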

In this respect, merge and cherry-pick (and therefore merge and rebase) are almost the same thing, but there are some differences. One extremely important difference is which commits play the "3" in the "3-way merge". In particular, merge and cherry-pick can have different ideas about which commit is the "earlier commit" in the first step (the merge base).

Let's take first a degenerate example where merge and cherry-pick are almost identical:

A -- B -- C <-- master
      \
       F <-- feature

If you merge feature into master, Git looks for the commit where feature and master most recently diverged. That is B. It is the "earlier commit" in our merge logic — the merge base. So Git diffs C with B, and diffs F with B, and applies both diffs to B to form a new commit. It gives that commit two parents, C and F, and moves the master pointer:

A -- B - C - Z <-- master
      \     /
       \   / 
         F <-- feature

If you cherry-pick feature onto master, Git looks for the parent of feature, meaning the parent of F. That is B again! (That's because I deliberately chose this degenerate case.) That is the "earlier commit" in our merge logic. So once again Git diffs C with B, and diffs F with B, and applies both diffs to B to form a new commit. Now it gives that commit one parent, C, and moves the master pointer:

A -- B - C - F' <-- master
      \   
       F <-- feature

If you rebase feature onto master, Git does a cherry-pick of each commit on feature and moves the feature pointer. In our degenerate case there is just one commit on feature:

A -- B - C <-- master
      \    \
       \    F' <-- feature
        F

Now, in those diagrams, it happens that the "earlier commit" that serves as the merge base is the same in every case: B. So the merge logic is the same, so the possibility of a conflict is the same, in every diagram.

But if I introduce more commits on feature, things change:

A -- B -- C <-- master
      \
       F -- G <-- feature

Now, to rebase feature onto master means to cherry-pick F onto C (giving F') and then to cherry-pick G onto that (giving G'). For that second cherry-pick, Git uses F as the "earlier commit" (the merge base), because it is the parent of G. This introduces a situation we have not considered before. In particular, the merge logic is going to involve a diff from F to F', along with a diff from F to G.

So when we rebase, we iteratively cherry-pick each commit along the rebased branch, and on each iteration the three commits being compared in our merge logic are different. So clearly we introduce new possibilities for a merge conflict, because, in effect, we are doing many more distinct merges.
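A small sketch that lists which three commits each step compares makes this visible. The helper is hypothetical and the commit names come from the diagrams above: a merge of feature into master uses the single triple (B, C, G), while the rebase uses one triple per copied commit.

```python
def rebase_triples(onto, fork_point, commits):
    """Hypothetical helper: (base, ours, theirs) for each cherry-pick in a rebase."""
    triples, ours, parent = [], onto, fork_point
    for commit in commits:
        triples.append((parent, ours, commit))
        ours, parent = commit + "'", commit   # the new copy becomes "ours" next time
    return triples

merge_triple = ("B", "C", "G")                  # merging feature (tip G) into master (tip C)
print(rebase_triples("C", "B", ["F", "G"]))     # [('B', 'C', 'F'), ('F', "F'", 'G')]
```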

torek · Answer 3 · 2021-02-07T20:40:43.963

Is there any difference between the number of conflicts when doing merge to a branch as opposed to rebase a branch? Why is that?

The verb "is" is, I think, overreach here. If we change that to "can there be", the answer is definitely yes. The reason is straightforward: rebase and merge are fundamentally different operations.

When doing a merge, the merging changes are stored in the merge commit itself (the commit with the two parents). But when doing a rebase, where is the merge being stored?

This question presupposes something that's not the case, though it's minor in some aspects. To explain what's going on, though, it's no longer minor.

Specifically, to understand all of this, we need to know:

what commits are, exactly (or at least in pretty good detail);
how branch names work;
how merge works, reasonably-exactly; and
how rebase works, reasonably-exactly.

Any small errors in each of these get magnified when we combine them, so we need to be pretty detailed. It will help to break rebase down a bit, as rebase is essentially a series of repeated cherry-pick operations, with a bit of surrounding stuff. So we'll add "how cherry-pick works" to the above.

Commits are numbered

Let's start with this: Each commit is numbered. The number on a commit is not a simple counting number, though: we don't have commit #1, followed by #2, then #3, and so on. Instead, each commit gets a unique but random-looking hash ID. This is a very big number (currently 160 bits long) represented in hexadecimal. Git forms each number by doing a cryptographic checksum over the contents of each commit.
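As a sketch of what "a cryptographic checksum over the contents" means, the Python below hashes an object the way a default (SHA-1) repository does: a header made of the object type, a space, the content length and a NUL byte, followed by the content. Commits are hashed the same way with type commit; repositories using the newer SHA-256 object format differ. You can compare the result against git hash-object --stdin.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID: hash of a "<type> <length>" header, a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_object_id("blob", b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(git_object_id("blob", b"hi")) * 4)    # 160 (bits)
```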

This is the key to making Git work as a Distributed Version Control System (DVCS): a centralized VCS like Subversion can give every revision a simple counting number, because there is in fact a central authority that hands out these numbers. If you can't reach the central authority at the moment, you cannot make a new commit either. So in SVN, you can only commit when the central server is available. In Git, you can commit locally, any time: there is no designated central server (though of course you can pick any Git server and call it "the central server" if you like).

This matters most when we connect two Gits to each other. They will use the same number for any commit that is bit-for-bit identical, and a different number for any commit that isn't. That's how they can figure out whether they have the same commits; that's how the sending Git can send the receiving Git any commits that the sender and receiver agree the receiver needs and the sender wants the receiver to have, while still minimizing data transfer. (There's more to it than just this, but the numbering scheme is at the heart of it.)

Now that we know that commits are numbered—and, based on the numbering system, that no part of any commit can change either, once it's made, since this just results in a new and different commit with a different number—we can look at what's actually in each commit.

Commits store snapshots and metadata

Each commit has two parts:

A commit has a full snapshot of every file that Git knew about, at the time you, or whoever, made that commit. The files in the snapshot are stored in a special, read-only, Git-only, compressed and de-duplicated format. The de-duplication means that there's no penalty if there are thousands of commits that all have the same copy of some file: those commits all share that file. Since most new commits one makes mostly have the same versions of the same files as some or most earlier commits, the repository doesn't really grow much at all, even though every commit has every file.
Apart from the files, each commit stores some metadata, or information about the commit itself. This includes things like the author of the commit and some date-and-time-stamps. It includes a log message, where you get to explain to yourself and/or others why you made this particular commit. And—key to Git's operation, but not something you manage yourself—each commit stores the commit number, or hash ID, of some previous commit or commits.
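The de-duplication falls out of that numbering scheme. A toy content-addressed store in Python shows two snapshots sharing one stored copy of an unchanged file; the class and file names are made up, and commit metadata is left out.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store (hypothetical): equal contents are stored once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
        self.blobs.setdefault(key, data)   # storing the same content again is a no-op
        return key

store = ObjectStore()
# Each toy snapshot maps a path to a blob ID; commit metadata is left out.
snapshot1 = {"README": store.put(b"hello\n"), "main.c": store.put(b"int x;\n")}
snapshot2 = {"README": store.put(b"hello\n"), "main.c": store.put(b"int y;\n")}
print(len(store.blobs))                            # 3, not 4
print(snapshot1["README"] == snapshot2["README"])  # True: one shared copy
```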

Most commits store just one previous commit. The goal with this previous commit hash ID is to list the parent or parents of the new commit. This is how Git can figure out what changed, even though each commit has a snapshot. By looking up the previous commit, Git can obtain the previous commit's snapshot. Git can then compare the two snapshots. The de-duplication makes this even easier than it would be otherwise. Any time the two snapshots have the same file, Git can just say nothing at all about this. Git only has to compare files when they are actually different in the two snapshots. Git uses a difference engine to figure out what changes will take the older (or left-hand-side) file and convert it to the newer (right-hand-side) file, and shows you those differences.

You can use this same difference engine to compare any two commits or files: just give it a left and right side file to compare, or a left and right side commit. Git will play the Spot the Difference game and tell you what changed. This will matter for us later. For now, though, just comparing parent and child, for any simple one-parent-one-child commit pair, will tell us what changed in that commit.
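A rough Python sketch of comparing two snapshots skips identical files and hands the rest to a line-diff engine. Here the function is hypothetical, snapshots are plain path-to-text dictionaries rather than Git trees, and the diff engine is Python's difflib rather than Git's own.

```python
import difflib

def diff_snapshots(old: dict, new: dict) -> list:
    """Hypothetical helper: diff two path -> text snapshots, skipping equal files."""
    out = []
    for path in sorted(old.keys() | new.keys()):
        a, b = old.get(path, ""), new.get(path, "")
        if a == b:
            continue    # identical content (in Git: identical blob hash IDs)
        out.extend(difflib.unified_diff(a.splitlines(), b.splitlines(),
                                        fromfile=f"a/{path}", tofile=f"b/{path}",
                                        lineterm=""))
    return out

parent = {"a.txt": "one\ntwo\n", "b.txt": "same\n"}
child = {"a.txt": "one\n2\n", "b.txt": "same\n"}
print("\n".join(diff_snapshots(parent, child)))
```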

For simple commits with one child pointing backwards to one parent, we can draw this relationship. If we use single uppercase letters to stand in for hash IDs—because real hash IDs are too big and ugly for humans to work with—we get a picture that looks like this:

... <-F <-G <-H

Here, H stands in for the last commit in the chain. It points backwards to earlier commit G. Both commits have snapshots and parent hash IDs. So commit G points backwards to its parent F. Commit F has a snapshot and metadata, and therefore points backwards to yet another commit.

If we have Git start at the end, and just go backwards one commit at a time, we can get Git to go all the way back to the very first commit. That first commit won't have a backwards-pointing arrow coming out of it, because it can't, and that will let Git (and us) stop and rest. That's what git log does, for instance (at least for the simplest case of git log).
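The backwards walk can be sketched in Python with a hypothetical parent table in which E stands in for a root commit. Like the simplest git log described above, it follows only one parent per commit and stops when there is no parent left.

```python
# Hypothetical parent table for ...--E--F--G--H, where E is a root commit.
parents = {"E": [], "F": ["E"], "G": ["F"], "H": ["G"]}

def walk_back(tip):
    """Yield commits from tip backwards, following only the first parent."""
    commit = tip
    while commit is not None:
        yield commit
        commit = parents[commit][0] if parents[commit] else None

print(list(walk_back("H")))   # ['H', 'G', 'F', 'E']
```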

We do, however, need a way to find the last commit. This is where branch names come in.

A branch name points to a commit

A Git branch name holds the hash ID of one commit. By definition, whatever hash ID is stored in that branch name, is the end of the chain for that branch. The chain might keep going, but since Git works backwards, that's the end of that branch.

This means that if we have a repository with only one branch—let's call it main, as GitHub does now—there's some last commit and its hash ID is in the name main. Let's draw that:

...--F--G--H   <-- main

I've gotten lazy and stopped drawing the arrows from commits as arrows. This is also because we're about to have an arrow-drawing problem (at least on StackOverflow where the fonts are potentially limited). Note that this is the same picture we had a moment ago; we've just figured out how we remember the hash ID of commit H: by sticking it into a branch name.

Let's add a new branch. A branch name has to hold the hash ID of some commit. Which commit should we use? Let's use H: it's the commit we're using now, and it's the latest, so it makes a lot of sense here. Let's draw the result:

...--F--G--H   <-- dev, main

Both branch names pick H as their "last" commit. So all commits up through and including H are on both branches. We need one more thing: a way to remember which name we're using. Let's add the special name HEAD, and write it in after one branch name, in parentheses, to remember which name we're using:

...--F--G--H   <-- dev, main (HEAD)

This means we're on branch main, as git status would say. Let's run git checkout dev or git switch dev and update our drawing:

...--F--G--H   <-- dev (HEAD), main

We can see that HEAD is now attached to the name dev, but we're still using commit H.

Let's make a new commit now. We'll use the usual procedures (without describing them here). When we run git commit, Git will make a new snapshot and add new metadata. We might have to enter a commit message first, to go into the metadata, but one way or another we'll get there. Git will write all of this out to make a new commit, which will get a new, unique, big ugly hash ID. We'll just call this commit I instead though. Commit I will point back to H, because we were using H up until this moment. Let's draw in the commit:

             I
            /
...--F--G--H

But what about our branch names? Well, we didn't do anything to main. We added a new commit, and this new commit should be the last commit on branch dev. To make that happen, Git simply writes I's hash ID into the name dev, which Git knows is the right name, because that's the name HEAD is attached to:

             I   <-- dev (HEAD)
            /
...--F--G--H   <-- main

and we have exactly what we want: the last commit on main is still H but the last commit on dev is now I. Commits up through H are still on both branches; commit I is only on dev.
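A toy Python model of the last few drawings: branch names map to commit IDs, HEAD holds a branch name, and making a commit moves only the name HEAD is attached to. The class and method names are hypothetical, not Git's.

```python
class ToyRepo:
    """Hypothetical model: branch names hold commit IDs, HEAD holds a branch name."""

    def __init__(self, tip):
        self.parents = {tip: []}
        self.branches = {"main": tip}
        self.head = "main"                                   # HEAD attached to main

    def create_branch(self, name):
        self.branches[name] = self.branches[self.head]       # new name, same commit

    def switch(self, name):
        self.head = name                                     # re-attach HEAD

    def commit(self, new_id):
        self.parents[new_id] = [self.branches[self.head]]    # points back to current commit
        self.branches[self.head] = new_id                    # only HEAD's branch moves

repo = ToyRepo("H")
repo.create_branch("dev")
repo.switch("dev")
repo.commit("I")
print(repo.branches)   # {'main': 'H', 'dev': 'I'}
```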

We can add more branch names, pointing to any of these commits. Or, we can now run git checkout main or git switch main. If we do that, we get:

             I   <-- dev
            /
...--F--G--H   <-- main (HEAD)

Our current commit is now commit H, because our current name is main, and main points to H. Git takes all the commit-I files out of our working tree and puts into our working tree all the commit-H files instead.

(Side note: note that the working tree files are not in Git themselves. Git just copies the Git-ified, committed files from the commits, to our working tree, here. That's part of the action of a checkout or switch: we pick some commit, usually through some branch name, and have Git erase the files from the commit we were working with, and put in the chosen commit's files instead. There's a lot of fancy mechanism hidden inside this, but we'll ignore all of that here.)

We're now ready to go on to git merge. It's important to note that git merge does not always do any actual merging. The description below will start with a setup that requires a real merge, and therefore, running git merge will do a true merge. A true merge can have merge conflicts. The other things that git merge does—the so-called fast-forward merge, which isn't really a merge at all, and the cases where it just says no and doesn't do anything—can't actually have merge conflicts.

How a true merge works

Let's say that at this point, in our Git repository, we have these two branches arranged like this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

(There might be a branch name pointing to H, or some other commit, but we won't bother drawing it in as it doesn't matter for our merging process.) We're "on" branch1, as you can see from the drawing, so we have commit J checked out right now. We run:

git merge branch2

Git will now locate commit J, which is trivial: that's the one we're sitting on. Git will also locate commit L, using the name branch2. That's easy because the name branch2 has the raw hash ID of commit L in it. But now git merge does the first of its main tricks.

Remember, the goal of a merge is to combine changes. Commits J and L don't have changes though. They have snapshots. The only way to get changes from some snapshot is to find some other commit and compare.

Directly comparing J and L might do something, but it doesn't do much good in terms of actually combining two different sets of work. So that's not what git merge does. Instead, it uses the commit graph—the things we've been drawing with the uppercase letters standing in for commits—to find the best shared commit that's on both branches.

This best shared commit is actually the result of an algorithm called the Lowest Common Ancestors of a Directed Acyclic Graph, but for a simple case like this one, it's pretty obvious. Start at both branch tip commits J and L, and use your eyeball to work backwards (leftwards). Where do the two branches come together? That's right, it's at commit H. Commit G is shared too, but H comes closer to the ends than G, so it's obviously (?) better. So it's the one that Git picks here.
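For a graph this small the eyeball method works. A brute-force Python sketch of the same idea finds all common ancestors and keeps only those that are not ancestors of another common ancestor; the helpers are hypothetical, quadratic, and far slower than Git's implementation. For real repositories, git merge-base --all reports this set.

```python
def ancestors(commit, parents):
    """Every commit reachable from `commit` (including itself)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def merge_bases(a, b, parents):
    """Common ancestors that are not ancestors of some other common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c in ancestors(o, parents) for o in common if o != c)}

# The graph from the drawing: G--H, then H--I--J (branch1) and H--K--L (branch2).
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
print(merge_bases("J", "L", parents))   # {'H'}
```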

Git calls this shared starting point the merge base. Git can now do a diff, from commit H to commit J, to figure out what we changed. This diff will show some change(s) to some file(s). Separately, Git can now do a diff from commit H to commit L, to figure out what they changed. This diff will show some change(s) to some file(s): maybe entirely different files, or maybe, where we both changed the same files, we changed different lines of those files.

The job of git merge is now to combine the changes. By taking our changes and adding theirs—or taking theirs and adding ours, which gives the same results—and then applying the combined changes to whatever is in commit H, Git can build up a new, ready-to-go snapshot.

This process fails, with merge conflicts, when "our" and "their" changes collide. If we and they both touched the same line(s) of the same files, Git doesn't know whose change to use. We'll be forced to fix up the mess and then continue the merge.

There's a great deal to know about how this fixing-up goes and how we can automate more of it, but for this particular answer, we can stop here: we either have conflicts, and have to fix them up manually and run git merge --continue,¹ or we have no conflicts and Git will finish off the merge itself. The merge commit gets a new snapshot—not changes, but rather a full snapshot—and then links back to both commits: its first parent is our current commit as usual, and then it has, as a second parent, the commit we said to merge. So the resulting graph looks like this:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

Merge commit M has a snapshot, and if we run git diff hash-of-J hash-of-M, we'll see the changes we brought in because of "their" work in their branch: the changes from H to L that got added to our changes from H to J. If we run git diff hash-of-L hash-of-M, we'll see the changes brought in because of "our" work in our branch: the changes from H to J that got added to their changes from H to L. Of course, if the merge stops for any reason before making commit M, we can make arbitrary changes to the final snapshot for M, making what some call an "evil merge" (see Evil merges in git?).

(This merge commit is also a bit of a stumbling block for git log later, because:

There's no way to generate a single ordinary diff: which parent should it use?
There are two parents to visit, as we traverse backwards: how will it visit both? Will it visit both?

These questions and their answers are rather complex, but are not for this StackOverflow answer.)

Next, before we move on to rebase, let's look closely at git cherry-pick.

¹Instead of git merge --continue, you can run git commit. This winds up doing exactly the same thing. The merge program leaves breadcrumbs, and git commit finds them, realizes it's finishing the merge, and implements git merge --continue rather than making an ordinary single-parent commit. In the bad old days, when Git's user interface was much worse, there was no git merge --continue, so those of us with very old habits tend to use git commit here.

How git cherry-pick works

At various times, when working with any version control system, we will find some reason that we'd like to "copy" a commit, as it were. Suppose, for instance, that we have the following situation:

       H--P--C--J   <-- feature1
      /
...--G--I   <-- main
         \
          K--L--N   <-- feature2 (HEAD)

Someone is working on feature1, and has been for a bit; we're working on feature2 right now. I've named two commits on branch feature1 P and C for a reason that isn't obvious yet, but will become obvious. (I skipped M just because it sounds too much like N, and I like to use M for Merge.) As we go to make a new commit O, we realize that there's a bug, or a missing feature, that we need, that the guys doing feature1 already fixed or wrote. What they did was to make some changes between parent commit P and child commit C, and we'd like those exact same changes now, here, on feature2.

(Cherry-picking here is often the wrong way to do this, but let's illustrate it anyway, since we need to show how cherry-pick works, and doing it "right" is more complicated.)

To make a copy of commit C, we just run git cherry-pick hash-of-C, where we find the hash of commit C by running git log feature1. If all goes well, we end up with a new commit, C'—so named to indicate that it's a copy of C, sort of—that goes on the end of our current branch:

       H--P--C--J   <-- feature1
      /
...--G--I   <-- main
         \
          K--L--N--C'  <-- feature2 (HEAD)

But how does Git achieve this cherry-pick commit?

The simple—but not quite right—explanation is to say that Git compares the snapshots in P and C to see what someone changed there. Then Git does the same thing to the snapshot in N to make C'—though of course the parent (singular) of C' is commit N, not commit P.

But this doesn't show how cherry-pick can have merge conflicts. The real explanation is more complicated. The way cherry-pick really works is to borrow that merge code from earlier. Instead of finding an actual merge base commit, though, cherry-pick just forces Git to use commit P as the "faked" merge base. It sets commit C to be "their" commit. That way, "their" changes will be P-vs-C. That's exactly the changes we'd like to add to our commit N.

To make those changes go in smoothly, the cherry-pick code goes on to use the merge code. It says that our changes are P vs N, because our current commit is commit N when we start the whole thing. So Git diffs P vs N to see what "we" changed in "our branch". The fact that P isn't even on our branch—it's only on feature1—is not important. Git wants to be sure that it can fit the P-vs-C changes in, so it looks at the P-vs-N difference to see where to put the P-vs-C changes in. It combines our P-vs-N changes with their P-vs-C changes, and applies the combined changes to the snapshot from commit P. So the whole thing is a merge!

When the combining goes well, Git takes the combined changes, applies them to what's in P, and gets commit C', which it makes on its own as a normal, single-parent commit with parent N. That gets us the result we wanted.
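Here is that idea as a Python sketch, built on a toy line-by-line three-way merge with the usual toy assumptions (equal-length files, replacements only, made-up function names). The only cherry-pick-specific part is forcing P in as the merge base, with N as "ours" and C as "theirs".

```python
def merge3(base, ours, theirs):
    """Toy line-by-line three-way merge; returns (result, conflicting line indices)."""
    result, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            result.append(o)          # no change from theirs, or identical changes
        elif o == b:
            result.append(t)          # only theirs changed this line
        else:
            result.append(o)          # both changed it differently: conflict
            conflicts.append(i)
    return result, conflicts

def cherry_pick(parent_of_picked, picked, head):
    """Merge with the picked commit's parent forced in as the merge base."""
    return merge3(base=parent_of_picked, ours=head, theirs=picked)

P = ["a", "b", "c"]
C = ["a", "FIX", "c"]          # their change: P -> C
N = ["X", "b", "c"]            # our current commit (differs from P on line 0)
print(cherry_pick(P, C, N))    # (['X', 'FIX', 'c'], [])
```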

When the combining does not go well, Git leaves us with the exact same mess we'd get for any merge. The "merge base" is what is in commit P this time, though. The "ours" commit is our commit N, and the "theirs" commit is their commit C. We're now responsible for fixing up the mess. When we are done, we run:

git cherry-pick --continue

to finish off the cherry-pick.² Git then makes commit C' and we get what we wanted.

Side note: git revert and git cherry-pick share most of their code. A revert is achieved by doing the merge with parent and child swapped. That is, git revert C has Git find P and C and HEAD, but this time, does the merge with C as the base, P as "their" commit, and HEAD as our commit. If you work through a few examples, you'll see that this achieves the right result. The other tricky bit here is that an en-masse cherry-pick has to work "left to right", older commit to newer, while an en-masse revert has to work "right to left", newer commit to older. But now it's time to move on to rebase.
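The same toy merge can illustrate the swapped-parent trick for revert. This is again a hypothetical sketch, not Git's code: C becomes the base and P becomes "theirs", so the P-to-C change is undone while an unrelated later change in HEAD survives.

```python
def merge3(base, ours, theirs):
    """Toy line-by-line three-way merge; returns (result, conflicting line indices)."""
    result, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            result.append(o)
        elif o == b:
            result.append(t)
        else:
            result.append(o)          # conflict: keep ours and record the line
            conflicts.append(i)
    return result, conflicts

def revert(parent_of_bad, bad, head):
    """Revert as a merge with parent and child swapped: base=bad, theirs=parent."""
    return merge3(base=bad, ours=head, theirs=parent_of_bad)

P = ["a", "b", "c"]
C = ["a", "BUG", "c"]          # the commit we want to undo
HEAD = ["X", "BUG", "c"]       # a later, unrelated change to line 0
print(revert(P, C, HEAD))      # (['X', 'b', 'c'], [])
```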

²As in footnote 1 for merge, we can use git commit here too, and in the bad old days there was probably a time when one had to, although I think by the time I used Git—or at least the cherry-picking feature—the thing that Git calls the sequencer was in place and git cherry-pick --continue worked.

How rebase works

The rebase command is very complicated, with a whole lot of options, and we won't cover all of it by any means here. What we'll look at is in part a recap of what Mark Adelsberger got into his answer while I was typing all of this.

Let's go back to our simple merge setup:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

If, instead of git merge branch2, we run git rebase branch2, Git will:

List out commits (hash IDs) that are reachable from HEAD / branch1, but not reachable from branch2. These are the commits that are only on branch1. In our case that's commits J and I.
Make sure the list is in "topological" order, i.e., I first, then J. That is, we want to work left-to-right, so that we always add later copies atop earlier copies.
Knock out of the list any commits that for some reason should not be copied. This is complicated, but let's just say that no commits get knocked out: that's a pretty common case.
Use Git's detached HEAD mode to begin cherry-picking. This amounts to running git switch --detach branch2.
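The first two steps of that list can be sketched in Python with hypothetical helpers. The "knock out" step, which for instance drops commits whose change is already upstream, is left out of this sketch.

```python
def ancestors(commit, parents):
    """Every commit reachable from `commit` (including itself)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def commits_to_copy(upstream, branch, parents):
    """Reachable from branch but not from upstream, parents before children."""
    todo = ancestors(branch, parents) - ancestors(upstream, parents)
    order, done = [], set()

    def visit(c):                       # post-order DFS emits parents first
        if c in done or c not in todo:
            return
        done.add(c)
        for p in parents.get(c, []):
            visit(p)
        order.append(c)

    visit(branch)
    return order

parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
print(commits_to_copy("L", "J", parents))   # ['I', 'J']
```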

We haven't mentioned detached HEAD mode yet. When in detached HEAD mode, the special name HEAD doesn't hold a branch name. Instead, it holds a commit hash ID directly. We can draw this state like this:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- HEAD, branch2

Commit L is now the current commit but there is no current branch name. This is what Git means by the term "detached HEAD". In this mode, when we make new commits, HEAD will point directly to those new commits.

Next, Git will run the equivalent of git cherry-pick for each commit it still has in its list, after the knocking-out step. Here, that's the actual hash IDs of commits I and J, in that order. So we run one git cherry-pick hash-of-I first. If all works well, we get:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- branch2
              \
               I'  <-- HEAD

During the copying process, the "base" here is commit H (parent of I), "their" commit is our commit I, and "our" commit is their commit L. Note how the ours and theirs notions appear swapped around at this point. If there's a merge conflict—which can happen because this is a merge—the ours commit will be theirs and the theirs commit will be ours!

If all goes well, or you have fixed any issues and used git rebase --continue to continue the merge, we now have I' and we begin copying commit J. The end goal of this copying is:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- branch2
              \
               I'-J'  <-- HEAD

If something goes wrong, you'll get a merge conflict. This time the base commit will be I (which is one of ours) and the theirs commit will be J (still one of ours). The really confusing part is that the ours commit will be commit I': the one we just made, just now!
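The shifting roles across successive copies can be listed mechanically. This sketch uses a hypothetical helper, `rebase_merge_roles`, and the primed names from the diagrams; it records the (base, ours, theirs) triple each cherry-pick would use, assuming every step succeeds.

```python
def rebase_merge_roles(parent_of, to_copy, onto):
    """List the (base, ours, theirs) commits each cherry-pick step would use.

    `to_copy` must already be in topological order. Copies are named with a
    trailing prime (I -> I'), following the diagrams' naming convention.
    """
    roles = []
    head = onto
    for commit in to_copy:
        roles.append((parent_of[commit], head, commit))
        head = commit + "'"  # the new copy becomes HEAD for the next step
    return roles


for base, ours, theirs in rebase_merge_roles({"I": "H", "J": "I"}, ["I", "J"], "L"):
    print(f"base={base} ours={ours} theirs={theirs}")
# base=H ours=L theirs=I
# base=I ours=I' theirs=J
```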

If there were more commits to copy, this process would repeat. Each copy is a potential place to experience merge conflicts. How many actual conflicts occur depends heavily on the various commits' contents, and whether you do something, during a conflict resolution of some earlier commit, that will set up a conflict when cherry-picking a later commit. (I've had situations where every single commit being copied has the same conflict, over and over again. Using git rerere is very helpful here, although a bit scary sometimes.)
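To see why a rebase can hit more conflicts than a merge of the same two branches, consider an assumed history where branch2's commit L edits a file, your commit I edits the same file differently, and your commit J undoes I. The sketch uses a simplified whole-file merge (redefined so the block stands alone) and assumes, purely for illustration, that each conflict is resolved by keeping the "ours" side. The single merge compares only H, J and L and sees no conflict; the rebase stops at both I and J.

```python
def three_way_merge(base, ours, theirs):
    """Whole-file three-way merge of path -> content dicts (not Git's line-level merge)."""
    result, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            merged = o
        elif o == b:
            merged = t
        else:
            conflicts.append(path)
            merged = o  # toy resolution policy: keep "ours" (an assumption)
        if merged is not None:
            result[path] = merged
    return result, conflicts


# Hypothetical history: L changes a.txt; I changes it differently; J undoes I.
snap = {
    "H": {"a.txt": "original"},
    "L": {"a.txt": "their edit"},
    "I": {"a.txt": "my edit"},
    "J": {"a.txt": "original"},
}
parent = {"I": "H", "J": "I"}

# One true merge looks only at the base H and the two tips, J and L.
_, merge_conflicts = three_way_merge(snap["H"], snap["J"], snap["L"])

# A rebase cherry-picks I, then J, each as its own three-way merge.
rebase_conflicts = 0
head_snapshot = snap["L"]
for commit in ["I", "J"]:
    head_snapshot, conflicts = three_way_merge(snap[parent[commit]], head_snapshot, snap[commit])
    rebase_conflicts += len(conflicts)

print(len(merge_conflicts), rebase_conflicts)  # 0 2
```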

Once all the copying is done, git rebase finishes by yanking the branch name off the commit that used to be the branch tip and pasting it onto the commit HEAD now names:

          I--J   ???
         /
...--G--H
         \
          K--L   <-- branch2
              \
               I'-J'  <-- HEAD, branch1

The old commits are now hard to find. They are still in your repository, but if you don't have another name that lets you find them, they seem to be gone! Last, just before returning control to you, git rebase re-attaches HEAD:

          I--J   ???
         /
...--G--H
         \
          K--L   <-- branch2
              \
               I'-J'  <-- branch1 (HEAD)

so that git status says on branch branch1 again. Running git log, you see commits that have the same log message as your original commits. It seems as though Git has somehow transplanted those commits. It hasn't: it has made copies. The originals are still there. The copies are the rebased commits, and make up the rebased branch, in the way humans think of branches (though Git doesn't: Git uses hash IDs, and these are clearly different).
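The claim that the copies are new commits with new hash IDs can be checked in a toy model. Here a hash ID is a truncated SHA-1 of just the parent ID, the snapshot and the message; real Git hashes the full commit object, including author and committer lines, so this is a simplifying assumption, as are the names `make_commit` and `snapshot` and the file names. Because each copy has a different parent, it gets a different ID even though it reuses the original's message.

```python
import hashlib

objects = {}  # toy object store: hash ID -> (parent ID, snapshot, message)


def make_commit(parent, snapshot, message):
    """Store a commit and return its toy hash ID (hashes only three fields)."""
    data = f"{parent}|{sorted(snapshot.items())}|{message}".encode()
    cid = hashlib.sha1(data).hexdigest()[:7]
    objects[cid] = (parent, snapshot, message)
    return cid


def snapshot(cid):
    return objects[cid][1] if cid else {}


# Build ...--H, then I--J on branch1 and K--L on branch2.
H = make_commit(None, {"README": "v1"}, "H")
I = make_commit(H, {**snapshot(H), "i.txt": "i"}, "add i")
J = make_commit(I, {**snapshot(I), "j.txt": "j"}, "add j")
K = make_commit(H, {**snapshot(H), "k.txt": "k"}, "add k")
L = make_commit(K, {**snapshot(K), "l.txt": "l"}, "add l")
branches = {"branch1": J, "branch2": L}

# Copy I, then J, atop L. Each copy re-applies only the files the original
# changed relative to its own parent (deletions are ignored in this sketch).
old_tip = branches["branch1"]
head = branches["branch2"]  # detached HEAD at L
for original in [I, J]:
    parent, snap, message = objects[original]
    changed = {p: c for p, c in snap.items() if snapshot(parent).get(p) != c}
    head = make_commit(head, {**snapshot(head), **changed}, message)

# Yank branch1 off J and paste it onto the last copy.
branches["branch1"] = head

# Same message, different hash ID, and the original J still exists.
print(objects[old_tip][2] == objects[head][2], old_tip != head, old_tip in objects)
# True True True
```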

Conclusion

The bottom line, as it were, is that git merge merges. This means: make one new commit, by combining work, and tie that one new commit back to both existing chains of commits. But git rebase copies commits. This means: make many new commits, by copying those old commits; the new commits live elsewhere in the commit graph, and have new snapshots, but re-use the old commits' author names, author date stamps, and commit messages; and once the copying is done, yank the branch name off the old commits and paste it onto the new ones, abandoning the old commits in favor of the new and improved ones.

This "abandoning" is what people mean when they say that rebase rewrites history. History, in a Git repository, is the commits in the repository. They're numbered, by hash IDs, and two Git repositories have the same history if they have the same commits. So when you copy old commits to new-and-improved ones, abandoning the old ones, you need to convince the other Git repositories to also abandon those old commits in favor of the new ones.

Convincing other users to do the same in their Git repositories can be easy or hard. It's easy if they all understand this in the first place and have agreed to it in advance. Merging, on the other hand, does not throw away old history in favor of new-and-improved history: it just adds new history that refers back to old history. Git can easily add new history: that's how Git is built, after all.
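The difference in how other repositories react can be sketched with an ancestry check. By default, `git push` refuses to move a remote branch unless the update is a fast-forward, i.e., the old tip is an ancestor of the new one; `git merge-base --is-ancestor` performs that test in real Git. The Python below is a hypothetical stand-in over a toy graph, where `M` is an assumed merge of `J` and `L`, and `I'` and `J'` are rebase copies.

```python
PARENTS = {
    "G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"],
    "M": ["J", "L"],            # merge commit: two parents, tying both chains together
    "I'": ["L"], "J'": ["I'"],  # rebase copies
}


def is_ancestor(old, new):
    """True if `old` is reachable from `new` (the idea behind `git merge-base --is-ancestor`)."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return False


# Another repository still has branch1 pointing at J.
print(is_ancestor("J", "M"))   # True: merging only adds history (a fast-forward)
print(is_ancestor("J", "J'"))  # False: J was abandoned, so the update must be forced
```

This is why a rebased branch that others have already fetched usually needs a forced push (for example `git push --force-with-lease`), and why everyone else then has to abandon the old commits too.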
