1

I'm still learning git.

I have a file called names.txt with two lines of text: Mary and John.

My commit history has three commits.

The first commit added the file. The second commit added the first line Mary. The third commit added the second line John.

git show 7bdb5ef

git show 80384aa

I want to rebase this and edit the commit Mary to change the text to Mary Shelly.

I do git rebase -i 4a5244b

Next I set commit Mary to edit and run the rebase.

Rebase stops here.

Now names.txt has the content from the Mary commit.

I change it to Mary Shelly and stage it.

I run

git commit --amend 

followed by

git rebase --continue

Now I get a merge conflict in names.txt.

I don't understand why this happens. Commit John only changes the second line in the file. When we edit the commit Mary we only change the first line of the file. How does this cause a conflict?

enzio902
  • I believe it has to do with blobs not being a 1:1 mapping to text file lines, but I don't understand well enough to explain it as an answer. Read https://git-scm.com/book/en/v2/Git-Internals-Git-Objects – mbb Feb 22 '19 at 15:19
  • Remember that while a commit is *displayed* as a diff from its parents, it's really a full *snapshot* of the code at the time of the commit. Your third commit doesn't simply add "John" to whatever was there before, it is *specifically* the two lines "Mary" and "John". Now you are rebasing that on top of "Mary Shelly", and Git can't determine if it should change "Mary Shelly" back to "Mary", or keep "Mary Shelly". – chepner Feb 22 '19 at 15:21
  • @chepner The way I learned it, Git remembers lines of code, and the commits Mary and John modify two different lines. When you `revert` a commit, you are said to revert the changes made in that particular commit. If a commit is just a full snapshot of the code, how do we even know which changes are recorded in which commits? Is this referenced in any documentation I can study? – enzio902 Feb 22 '19 at 18:15
  • It's a result of the three-way merge algorithm (see https://en.wikipedia.org/wiki/Merge_(version_control) for more details). – chepner Feb 22 '19 at 18:35
  • (Indeed, `revert` has to first compute a diff between the commit to be reverted and its parent, then apply that diff to the currently checked-out commit in order to create the new commit.) – chepner Feb 22 '19 at 18:38
  • (Packfiles, which Git will use to save space by storing just deltas between versions, complicate matters a little. See https://git-scm.com/book/en/v2/Git-Internals-Packfiles) – chepner Feb 22 '19 at 18:43

3 Answers

6

The problem is that there is a merge conflict, and chepner's comment is the key to understanding why. Well, that, and the commit graph, plus the fact that git rebase consists of repeated git cherry-pick operations. Interactive rebase allows you to add your own commands between each git cherry-pick, or even change the cherry-picks to something else. (The initial command-sheet starts out as all-pick commands, each of which means do a cherry-pick.)

Your commit history is a summary of your commit graph—essentially, the result of visiting each commit in the commit graph, starting at some particular ending point (the tip of your current branch) and working backwards. If you use git log --graph you get some potentially-important information that is left out without --graph, although in this particular case, it's easy to see that the graph is linear. So you just have three commits:

A <-B <-C   <-- master (HEAD)

where A is actually 4a5244b, B stands for 7bdb5ef, and C stands for 80384aa (if I've transcribed the images correctly). Each commit has a full, complete copy of the file names.txt. The copy is of course different in commits A, B, and C, in that in A, it's empty; in B, it is one line reading Mary; and in C, it is two lines reading Mary and then John.

The graph itself arises from the fact that commit C, or 80384aa, contains the hash ID of commit B, or 7bdb5ef, inside C itself. That's why I drew an arrow coming out of C pointing to B. Git calls this C's parent commit. Git records C's hash ID in the name master, and then attaches the special name HEAD to the name master, so that it knows that this is where git log should start, and that commit C is the one you have out, for working-on, right now.
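To look at these parent links yourself, here is a minimal sketch that rebuilds a three-commit history like the one in the question inside a throwaway directory. The demo identity, the commit messages and the temporary directory are my own assumptions, and your hash IDs will differ from 4a5244b, 7bdb5ef and 80384aa.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Demo identity so the commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

: > names.txt;          git add names.txt; git commit -qm "A: add names.txt"
echo Mary > names.txt;  git commit -qam "B: add Mary"
echo John >> names.txt; git commit -qam "C: add John"

git log --graph --oneline      # a linear graph: C, then B, then A
git cat-file -p HEAD           # the raw commit C, including its "parent" line
echo "C's parent:     $(git rev-parse HEAD^)"
echo "HEAD points to: $(git symbolic-ref HEAD)"
git show HEAD~1:names.txt      # the full snapshot stored in B
```

The `parent` line printed by `git cat-file -p HEAD` is the arrow from C to B in the drawing above.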

When you run git rebase -i 4a5244b—choosing commit A as the new base—Git figures out that this means copy commits B and C, so it puts their hash IDs into the list of pick commands. It then opens your editor on the command-sheet. You change pick to edit, which tells Git: Do the cherry-pick, then exit the rebase, in the middle of the operation.

You didn't force rebase to make a true copy. (To do that, use -f or --no-ff or --force-rebase—all mean the same thing. It doesn't really matter here, nor in most cases.) So Git saw that there was an instruction, Copy B so that it comes after A, and realized: Hey, wait, B is already after A. I'll just leave it there. Git did that and stopped, leaving you in this state:

A--B   <-- HEAD
    \
     C   <-- master

Note that HEAD is no longer attached to master: it now points directly to commit B. Commit C remains, and master still points to it, but Git has stopped in "detached HEAD" mode to allow you to do your edit.
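The stopped state can be reproduced without typing into an editor. This is only a sketch: it assumes a throwaway repository with made-up commit messages, and it uses a `sed` one-liner as `GIT_SEQUENCE_EDITOR` as a stand-in for changing the first `pick` to `edit` by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
: > names.txt;          git add names.txt; git commit -qm "A: add names.txt"
echo Mary > names.txt;  git commit -qam "B: add Mary"
echo John >> names.txt; git commit -qam "C: add John"
A=$(git rev-parse HEAD~2); B=$(git rev-parse HEAD~1); C=$(git rev-parse HEAD)

# Rewrite the first line of the command sheet (commit B) from "pick" to "edit".
# "|| true" only keeps set -e from aborting the script, whatever status rebase returns.
GIT_SEQUENCE_EDITOR='sed -i.bak "1s/^pick /edit /"' git rebase -i "$A" || true

# git symbolic-ref fails when HEAD is detached, which is what we expect here.
if git symbolic-ref -q HEAD >/dev/null; then head_state=attached; else head_state=detached; fi
echo "HEAD is $head_state at $(git rev-parse --short HEAD) (B is $(git rev-parse --short "$B"))"
echo "master is at $(git rev-parse --short master) (C is $(git rev-parse --short "$C"))"
cat names.txt   # the work-tree holds B's snapshot
```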

You make your change to the file, git add, and git commit --amend. This makes a new commit—we could call it B' or D, and usually I use B' since usually it's a whole lot like B, but this time it's different enough, so let's use D. The new commit has A as its parent—that's what --amend does. Git updates HEAD to point to the new commit. Existing commit B remains intact. So now you have:

  D   <-- HEAD
 /
A--B
    \
     C   <-- master

The file names.txt in D has the new single line reading Mary Shelly.

You now run git rebase --continue, so Git continues with what's left in the instruction sheet. That consists of pick <hash-of-C>, which makes Git run git cherry-pick to copy C. This copy needs to go after the current commit, D. Existing commit C doesn't, so Git has to really do the job this time.

A cherry-pick is a merge—merge as a verb, at least

To perform a merge operation—to merge, the action—Git needs three inputs. These three inputs are the merge base commit, the current or --ours commit (also sometimes called local, particularly by git mergetool), and the other or --theirs commit (sometimes called remote). For regular merges, the base is often a bit distant: it's where the two lines of commits diverged. For cherry-pick—and for revert, for that matter—the base is right next to the commit. The merge base of this operation is C's parent commit B!

The actual operation of merge consists of running two git diff commands on the entire commits:

  • git diff --find-renames hash-of-base hash-of-ours: what did we change?
  • git diff --find-renames hash-of-base hash-of-theirs: what did they change?

So Git now diffs commit B, the base, vs commit D, your current/ours commit. That diff affects file names.txt and says: change the one line that says Mary to one line reading Mary Shelly. Then Git diffs B vs C, to see what "they" (you, earlier) did. The diff affects file names.txt and says: add the line reading John at the end of the file, after the line reading Mary.
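If you want to see the two diffs rather than take my word for it, the sketch below prints them and then attempts the cherry-pick. It assumes a throwaway repository, and it builds D with `git checkout --detach` plus `git commit --amend` instead of going through the interactive rebase; the resulting commits have the same shape.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
: > names.txt;          git add names.txt; git commit -qm "A: add names.txt"
echo Mary > names.txt;  git commit -qam "B: add Mary"
echo John >> names.txt; git commit -qam "C: add John"
B=$(git rev-parse HEAD~1); C=$(git rev-parse HEAD)

# Build D: sit on B with a detached HEAD, change the line, amend.
git checkout -q --detach "$B"
echo "Mary Shelly" > names.txt
git commit -qa --amend -m "D: add Mary Shelly"
D=$(git rev-parse HEAD)

echo "--- base (B) vs ours (D): what did we change? ---"
git diff --find-renames "$B" "$D"
echo "--- base (B) vs theirs (C): what did they change? ---"
git diff --find-renames "$B" "$C"

# Copy C on top of D; record whether Git could combine the two diffs.
if git cherry-pick "$C"; then status=clean; else status=conflict; fi
echo "cherry-pick result: $status"
```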

That's what Git shows you in the merge-conflict section: one file says replace Mary with Mary Shelly, the other says keep Mary and add John. If you like, you can tell Git to keep, in the merge-conflict section, more information. To do this, set merge.conflictStyle to diff3. (The default, if it's not set, is merge.)

With the diff3 setting, you'll see that the base content—marked by |||||||—is the one line Mary, and that the two files from the conflicting commits have replaced that base with, respectively, either Mary Shelly or Mary + new line John. I find this kind of merge conflict clearer and easier to merge manually.
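A quick way to preview the diff3 layout without touching any repository is `git merge-file`, which runs the same kind of three-way file merge on three plain files. This is a sketch under my own assumptions: the file names `base.txt`, `ours.txt` and `theirs.txt` are made up and stand for the snapshots in B, D and C.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

printf 'Mary\n'        > base.txt     # snapshot in B, the merge base
printf 'Mary Shelly\n' > ours.txt     # snapshot in D
printf 'Mary\nJohn\n'  > theirs.txt   # snapshot in C

# -p prints the result instead of overwriting ours.txt; --diff3 adds the base section.
# A non-zero exit status signals conflicts, hence "|| true" under set -e.
merged=$(git merge-file -p --diff3 ours.txt base.txt theirs.txt) || true
printf '%s\n' "$merged"

# To get this style for real merges, rebases and cherry-picks:
#   git config --global merge.conflictStyle diff3
```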

In any case, your job at this point is to come up with the correct result—whatever that is—and write that out and copy it into index slot zero. Typically you'll just edit the messy names.txt that Git left in your work-tree, put the right contents into it, and then run git add names.txt.

Resuming

Having fixed the conflict, run git whatever --continue to resume whatever operation stopped—in this case, rebase, but this happens with cherry-pick and merge as well. Git will use the index contents, which you updated with git add, to make the new commit that's a copy of C:

  D--C'   <-- HEAD
 /
A--B
    \
     C   <-- master

Having reached the end of the command sheet, git rebase now finishes up by yanking the name master off commit C and pasting it onto C', which is the last copy it made, and then re-attaching HEAD:

  D--C'   <-- master (HEAD)
 /
A--B
    \
     C   [abandoned]
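The whole sequence can be replayed end to end with a script. Treat it as a sketch under stated assumptions: it runs in a throwaway repository, uses a `sed` one-liner as `GIT_SEQUENCE_EDITOR` in place of editing the command sheet by hand, sets `GIT_EDITOR=true` so commit messages are accepted unchanged, and picks "Mary Shelly" followed by "John" as the resolution.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true   # never open an editor for commit messages

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
: > names.txt;          git add names.txt; git commit -qm "A: add names.txt"
echo Mary > names.txt;  git commit -qam "B: add Mary"
echo John >> names.txt; git commit -qam "C: add John"
A=$(git rev-parse HEAD~2); old_C=$(git rev-parse HEAD)

# 1. Start the rebase with B marked "edit"; Git stops on B with a detached HEAD.
GIT_SEQUENCE_EDITOR='sed -i.bak "1s/^pick /edit /"' git rebase -i "$A" || true

# 2. Change the line and amend: this creates D, whose parent is A.
echo "Mary Shelly" > names.txt
git add names.txt
git commit -q --amend -m "D: add Mary Shelly"

# 3. Continue: the cherry-pick of C conflicts, so the rebase stops again.
git rebase --continue || true

# 4. Write the result we want, stage it, and continue to the end.
printf 'Mary Shelly\nJohn\n' > names.txt
git add names.txt
git rebase --continue

git log --graph --oneline   # A, D, C' with master (HEAD) on C'
```

The old C is still in the repository after this; it is just no longer reachable from master.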
torek
  • Thank you for the detailed answer :) – enzio902 Feb 23 '19 at 12:27
  • I had a conflict when squashing a linear sequence of git commits with interactive rebase. Now it is clear to me why that could happen – I've used `git rebase -i HEAD~25` which included the commits I wanted to squash, but also others which I ignored and I thought git would ignore too … and these were branching. – Simon A. Eugster May 24 '22 at 12:39
  • @SimonA.Eugster: note that modern `git rebase -i` now understands, and can re-perform, merges when using `--rebase-merges`. This should not be run without first understanding exactly what it does, though (well, we can say that about much of Git which is sometimes especially unfortunate for newbies...). – torek May 25 '22 at 03:13
  • @torek Finally gave it a try today, looks very useful! – Simon A. Eugster Sep 28 '22 at 14:08
0

File-level merge operations (i.e. operations during which Git needs to reconcile two sets of changes to a file) try to allow you to move code around without causing too many conflicts, so in order to try and find the right place to apply a change, the context - the set of surrounding lines - is factored in, too.

Here, re-applying the commit John causes trouble: the original commit added John next to a line Mary. Now Git is trying to re-apply the commit, but that reference line saying Mary no longer exists - all that's there is a line saying Mary Shelly. Keep in mind that Git doesn't understand the purpose or meaning of your file, so in a case like this it won't take any chances: it presents this to you as a conflict so you can check it over.

Try the same thing again with a lot of other lines between John and Mary which you'll be keeping the same - you'll see that you won't get a conflict.
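Here is a small sketch of that experiment using `git merge-file` on plain files, so no repository is needed. The file names and the four filler lines (alpha to delta) are made up for the demo; the first merge is the adjacent case from the question, the second puts unchanged lines between Mary and John.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Adjacent case: the edited line and the added line touch each other.
printf '%s\n' Mary          > base1.txt
printf '%s\n' "Mary Shelly" > ours1.txt
printf '%s\n' Mary John     > theirs1.txt
if git merge-file -p ours1.txt base1.txt theirs1.txt > adjacent.txt
then adjacent=clean; else adjacent=conflict; fi

# Separated case: unchanged filler lines sit between the two changes.
printf '%s\n' Mary          alpha bravo charlie delta      > base2.txt
printf '%s\n' "Mary Shelly" alpha bravo charlie delta      > ours2.txt
printf '%s\n' Mary          alpha bravo charlie delta John > theirs2.txt
if git merge-file -p ours2.txt base2.txt theirs2.txt > separated.txt
then separated=clean; else separated=conflict; fi

echo "adjacent:  $adjacent"
echo "separated: $separated"
cat separated.txt
```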

Jan Krüger
0

The problem is that, in practice, a change made to an existing line is too likely to also be needed on a newly added adjacent line, so Git won't let the merge succeed without getting human judgement involved. The example I use is

<<<<<<<<<<<<<
    if ( g->tag == mark 
      || g->tag == error ) {
||||||||||||||
    if ( tag == mark
      || tag == error ) {
==============
    if ( tag == mark 
      || tag == release
      || tag == error ) {
>>>>>>>>>>>>>>

where one change added g-> to a pair of lines and another change added the release line in the middle.

jthill