
I create a branch called base in which I create a file base.md:



b

The file has two empty lines followed by the character b on the third line. If I edit the first line and add a, git diff and the edited file show:

diff --git a/base.md b/base.md
index f547db6..1e907b4 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,3 @@
-
+a
 
 b
\ No newline at end of file
(END)
a

b

I know git says it deletes the old first line and adds a new first line with an a in there.

If I only add a newline at the end of the file:

diff --git a/base.md b/base.md
index f547db6..614705a 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,3 @@
 
 
-b
\ No newline at end of file
+b
(END)


b

In this case, I know git says it deletes the old third line and adds a new third line containing b (to be exact, b\n).

But if I make both changes together (edit the first line and add the trailing newline), git diff shows:

diff --git a/base.md b/base.md
index f547db6..a1a53b5 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,3 @@
+a
 
-
-b
\ No newline at end of file
+b
(END)

This does not look like the simple sum of the two operations above. In my words, git seems to say: first I'll create a new line containing a (rather than deleting the old line and adding the new one), then I'll delete the old third line and add a new third line containing b\n. Finally, because there are now two empty lines between a and b, I'll delete one of them.

I couldn't find any information about git diff or its related algorithms. Could someone explain this to me? Thanks in advance.

tianzhich
  • The algorithm may not be consistent in all cases. For consecutive identical lines, it makes no difference to say "delete the first of them" or "delete the last of them". – iBug Sep 15 '21 at 04:42
  • Does this answer your question? [How does git diff tell if a line has been modified or added?](https://stackoverflow.com/questions/59613900/how-does-git-diff-tell-if-a-line-has-been-modified-or-added) – Joe Sep 15 '21 at 05:04
  • @Joe It answered part of my questions about the diff algorithms. But I'm still confused about the final different series of instructions that the algorithms produce. – tianzhich Sep 15 '21 at 07:10

1 Answer

A line, in Git, is a sequence of characters that ends with and includes the line terminator character, \n (newline).

A completely empty file, zero bytes long, has no lines in it.

Any other file has at least one line, but the last line has the potential to be weird, because it might end with a newline—the way a line should—or it might not end with a newline. For instance, a five-byte-long file could consist of the bytes for a, \n, b, \n, and c and then simply stop. In this case the last line in the file is an incomplete line: one that does not end with a newline. All previous lines, however, are by definition complete, because their very existence as a line is determined by the fact that they ended with a newline.
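The definition above can be made concrete with a small Python sketch. It is only an illustration of the "line includes its newline" idea, not Git's own code, and the helper names `split_lines` and `has_incomplete_last_line` are hypothetical:

```python
def split_lines(data: bytes) -> list[bytes]:
    # Split on b"\n" only, keeping the terminator as part of each line.
    lines = []
    start = 0
    while start < len(data):
        end = data.find(b"\n", start)
        if end == -1:
            # No terminator left: the rest is an incomplete final line.
            lines.append(data[start:])
            break
        lines.append(data[start:end + 1])
        start = end + 1
    return lines


def has_incomplete_last_line(data: bytes) -> bool:
    # True when the file is non-empty and does not end with a newline.
    return bool(data) and not data.endswith(b"\n")


print(split_lines(b""))                        # []
print(split_lines(b"a\nb\nc"))                 # [b'a\n', b'b\n', b'c']
print(has_incomplete_last_line(b"a\nb\nc"))    # True
print(has_incomplete_last_line(b"a\nb\nc\n"))  # False
```

The empty file yields no lines at all, and the five-byte example yields two complete lines plus one incomplete one.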

Now, given two arbitrary input files (one of which we put on the "left side" or call a/file, and the other of which we put on the "right side" or call b/file), git diff:

  1. breaks each one up into a series of lines, including their final newline terminators;
  2. finds lines that do match, and lines that don't match; and
  3. uses a fancy algorithm [1] to compute a reasonably minimal set of changes that, applied to the left-side file, produce the right-side file.

At most one line in each file can be one of these weird "does not end with a newline" "lines"; all other lines do end with newline, by definition. Furthermore, in step 2, two lines match if and only if their non-newline parts match and either they both end with newline, or neither ends with newline.
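Once each line keeps its terminator, the matching rule above reduces to plain byte equality. The sketch below counts how many lines two versions can share, using a textbook longest-common-subsequence table rather than Git's actual Myers implementation; the function name `lcs_length` and the variable names are hypothetical:

```python
def lcs_length(left: list[bytes], right: list[bytes]) -> int:
    # Lines are compared with their terminators included, so a line "b"
    # without a newline never matches a line "b" that ends with one.
    rows, cols = len(left), len(right)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if left[i] == right[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table[0][0]


old = [b"\n", b"\n", b"b"]            # two blank lines, then b with no newline
only_a = [b"a\n", b"\n", b"b"]        # first line edited
only_newline = [b"\n", b"\n", b"b\n"]  # trailing newline added
both = [b"a\n", b"\n", b"b\n"]        # both changes together

print(lcs_length(old, only_a))        # 2
print(lcs_length(old, only_newline))  # 2
print(lcs_length(old, both))          # 1
```

With both changes applied, only a single blank line is common to the two files. The old file has two identical blank lines that could serve as that match, so a minimal script may keep either one.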

In your case, you created a left-side file whose final line does not end with a newline. Sometimes the right-side file also lacks the final newline, and sometimes it has one. When neither file ends with a newline and the incomplete final lines match, git diff does not need to say anything about changing this line. But if the final line in the left-side base.md does not end with a newline, and the final line in the right-side base.md does, these two lines don't match, so:

diff --git a/base.md b/base.md
index f547db6..a1a53b5 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,3 @@
+a
 
-
-b
\ No newline at end of file
+b
(END)

the instructions for changing the left-side file read:

  • add a line a
  • keep one blank line (line consisting only of a newline) as-is
  • remove one blank line
  • remove a final line consisting of b that is weird because it has no final newline, as denoted by the extra \ No newline at end of file comment;
  • and last, add a normal final line that consists of the line—ended with newline—b.

These instructions achieve the result of turning the left-side file into the right-side file.

Note that any set of instructions that produces the correct final file is acceptable here. For human-usage reasons, we like the instructions to be as short as we can reasonably get, and as direct and close to what the human actually did as we can reasonably get, but they don't have to match exactly.
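To see that more than one script is valid, here is a minimal Python sketch that replays keep/remove/add steps against the left-side lines. The `apply_script` helper and the `alt_script` ordering are hypothetical, written only for illustration; `git_script` transcribes the hunk shown above:

```python
def apply_script(left: list[bytes], script: list[tuple[str, bytes]]) -> list[bytes]:
    # Each step is (op, line): " " keeps, "-" removes, "+" adds,
    # mirroring the first column of a unified diff hunk.
    result = []
    pos = 0
    for op, line in script:
        if op in (" ", "-"):
            # Keep and remove both consume the next left-side line.
            if pos >= len(left) or left[pos] != line:
                raise ValueError("script does not apply at left line %d" % (pos + 1))
            pos += 1
        if op in (" ", "+"):
            result.append(line)
    if pos != len(left):
        raise ValueError("script did not consume the whole left-side file")
    return result


left = [b"\n", b"\n", b"b"]
right = [b"a\n", b"\n", b"b\n"]

# The hunk Git printed: add a, keep a blank, remove a blank, swap b.
git_script = [("+", b"a\n"), (" ", b"\n"), ("-", b"\n"), ("-", b"b"), ("+", b"b\n")]
# A hypothetical alternative: remove the first blank instead of the second.
alt_script = [("-", b"\n"), ("+", b"a\n"), (" ", b"\n"), ("-", b"b"), ("+", b"b\n")]

print(apply_script(left, git_script) == right)  # True
print(apply_script(left, alt_script) == right)  # True
```

Both scripts have the same length and yield the same right-side file; they differ only in which of the two identical blank lines is treated as unchanged.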


[1] The default one is called myers, named after Eugene Myers. See also Myers diff algorithm vs Hunt–McIlroy algorithm.

torek
  • Very informative answer! Thank you torek. You said _we like the instructions as direct and close to what the human actually did_. But why didn't git generate instructions like **remove one blank line -> add a line `a` -> keep one blank line as-is -> remove the weird line `b` -> and last add a normal final line `b`**? In my opinion, this one is closer to what I did, and it is as short as the previous one. – tianzhich Sep 15 '21 at 07:04
  • This is where Git has some tunables: the Myers algorithm can be run in various directions and when you do that, the blank lines that match get chosen sort of randomly. Git will, after-the-fact, sometimes try to move matched sections up or down so that blank lines match, instead of `}` lines. In your particular case, this could make things better, or worse, depending on what you wanted. The `--indent-heuristic` is the one that controls whether Git does this shuffling of blank line matching. It's not perfect, and defaults to enabled, so try disabling it. – torek Sep 15 '21 at 07:07
  • I don't understand *git will sometimes try to move matched sections up or down so that blank lines match, instead of } lines*. Could you give some examples? By the way, I disabled this using `--no-indent-heuristic`, but the instruction set did not change. – tianzhich Sep 15 '21 at 07:31
  • That's more of a general comment, really: Git synchronizes on lines that humans find irritating in some way, so the indent-heuristic code (which is still a bit experimental) is an attempt to make its instructions match human expectations better. In your case, it doesn't seem to make any difference. – torek Sep 15 '21 at 07:40