How to read the output from git diff?

Question

The man page for git-diff is rather long, and explains many cases which don't seem to be necessary for a beginner. For example:

git diff origin/master

by using a different text editor the @ ... @ range notations for line numbers became obvious. — poseid, Oct 07 '10 at 12:28

score 596 · Accepted Answer · edited Sep 28 '21 at 11:33

Let's take a look at an example of an advanced diff from git history (commit 1088261f in the git.git repository):

diff --git a/builtin-http-fetch.c b/http-fetch.c
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
index f3e63d7..e8f44ba 100644
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
@@ -1,8 +1,9 @@
 #include "cache.h"
 #include "walker.h"
 
-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, const char *prefix)
        int get_verbosely = 0;
        int get_recover = 0;
 
+       prefix = setup_git_directory();
+
        git_config(git_default_config, NULL);
 
        while (arg < argc && argv[arg][0] == '-') {

Let's analyze this patch line by line.

The first line
```
diff --git a/builtin-http-fetch.c b/http-fetch.c
```
is a "git diff" header in the form diff --git a/file1 b/file2. The a/ and b/ filenames are the same unless a rename or copy is involved (as in our case). The --git means that the diff is in the "git" diff format.
Next are one or more extended header lines. The first three
```
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
```
tell us that the file was renamed from builtin-http-fetch.c to http-fetch.c and that those two files are 95% identical (which was used to detect this rename).

The last line in extended diff header, which is
```
index f3e63d7..e8f44ba 100644
```
tells us about the mode of the given file (100644 means that it is an ordinary file and not e.g. a symlink, and that it doesn't have the executable permission bit), and about the shortened hashes of the preimage (the version of the file before the given change) and the postimage (the version of the file after the change). This line is used by git am --3way to try a 3-way merge if the patch cannot be applied as is.
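
As a rough illustration (this is not part of git), the mode field can be decoded with Python's standard stat module. The helper name describe_mode is made up for this sketch:

```python
import stat

def describe_mode(mode_text):
    """Decode an octal mode string such as '100644' taken from an index line."""
    mode = int(mode_text, 8)
    if stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    # The owner's execute bit decides whether the file is executable.
    executable = bool(mode & stat.S_IXUSR)
    return kind, executable, stat.filemode(mode)

print(describe_mode("100644"))  # ('regular file', False, '-rw-r--r--')
print(describe_mode("100755"))  # ('regular file', True, '-rwxr-xr-x')
print(describe_mode("120000"))  # ('symlink', False, 'l---------')
```
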
Next is two-line unified diff header
```
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
```
Compared to the output of diff -U, it has neither from-file-modification-time nor to-file-modification-time after the source (preimage) and destination (postimage) file names. If the file was created, the source is /dev/null; if the file was deleted, the target is /dev/null.
If you set the diff.mnemonicPrefix configuration variable to true, then in place of the a/ and b/ prefixes in this two-line header you can have c/, i/, w/ and o/ as prefixes instead, depending on what you compare; see git-config(1).
Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks start with a line like
```
@@ -1,8 +1,9 @@
```
or
```
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, ...
```
It is in the format @@ from-file-range to-file-range @@ [header]. The from-file-range is in the form -<start line>,<number of lines>, and the to-file-range is +<start line>,<number of lines>. Both start-line and number-of-lines refer to the position and length of the hunk in the preimage and postimage, respectively. If number-of-lines is not shown, it is 1.

The optional header shows the C function where each change occurs, if it is a C file (like -p option in GNU diff), or the equivalent, if any, for other types of files.
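
To make the range notation concrete, here is a minimal Python sketch that parses a hunk header line. The function name parse_hunk_header and the dictionary keys are made up for this example; the only rule it relies on is the one described above (a missing line count means 1):

```python
import re

# -<start>[,<count>] +<start>[,<count>], then an optional heading after the second @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")

def parse_hunk_header(line):
    """Split a unified-diff hunk header into its numeric parts and heading."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count, heading = match.groups()
    return {
        "old_start": int(old_start),
        # An omitted count means exactly one line.
        "old_count": 1 if old_count is None else int(old_count),
        "new_start": int(new_start),
        "new_count": 1 if new_count is None else int(new_count),
        "heading": heading,
    }

print(parse_hunk_header("@@ -1,8 +1,9 @@"))
print(parse_hunk_header("@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, const char *prefix)"))
```
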

Next comes the description of where files differ. The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:
'+' -- A line was added here to the first file.
'-' -- A line was removed here from the first file.

So, for example, the first hunk

     #include "cache.h"
     #include "walker.h"
     
    -int cmd_http_fetch(int argc, const char **argv, const char *prefix)
    +int main(int argc, const char **argv)
     {
    +       const char *prefix;
            struct walker *walker;
            int commits_on_stdin = 0;
            int commits;

means that cmd_http_fetch was replaced by main, and that the line const char *prefix; was added.

In other words, before the change, the relevant fragment of what was then builtin-http-fetch.c looked like this:

    #include "cache.h"
    #include "walker.h"
    
    int cmd_http_fetch(int argc, const char **argv, const char *prefix)
    {
           struct walker *walker;
           int commits_on_stdin = 0;
           int commits;

After the change, the same fragment of what is now http-fetch.c looks like this instead:

    #include "cache.h"
    #include "walker.h"
     
    int main(int argc, const char **argv)
    {
           const char *prefix;
           struct walker *walker;
           int commits_on_stdin = 0;
           int commits;
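
The "before" and "after" fragments above can be derived mechanically from the hunk body. A small Python sketch, where split_hunk is a hypothetical helper and not a git API:

```python
def split_hunk(body_lines):
    """Rebuild the old and new fragments from the body lines of one hunk."""
    old, new = [], []
    for line in body_lines:
        marker, text = line[:1], line[1:]
        if marker in (" ", ""):   # context line (some tools strip the lone space)
            old.append(text)
            new.append(text)
        elif marker == "-":       # present only in the preimage
            old.append(text)
        elif marker == "+":       # present only in the postimage
            new.append(text)
        # Lines starting with "\" (e.g. "\ No newline at end of file") are skipped.
    return old, new

hunk = [
    ' #include "cache.h"',
    ' #include "walker.h"',
    ' ',
    '-int cmd_http_fetch(int argc, const char **argv, const char *prefix)',
    '+int main(int argc, const char **argv)',
    ' {',
    '+       const char *prefix;',
    '        struct walker *walker;',
    '        int commits_on_stdin = 0;',
    '        int commits;',
]
old, new = split_hunk(hunk)
print(len(old), len(new))  # 8 9
```

The two lengths are exactly the counts in the hunk header @@ -1,8 +1,9 @@.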

A diff may also contain the line
```
\ No newline at end of file
```
(it does not appear in the example diff).

As Donal Fellows said, it is best to practice reading diffs on real-life examples, where you know what you have changed.

References:

git-diff(1) manpage, section "Generating patches with -p"
(diff.info)Detailed Unified node, "Detailed Description of Unified Format".

Why does `git` need to use a similarity index to detect renames? — Geremia, Jul 07 '16 at 01:51
And `[header]` is the name of the function in which the diff occurs? — Geremia, Jul 07 '16 at 01:52
@Geremia: Git uses similarity-based heuristics for rename detection... and also for code move and copy detection in `git blame -C -C`, that's how it works; it is Git design decision. The git diff format just shows the similarity (or dissimilarity) index to the user. — Jakub Narębski, Jul 19 '16 at 15:14
@Geremia: To be more exact, `[header]` is the closest preceding line with the beginning of a function that precedes a hunk. In most cases this line includes the name of the function in which the chunk of diff is. This is configurable with the `diff` gitattribute set to a diff driver, and the diff driver including the `xfuncname` configuration variable. — Jakub Narębski, Jul 25 '16 at 16:29
This is an excellent and thoroughly comprehensive answer. I upvoted it months ago but I’ve been re-reading it to consolidate my understanding. I’d like to query one sentence: *“If number-of-lines not shown it means that it is 0.”* If the number of lines changed is zero, I would’ve thought that there simply wouldn’t be any hunk. With [GNU diff’s unified format](https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html), *“If a hunk contains just one line, only its start line number appears”*. I’d imagine that the same would be true for git’s diff. — Anthony Geoghegan, Nov 25 '16 at 17:53
@AnthonyGeoghegan: lines might be deleted (then number of lines in postimage is 0) , or added (then number of lines in preimage is 0). — Jakub Narębski, Nov 26 '16 at 00:55
@JakubNarębski from this `@@ -1,8 +1,9 @@` is it possible to interpret what actually has happened. for example 1) one line have been added 2) one line is being modified and one line being added and so on. Or is it from another way, as there should be way to get them as git diff correclty identifies what lines have been modified in the code. Please help me as I really need to get this sorted out — Kasun Siyambalapitiya, Nov 28 '16 at 12:48
@JakubNarębski Thanks for the response; I’ll clarify my comment: In cases where `<number of lines>` in the pre or post-image is 0, the range information is specified as `,0` (I verified this with a few experiments). However, if the number of lines in the pre or post-image is 1 (e.g., changing a single-line file), then the `<number of lines>` is omitted from the range information and only the `<start line>` is displayed (no comma). In the example of changing a single-line file, the hunk range information will be `@@ -1 +1 @@` (indicating that both `<number of lines>` values are 1). — Anthony Geoghegan, Nov 29 '16 at 12:08
@KasunSiyambalapitiya: Unified diff format that Git uses (as opposed to context diff format^[1]) does not distinguish between modified line, and removed and added line. [1]: https://www.gnu.org/software/diffutils/manual/html_node/Context-Format.html — Jakub Narębski, Nov 29 '16 at 21:54
@JakubNarębski: The number of lines defaults to 1, not to 0. It is as simple as that. In practice, it only appears as "-1" and/or "+1" for single-line files because there is no context to display. — Guido Flohr, Nov 17 '17 at 14:20
@JakubNarębski: There are zero or more hunks of differences, not one or more: `> empty && git add empty && git diff --cached` — Guido Flohr, Nov 17 '17 at 14:23
Is there a standard file extension for files written in this format? Never mind, I found my answer: [Is there a commonly-used filename extension for unified diff format?](https://stackoverflow.com/q/1260753/27211) — kiewic, Aug 08 '19 at 17:20

Ciro Santilli OurBigBook.com · Answer 2 · 2023-04-12T18:37:11.647

@@ -1,2 +3,4 @@ part of the diff

This part took me a while to understand, so I've created a minimal example.

The format is basically the same as the diff -u unified diff.

For instance:

diff -u <(seq 16) <(seq 16 | grep -Ev '^(2|3|14|15)$')

Here we removed lines 2, 3, 14 and 15. Output:

@@ -1,6 +1,4 @@
 1
-2
-3
 4
 5
 6
@@ -11,6 +9,4 @@
 11
 12
 13
-14
-15
 16

@@ -1,6 +1,4 @@ means:

-1,6 means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6.
```
1
2
3
4
5
6
```
- means "old", as we usually invoke it as diff -u old new.
+1,4 means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4.

+ means "new".

We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just:
```
1
4
5
6
```

@@ -11,6 +9,4 @@ for the second hunk is analogous:

on the old file, we have 6 lines, starting at line 11 of the old file:
```
11
12
13
14
15
16
```
on the new file, we have 4 lines, starting at line 9 of the new file:
```
11
12
13
16
```
Note that line 11 is the 9th line of the new file because we have already removed 2 lines on the previous hunk: 2 and 3.
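
The arithmetic behind that note can be written down as a short Python sketch: each hunk's start in the new file is its start in the old file, shifted by the net number of lines added or removed by the earlier hunks. The function name is made up, and the sketch assumes hunks with non-zero line counts (with a count of 0 the start number follows a slightly different convention):

```python
def predicted_new_starts(hunks):
    """hunks: (old_start, old_count, new_start, new_count) tuples in file order."""
    shift = 0          # net lines added (+) or removed (-) by earlier hunks
    predicted = []
    for old_start, old_count, new_start, new_count in hunks:
        predicted.append(old_start + shift)
        shift += new_count - old_count
    return predicted

# The two hunks from the example: @@ -1,6 +1,4 @@ and @@ -11,6 +9,4 @@
hunks = [(1, 6, 1, 4), (11, 6, 9, 4)]
print(predicted_new_starts(hunks))  # [1, 9]
```
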

Hunk header

Depending on your git version and configuration, you can also get a code line next to the @@ line, e.g. the func1() { in:

@@ -4,7 +4,6 @@ func1() {

This can also be obtained with the -p flag of plain diff.

Example: old file:

func1() {
    1;
    2;
    3;
    4;
    5;
    6;
    7;
    8;
    9;
}

If we remove line 6, the diff shows:

@@ -4,7 +4,6 @@ func1() {
     3;
     4;
     5;
-    6;
     7;
     8;
     9;

Note that func1() { is not the line immediately above the hunk: the lines 1; and 2; were skipped to reach it.

This awesome feature often tells exactly to which function or class each hunk belongs, which is very useful to interpret the diff.

How the algorithm to choose the header works exactly is discussed at: Where does the excerpt in the git diff hunk header come from?
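
As a rough approximation only: assuming no language-specific pattern (such as xfuncname) is configured, git's fallback is to pick the nearest line above the hunk that starts with a letter, an underscore or a dollar sign. A Python sketch of that idea, with a made-up function name:

```python
def hunk_heading(old_lines, hunk_start):
    """Return the nearest line above the hunk that looks like a heading.

    old_lines: lines of the old file; hunk_start: 1-based first line of the hunk.
    Assumption: a heading is a line starting with a letter, '_' or '$'.
    """
    for index in range(hunk_start - 2, -1, -1):   # walk upwards from the line above
        first = old_lines[index][:1]
        if first.isalpha() or first in ("_", "$"):
            return old_lines[index]
    return ""

old_file = ["func1() {"] + [f"    {n};" for n in range(1, 10)] + ["}"]
print(hunk_heading(old_file, 4))  # func1() {
```

The indented lines 1; and 2; are skipped because they start with a space, which matches the example above.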

One line hunk summarized notation

This is very rare, but consider:

diff -U0 <(seq -w 16) <(seq -w 16 | sed 's/10/hack/')

where:

-U0: use 0 lines of context
second file replaces 10 with hack

The diff output in that case is:

@@ -10 +10 @@
-10
+hack

So we understand that when there's a single line change, the notation gets summarized to showing just one number instead of the m,n pair.

This behavior is documented in the documentation quoted by Todd's answer:

If a hunk contains just one line, only its start line number appears. Otherwise its line numbers look like start,count. An empty hunk is considered to start at the line that follows the hunk.

And single line hunk addition and removal look like this, removal:

diff -U0  <(seq -w 16) <(seq -w 16 | grep -Ev '^(10)$')

output:

@@ -10 +9,0 @@
-10

addition:

diff -U0 <(seq -w 16 | grep -Ev '^(10)$') <(seq -w 16)

output:

@@ -9,0 +10 @@
+10

Tested on diff 3.8, Ubuntu 22.10.
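
If you don't have GNU diff at hand, the same three headers can be reproduced with Python's difflib. It is a separate implementation from both GNU diff and git, so treat this as an illustration of the notation rather than of their algorithms. The helper name hunk_headers is made up:

```python
import difflib

def hunk_headers(old, new, context):
    """Return only the @@ lines of a unified diff between two lists of lines."""
    diff = difflib.unified_diff(old, new, n=context, lineterm="")
    return [line for line in diff if line.startswith("@@")]

nums = [f"{i:02d}" for i in range(1, 17)]            # like `seq -w 16`
without_10 = [n for n in nums if n != "10"]
hacked = ["hack" if n == "10" else n for n in nums]

print(hunk_headers(nums, hacked, 0))      # ['@@ -10 +10 @@']
print(hunk_headers(nums, without_10, 0))  # ['@@ -10 +9,0 @@']
print(hunk_headers(without_10, nums, 0))  # ['@@ -9,0 +10 @@']
```
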

This is for anyone who still didn't quite understand. In `@@ -1,6 +1,4 @@` pls don't read `-1` as `minus one` or `+1` as `plus one` instead read this as `line 1 to 6` in old (first) file. Note here `- implies "old"` not minus. BTW, thanks for clarification... haash. — dkjain, Jul 02 '16 at 12:48
from this @@ -1,8 +1,9 @@ is it possible to interpret what actually has happened. for example 1) one line have been added 2) one line is being modified and one line being added and so on. Or is it from another way, as there should be way to get them as git diff correclty identifies what lines have been modified in the code. Please help me as I really need to get this sorted out — Kasun Siyambalapitiya, Nov 28 '16 at 12:50

score 34 · Answer 3 · edited Oct 21 '21 at 12:37

Here's the simple example.

diff --git a/file b/file 
index 10ff2df..84d4fa2 100644
--- a/file
+++ b/file
@@ -1,5 +1,5 @@
 line1
 line2
-this line will be deleted
 line4
 line5
+this line is added

Here's an explanation:

--git is not a command; it means this is git's version of the diff format (not the plain Unix one)
a/ and b/ look like directories, but they are not real. They are just a convenience for telling apart the two versions of the same file (in my case a/ is the index and b/ is the working directory)
10ff2df..84d4fa2 are blob IDs of these 2 files
100644 is the “mode bits,” indicating that this is a regular file (not executable and not a symbolic link)
--- a/file +++ b/file: minus signs mark lines that are in the a/ version but missing from the b/ version, and plus signs mark lines that are missing in a/ but present in b/ (in my case, - lines were deleted and + lines were added in b/, which is the file in the working directory)
@@ -1,5 +1,5 @@: to understand this it's better to work with a bigger file; if you have two changes in different places you'll get two entries like @@ -1,5 +1,5 @@. Suppose you have a file with lines line1 ... line100, delete line10 and add a new line after line100. You'll get:

@@ -7,7 +7,6 @@ line6
 line7
 line8
 line9
-this line10 to be deleted
 line11
 line12
 line13
@@ -98,3 +97,4 @@ line97
 line98
 line99
 line100
+this is new line100

Thanks. "100644 is the mode bits, indicating that this is a regular file (not executable and not a symbolic link)". Is "mode bits" a concept in Linux, or just in Git? — Tim, Jan 11 '19 at 14:12
@Tim Not specific to git. The right 3 digits (`644`) are to be read in octal (values: 1, 2, 4 respectively eXecute, Write, and Read permission) and corresponds in that order to Owner (User), then Group, then Other permissions. So in short `644` would mean if written symbolicaly `u=rw,og=r`, that is readable to everyone but writable only by owner. The other digits on the left encode other information, like if it is a symlink, etc. Values can be seen https://github.com/git/git/blob/6d5b26420848ec3bc7eae46a7ffa54f20276249d/git-compat-util.h#L669, the first 1 in this position is "regular file". — Patrick Mevzek, Aug 30 '19 at 18:29

score 17 · Answer 4 · answered Mar 27 '10 at 14:33

The default output format (which originally comes from a program known as diff if you want to look for more info) is known as a “unified diff”. It contains essentially 4 different types of lines:

context lines, which start with a single space,
insertion lines that show a line that has been inserted, which start with a +,
deletion lines, which start with a -, and
metadata lines which describe higher level things like which file this is talking about, what options were used to generate the diff, whether the file changed its permissions, etc.
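
The four line types can be told apart with a small Python sketch. The function name classify is made up; the sample is the short diff shown in another answer on this page. Note that the --- and +++ file header lines count as metadata, which is why the sketch tracks whether it is inside a hunk:

```python
from collections import Counter

def classify(diff_lines):
    """Yield (kind, line) pairs for the lines of a git-style unified diff."""
    in_hunk = False
    for line in diff_lines:
        if line.startswith("diff "):
            in_hunk = False            # a new file section starts
            yield "metadata", line
        elif line.startswith("@@"):
            in_hunk = True             # hunk header; hunk body follows
            yield "metadata", line
        elif in_hunk and line.startswith("+"):
            yield "insertion", line
        elif in_hunk and line.startswith("-"):
            yield "deletion", line
        elif in_hunk and line.startswith(" "):
            yield "context", line
        else:
            yield "metadata", line     # index, ---, +++, "\ No newline...", etc.

sample = """diff --git a/file b/file
index 10ff2df..84d4fa2 100644
--- a/file
+++ b/file
@@ -1,5 +1,5 @@
 line1
 line2
-this line will be deleted
 line4
 line5
+this line is added""".splitlines()

counts = Counter(kind for kind, _ in classify(sample))
print(counts["context"], counts["insertion"], counts["deletion"], counts["metadata"])  # 4 1 1 5
```
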

I advise that you practice reading diffs between two versions of a file where you know exactly what you changed. That way you'll recognize just what is going on when you see it.

+1: The suggestion about practice is a very good one - probably much faster than trying to obsessively read documentation. — Cascabel, Mar 27 '10 at 14:47

stefanB · Answer 5 · 2010-03-27T14:23:36.780

On my mac:

info diff then select: Output formats -> Context -> Unified format -> Detailed Unified :

Or online man diff on gnu following the same path to the same section:

File: diff.info, Node: Detailed Unified, Next: Example Unified, Up: Unified Format

Detailed Description of Unified Format

The unified output format starts with a two-line header, which looks like this:
 --- FROM-FILE FROM-FILE-MODIFICATION-TIME
 +++ TO-FILE TO-FILE-MODIFICATION-TIME
The time stamp looks like `2002-02-21 23:30:39.942229878 -0800' to indicate the date, time with fractional seconds, and time zone.

You can change the header's content with the `--label=LABEL' option; see *Note Alternate Names::.

Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks look like this:
 @@ FROM-FILE-RANGE TO-FILE-RANGE @@
  LINE-FROM-EITHER-FILE
  LINE-FROM-EITHER-FILE...
The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:

`+' A line was added here to the first file.

`-' A line was removed here from the first file.

Note that git doesn't print the 'XXX-FILE-MODIFICATION-TIME' part, as it doesn't make sense for a version control system. When comparing files on a filesystem, timestamps can function as a "poor man's" version control. — Jakub Narębski, Mar 27 '10 at 15:11

score 4 · Answer 6 · edited May 23 '17 at 12:18

It's unclear from your question which part of the diffs you find confusing: the actual diff, or the extra header information git prints. Just in case, here's a quick overview of the header.

The first line is something like diff --git a/path/to/file b/path/to/file - obviously it's just telling you what file this section of the diff is for. If you set the boolean config variable diff.mnemonicPrefix, the a and b will be changed to more descriptive letters like c and w (commit and work tree).

Next, there are "mode lines" - lines giving you a description of any changes that don't involve changing the content of the file. This includes new/deleted files, renamed/copied files, and permissions changes.

Finally, there's a line like index 789bd4..0afb621 100644. You'll probably never care about it, but those 6-digit hex numbers are the abbreviated SHA1 hashes of the old and new blobs for this file (a blob is a git object storing raw data like a file's contents). And of course, the 100644 is the file's mode - the last three digits are obviously permissions; the first three give extra file metadata information (SO post describing that).
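
For the curious, the blob hashes on the index line can be recomputed by hand. A minimal Python sketch, assuming a repository that uses the default SHA-1 object format (the function name git_blob_id is made up; git hash-object is the real command that does this):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hash file contents the way git names a blob (SHA-1 object format)."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the raw contents.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

The index line shows an abbreviated prefix of such a hash for the old and the new version of the file.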

After that, you're on to standard unified diff output (just like the classic diff -U). It's split up into hunks - a hunk is a section of the file containing changes and their context. The hunks for each file are preceded by a pair of --- and +++ lines denoting the file in question, and each hunk then shows (by default) three lines of context on either side of the - and + lines that mark the removed/added lines.

++ for the `index` line. Confirmed with `git hash-object ./file` — Ciro Santilli OurBigBook.com, Jul 25 '15 at 09:32
