What mechanism does Git follow while diffing files? How is Git able to tell me the difference between two files (two commits, etc.)?
Does it use some kind of key-value mechanism, where the line number is the key and a hash of the line is the value, so that a line is marked as changed when its hash changes?

-
`git` is free software, so you could study its source code... – Basile Starynkevitch Jan 09 '19 at 14:15
-
Can you show an example of a diff that git has produced for binary files? As far as I know, once git detects that the files are binary in nature, it won't produce any diffs for them. – Lasse V. Karlsen Jan 09 '19 at 14:17
-
Or are you simply asking how git can tell that the files are different? – Lasse V. Karlsen Jan 09 '19 at 14:17
-
@LasseVågsætherKarlsen ok, i would like to know how does it show the difference in two files if not binary. – aviral sanjay Jan 09 '19 at 14:18
-
Please clean up your question, right now I'm not entirely sure what you're really asking about. – Lasse V. Karlsen Jan 09 '19 at 14:19
-
@LasseVågsætherKarlsen check now – aviral sanjay Jan 09 '19 at 14:24
-
You can tell git how to convert the binary to text and diff that result. This answer may help: https://superuser.com/questions/706042/how-can-i-diff-binary-files-in-git (look at the "bin" for a generic binary file) – John Szakmeister Jan 09 '19 at 14:28
-
I still don't understand your question, you keep saying binary files, and then you talk about "the line", I would still like to see an example of such a diff. You don't get diff from git if it thinks the file is binary, only if it thinks the file is text (non-binary). – Lasse V. Karlsen Jan 09 '19 at 14:30
-
In other words, the only answer that can be provided to your question right now is "it doesn't, git does not diff binary files". – Lasse V. Karlsen Jan 09 '19 at 14:30
-
I had changed the context to files and not binary files, check the question. – aviral sanjay Jan 09 '19 at 14:35
-
The second question still mentions binary files. – John Szakmeister Jan 09 '19 at 14:40
-
@JohnSzakmeister done – aviral sanjay Jan 09 '19 at 14:45
-
Take a look at https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/. Git uses the Myers diffing strategy by default, but it can also use the patience algorithm: https://blog.jcoglan.com/2017/09/19/the-patience-diff-algorithm/ (and here's the description by the algorithm's creator: https://bramcohen.livejournal.com/73318.html) – John Szakmeister Jan 09 '19 at 14:53
2 Answers
Your original question asked about binary files, which, in Git, means "files that Git has decided are not text". For such files, unless you provide a special diff driver, Git does not attempt to generate a diff: it only says "these two files are the same" or "these two files are different". (A diff driver is an external program: you can instruct Git to run this program instead of its built-in diff, and the program can do whatever it wants with the pair of files to generate a usable diff.)
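To make the "same or different" behavior concrete, here is a minimal Python sketch. It assumes, as far as I know Git's built-in heuristic, that content counts as binary when a NUL byte appears within its first 8000 bytes; attributes and diff drivers can override that, and the sketch ignores them. The names `looks_binary` and `summarize` are hypothetical, not Git functions.

```python
# Assumption: mirrors the window Git's built-in heuristic is believed to inspect.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Treat content as binary if a NUL byte appears near the start."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def summarize(name: str, old: bytes, new: bytes) -> str:
    """Report what kind of comparison is possible for this pair of contents."""
    if looks_binary(old) or looks_binary(new):
        if old == new:
            return f"{name}: binary, contents are identical"
        # For binary content there is no line-by-line output, only a verdict.
        return f"Binary files a/{name} and b/{name} differ"
    return f"{name}: text, eligible for a line-by-line diff"


print(summarize("logo.png", b"\x89PNG\0\0old", b"\x89PNG\0\0new"))
print(summarize("notes.txt", b"hello\n", b"hello world\n"))
```

The point is only that the decision is made on the raw bytes before any diff algorithm runs.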
Your updated question, at least as of this time, asks about diffing text files. Git has built into it a modified version of LibXDiff. The main algorithm here is due to Eugene Myers. See also Myers diff algorithm vs Hunt–McIlroy algorithm. For a somewhat more user-friendly introduction to diff algorithms, see the last section of chapter 3 of my stalled book. You are in fact onto something with the idea of line hashes: these diff algorithms compare sequences of symbols, and using line hashes as the symbols in the diff matrix is how they find line-by-line diffs. The line number is not a key, though: the algorithm looks for matching symbols at any position, which is why inserting one line does not mark every following line as changed.
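As a rough illustration of the line-hash idea, here is a minimal Python sketch. It is not Git's code and it is not Myers's algorithm: it swaps in a simpler quadratic longest-common-subsequence table, which yields an edit script of the same minimal length but with worse time and memory behavior. The function names and the use of SHA-1 as the per-line hash are assumptions made for the example.

```python
import hashlib


def line_symbols(text):
    """Map each line to a hash, so the diff compares symbols rather than strings."""
    return [hashlib.sha1(line.encode("utf-8")).hexdigest()
            for line in text.splitlines()]


def diff_lines(old_text, new_text):
    """Return a list of ' line', '-line', '+line' entries (LCS-based, not Myers)."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    a, b = line_symbols(old_text), line_symbols(new_text)
    n, m = len(a), len(b)

    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner, emitting keeps, deletes, inserts.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(" " + old_lines[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("-" + old_lines[i])
            i += 1
        else:
            out.append("+" + new_lines[j])
            j += 1
    out.extend("-" + line for line in old_lines[i:])
    out.extend("+" + line for line in new_lines[j:])
    return out


print("\n".join(diff_lines("a\nb\nc\n", "a\nx\nc\n")))
```

Note that the comparison is between symbols at arbitrary positions in the two sequences, so prepending a line to the new file produces a single `+` entry rather than marking every later line as changed.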

Git probably generates a checksum of each file and compares the two. If the checksums differ, the file is marked as modified, but a checksum comparison alone cannot tell you what the difference is, because it carries no information about which parts changed.
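A minimal Python sketch of that checksum comparison follows. It assumes the default SHA-1 object format, in which a file's ID is the hash of a `blob <size>` header, a NUL byte, and the content. The helper names `git_blob_id` and `is_modified` are hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Hash content the way a SHA-1 Git repository names a blob object."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def is_modified(old: bytes, new: bytes) -> bool:
    """Different IDs mean different content; the IDs do not say where."""
    return git_blob_id(old) != git_blob_id(new)


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(is_modified(b"hello\n", b"hello world\n"))  # True
```

This answers only "did it change?". Showing which lines changed requires running a diff algorithm over the two contents.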
