-2

What mechanism does Git follow when diffing files? How is Git able to tell me the difference between two files (or two commits, etc.)? Does it use some key-value mechanism where the line number is the key and a hash of the line is the value, so that a line is marked as changed when its hash changes?

aviral sanjay

2 Answers

3

Your original question asked about binary files, which in Git means "files that Git has decided are not text". For such files, unless you provide a special diff driver, Git does not attempt to generate a diff; it only says "these two files are the same" or "these two files are different". (A diff driver is an external program: you can instruct Git to run this program instead, and it can do whatever it wants with the pair of files to generate a usable diff.)
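To make "files that Git has decided are not text" a bit more concrete, here is a minimal Python sketch of the kind of check involved. It assumes the commonly described heuristic of looking for a NUL byte near the start of the content; the function names, the `probe_size` default, and the returned strings are illustrative choices of mine, not Git's actual code or output.

```python
def looks_binary(data: bytes, probe_size: int = 8000) -> bool:
    """Guess that content is binary if a NUL byte appears near the start.

    Assumption: a NUL byte within the first `probe_size` bytes means
    "not text". This is an approximation, not Git's source.
    """
    return b"\0" in data[:probe_size]


def summarize(old: bytes, new: bytes) -> str:
    """Report only same/different for binary content (hypothetical wording)."""
    if looks_binary(old) or looks_binary(new):
        # No line-by-line diff is attempted: only equality is reported.
        return "binary: identical" if old == new else "binary: differ"
    return "text: run a line-based diff"
```

For example, `summarize(b"\x89PNG\x00a", b"\x89PNG\x00b")` reports only that the two blobs differ, while two plain-text inputs are handed to the line-based diff.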

Your updated question, at least as of this time, asks about diffing text files. Git has built into it a modified version of LibXDiff. The main algorithm here is due to Eugene Myers. See also Myers diff algorithm vs Hunt–McIlroy algorithm. For a somewhat more user-friendly introduction to diff algorithms, see the last section of chapter 3 of my stalled book. You are in fact onto something with the idea of line hashes: these diff algorithms compare symbols, and using a line-hash as the symbols in the diff matrix is how they find line-by-line diffs.
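To illustrate the line-hash idea, here is a rough Python sketch. It is not the Myers algorithm and not what LibXDiff does internally: it uses a plain longest-common-subsequence table, which is slower but easier to read, and it uses SHA-1 as the per-line hash purely for illustration. The names `line_symbol` and `diff_lines` are made up for this example. Note that the line number is not a key here: each file becomes a sequence of symbols, and the algorithm aligns the two sequences.

```python
import hashlib


def line_symbol(line: str) -> str:
    """Reduce a line to a fixed-size symbol (illustrative choice of hash)."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()


def diff_lines(old: list[str], new: list[str]) -> list[tuple[str, str]]:
    """Return (marker, line) pairs: ' ' kept, '-' deleted, '+' inserted."""
    a = [line_symbol(line) for line in old]
    b = [line_symbol(line) for line in new]
    n, m = len(a), len(b)

    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner, emitting the edit script.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append((" ", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(("-", old[i]))
            i += 1
        else:
            out.append(("+", new[j]))
            j += 1
    out.extend(("-", line) for line in old[i:])
    out.extend(("+", line) for line in new[j:])
    return out
```

Calling `diff_lines(["a", "b", "c"], ["a", "x", "c"])` keeps `a` and `c`, deletes `b`, and inserts `x`. A real implementation compares the hashes first and then confirms with the actual line contents, since two different lines could in principle hash to the same value.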

torek
2

It probably generates a checksum of each file and compares the two. If the checksums differ, the file is marked as modified, but Git will not tell you what the difference is, simply because it doesn't know.

NiVeR