What algorithm does git diff use to detect similar (copied/renamed) files?

Question

What is the complexity with respect to the number of files and the size of files in the repository?
Does whitespace matter? (e.g. indenting a whole file)
Does it work on text only?
Is there a risk of similarity detection timing out? E.g., in a large repository, does lowering the similarity index (--find-renames/-M) risk missing results that a higher index would have found, because more files are considered?

(Note: this seems to have been the intent of a previous question 6 years ago, but the accepted answer eschewed algorithm discussion.)
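To make the scenario concrete, here is a minimal reproduction sketch. It assumes a scratch repository created in a temporary directory, and the file names and the demo identity are hypothetical. It only shows how the similarity threshold is passed and where the rename status appears; it says nothing about how Git computes the score internally.

```shell
# Scratch repository; all file names and the demo identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A twenty-line file standing in for a C++ source file.
for i in $(seq 1 20); do echo "int value_$i = $i;"; done > old_name.cpp
git add old_name.cpp
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add file"

# Move the file and append one line, as when an include path needs updating.
git mv old_name.cpp new_name.cpp
echo '#include "moved/header.h"' >> new_name.cpp
git -c user.name=demo -c user.email=demo@example.com commit -q -am "move and edit"

# 50% is the documented default threshold for -M; a lower percentage
# accepts less similar pairs as renames.
status=$(git diff -M50% --name-status HEAD~1 HEAD)
echo "$status"
```

If the pair is detected as a rename, the status line starts with `R` followed by the similarity score, then the old and new paths. Otherwise the two paths show up as separate `D` and `A` entries, which is the situation that makes code review harder.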

In which context, detecting file renames that also introduce modifications? — user229044, Jan 10 '18 at 23:17
I suppose to find out whether the files have been modified or not — freude, Jan 11 '18 at 00:16
If you're looking for the computation that feeds into the, e.g., `R89` status from `git diff --find-renames --name-status commit1 commit2` output, see https://stackoverflow.com/a/46258968/1256452. (There's more that occurs before this point, though, which I have outlined in other answers.) — torek, Jan 11 '18 at 00:56
If you're looking for the algorithm Git uses to pair up files to do rename detection at all (between various source files), see, e.g., https://stackoverflow.com/a/40352403/1256452 (I think I have other answers that go a bit deeper into the details, this is just the first one I turned up in a search). — torek, Jan 11 '18 at 00:59
This relates to git diff with similarity detection (`--find-renames`/`-M`) for files that move with slight changes, as is common with e.g. C++ files that are moved and need small updates to include paths. Thanks for the links to the other answers; both are useful reads, https://stackoverflow.com/a/46258968/1256452 most of all. The curiosity starts when trying to do code reviews and being frustrated when moved files appear in a diff as deleted/added instead of as a renamed file with small deltas. Beyond understanding the algorithm, it would be good to know whether any care can be taken to increase the chance that a reviewer gets a clean diff. — Vincent Scheib, Jan 12 '18 at 04:37

0 Answers