What algorithm does git diff use to detect similar (copied/renamed) files?
- What is the complexity with respect to the number of files and the size of files in the repository?
- Does whitespace matter? (e.g. indenting a whole file)
- Does it work on text only?
- Is there a risk of similarity detection timing out? E.g. in a large repository, does changing the similarity index (`--find-renames`/`-M`) to a low number risk not finding results that a high index number might have found because more files were considered?
(Note: this seems to have been the intent of a previous question from 6 years ago, but the accepted answer eschewed algorithm discussion.)