Does git similarity index 75% mean git thinks I have renamed a file?

Question

I am using GitExtensions with Visual Studio, and when I go to commit my changes, it says I have added two new files. It also shows a third file (a .resx file) that it seems to be comparing with another .resx file, and it says they have a similarity index of 75%.

The files are not related, but a large portion of each file is the standard template that is in all .resx files, so I can understand them being treated as similar.

So the question is: does this message mean that git thinks I have renamed the older file, and will it mess anything up if I continue with the commit as is?

score 4 · Answer 1 · edited May 23 '17 at 12:24

Git does not store diffs.¹ Instead, each commit stores complete files (as listed in the index-at-the-time-the-commit-is-made), as a sort of stand-alone entity. To retrieve a previous commit, git simply finds the commit ID and extracts the associated files.²
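To make the "complete files" point concrete, here is a minimal sketch of how git names a stored file ("blob") in a repository that uses the default SHA-1 object format: the ID is a hash of a short header plus the full file content, so it depends only on the content, never on any rename guess. The function name git_blob_id is my own, purely for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git gives a blob with this exact content.

    Assumes a SHA-1 repository (the default); SHA-256 repositories
    use a different hash function.
    """
    # Git hashes "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical content get the same ID, whatever their names.
print(git_blob_id(b"") == git_blob_id(b""))  # True
print(git_blob_id(b""))  # the well-known empty-blob ID, e69de29b...
```

You should be able to check a result against your own repository with git hash-object on a real file.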

The "similarity index" and any presentation of "a file was renamed" or "a file was copied" are simply git guessing at what happened, in an attempt to make things clearer to the human, or present the shortest way to get from one commit to another, for instance. You are correct that the template match is misleading git at this point, but "this point" is the "presentation to user of how to get from Point A to Point B", not "what was or will be stored".
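As a rough illustration of why a shared template inflates the score, the sketch below compares two made-up .resx-like files that share 30 boilerplate lines and differ in 10 lines each. It uses Python's difflib as a stand-in. Git's real similarity index is computed differently (on chunks of file content inside its diff machinery), so this shows the effect only and will not reproduce git's exact numbers. The template lines, helper names, and entries are all invented for the example.

```python
import difflib

# Hypothetical boilerplate shared by every generated file.
TEMPLATE = [f"<resheader name='h{i}'>standard boilerplate</resheader>"
            for i in range(30)]


def make_resx(entries):
    """Build a fake .resx-like file: shared template plus its own entries."""
    return TEMPLATE + [f"<data name='{k}'><value>{v}</value></data>"
                       for k, v in entries]


def similarity_percent(a_lines, b_lines):
    """Rough line-based similarity in percent (not git's algorithm)."""
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return round(100 * matcher.ratio())


old = make_resx([(f"Old{i}", f"old text {i}") for i in range(10)])
new = make_resx([(f"New{i}", f"new text {i}") for i in range(10)])

# 30 of each file's 40 lines match, so the ratio is 2*30/80.
print(similarity_percent(old, new))  # 75
```

The two files have nothing in common except the template, yet the score is high enough to look like a rename.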

The git status command (presumably Visual Studio, which I've never used, just runs git status for you) makes git produce a new comparison, this time of the most recent/current commit (HEAD) against the current index, i.e., "what will be committed if you commit now". In fact, you get two comparisons: HEAD-vs-index and index-vs-work-tree. This gives you git's best guess at what happened, including computing that similarity index so that it can guess whether some file(s) were renamed.

Note that once you have any two given commits to git diff, you can specify different copy and/or rename thresholds to get "what happened" shown to you in different ways. Git does this on demand, by extracting (mostly in-memory) the two commits, comparing them, computing each similarity index (again) at that time, and making its best guess at copies or renames from there.
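The effect of the threshold can be sketched as a toy decision rule: the same pair of files is shown as a rename or as a separate delete and add, depending only on the cutoff chosen at display time. The function and file names below are hypothetical. The default of 50 matches git's documented default for rename detection, and I am assuming a score exactly at the threshold counts as a rename.

```python
def describe_change(old_path, new_path, similarity, threshold=50):
    """Describe a deleted/added file pair as git might present it.

    similarity and threshold are percentages. This is a toy model of
    the presentation step only; nothing here affects what is stored.
    """
    if similarity >= threshold:
        return [f"rename {old_path} -> {new_path} ({similarity}%)"]
    return [f"delete {old_path}", f"add {new_path}"]


# Same two files, same 75% score, two different presentations.
print(describe_change("Old.resx", "New.resx", 75))
print(describe_change("Old.resx", "New.resx", 75, threshold=90))
```

With real git, the matching knob is the -M option of git diff (for example, -M90%), with -C playing the same role for copies.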

¹This glosses over git's "pack" files, which do use deltas. However, pack files are generally constructed long after a commit (or series of commits). New commits always make new, stand-alone object files, which may be packed and re-packed in various ways later.

²To speed up operation, git will use the current index (cache) information to figure out a quick way to change from "commit currently checked out" (as noted by the index/cache) to "new commit to be checked out" (given as an argument to git checkout). In particular, as long as you have not modified your work-tree, so that the index is still current, this allows git checkout to avoid touching or even inspecting most files when switching between similar branches or commits.

You don't need to worry about either of these footnotes, though: it's all handled automatically, behind the scenes. (Footnote two can come into play when you start using --work-tree= arguments, as people do in fancy auto-deployment scripts with bare repositories on servers. However, even there it usually just works, all automatically.)

score 0 · Answer 2 · answered Jul 07 '15 at 14:29

Git does not use the similarity index to decide what to store. Instead, it stores each file's content, identified by its hash.

TL;DR: You can commit as is without worrying about git thinking you simply renamed the file.

2016rshah
