Git incorrectly infers "rename"

Question

Git has inferred a "rename" when I had no desire for it to do so (this question is effectively the opposite of, say, How to make git mark a deleted and a new file as a file move?):

1. I created a new file and did git add. (I did not do a git commit, as I have no desire to do so at this stage.)
2. Later on I did git rm on another file.
3. Now git status reports renamed: old-file -> new-file. I have not committed yet.

The two files are in the same directory, have similar-ish names and a certain amount of common content. However, I deliberately did not do a git mv, because this is not a rename: I want the two files tracked separately. If I had wanted a rename, I would have done a git mv rather than my deliberate git add/git rm.

What about the activity has caused git to decide it's a rename, and can it be told not to try to infer things I don't intend?

Could you put the exact steps in your question to reproduce it maybe? — ckruczek, Nov 14 '17 at 09:10
Yes, true, but maybe you can create a simple, reproducible example for us, with similar file content etc. – ckruczek, Nov 14 '17 at 09:13

Oliver Charlesworth · Answer 1 · 2017-11-14T09:31:03.107

4

Git's underlying storage model records only the repo contents before and after a change, not the change itself. It therefore has no way of distinguishing between, say, a move plus modification and a delete plus add.

Thus git mv is just convenience syntax for:

    mv a b
    git rm a
    git add b

git status is merely inferring the most likely cause of the underlying change (given before and after), in an effort to make the human-readable output useful. There are certainly pathological edge cases - in your particular case it's inferred that the change was caused by a move and a small content change.
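One way to check the equivalence claimed above is to stage the move both ways in two throwaway repositories and compare the trees that end up staged. This is only a minimal sketch: the file names a and b come from the commands above, while the `setup_repo` helper, the temporary directories and the committer identity are made up for illustration.

```shell
#!/bin/sh
set -e

# Hypothetical helper: create a scratch repo with one committed file "a"
# and print the repo's path.
setup_repo() {
    dir=$(mktemp -d)
    git -C "$dir" init -q
    git -C "$dir" config user.name "Example"
    git -C "$dir" config user.email "example@example.com"
    printf 'line %s\n' 1 2 3 4 5 > "$dir/a"
    git -C "$dir" add a
    git -C "$dir" commit -q -m "add a"
    echo "$dir"
}

repo_mv=$(setup_repo)
repo_manual=$(setup_repo)

# Variant 1: let Git perform the move.
git -C "$repo_mv" mv a b

# Variant 2: move by hand, then stage the deletion and the addition.
mv "$repo_manual/a" "$repo_manual/b"
git -C "$repo_manual" rm -q a
git -C "$repo_manual" add b

# write-tree hashes the staged snapshot; equal hashes mean the two
# variants staged exactly the same content under the same names.
tree_mv=$(git -C "$repo_mv" write-tree)
tree_manual=$(git -C "$repo_manual" write-tree)
echo "git mv:          $tree_mv"
echo "mv + rm + add:   $tree_manual"
```

If the two hashes match, nothing in the staged snapshot records which sequence of commands produced it.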

Update based on comments discussion: If you need to make it clear what's going on here, you could (as you suggested) perform the add and rm in separate commits. This has the downside of splitting a single "logical" commit into two, though that may be unimportant.

edited Nov 14 '17 at 09:31

answered Nov 14 '17 at 09:01

Oliver Charlesworth

Do you think it's possible that the similar content had anything to do with it? – Tim Biegeleisen Nov 14 '17 at 09:07
@JonBrave - Ah, I missed that. I'll update my answer shortly, but the basic story is still the same - `git status` is inferring the most probable cause of the changes it observes. – Oliver Charlesworth Nov 14 '17 at 09:14
I presume that what `git status` reports will indeed be what actually gets recorded when I `git commit/push`(?) In that case, the answer should state what I can do to avoid this behaviour in my case, e.g. if I had done a `git commit` after the `add` and before the `rm`, would that have prevented this undesired behaviour? – JonBrave Nov 14 '17 at 09:16
@JonBrave - I guess the key point is that these are literally indistinguishable as far as Git's concerned - the commit/push will be the same. There are certainly tricks you can pull to affect the human-readable log output, but generally at the expense of something else (in your example, two commits rather than one). – Oliver Charlesworth Nov 14 '17 at 09:28

score 2 · Accepted Answer · answered Nov 14 '17 at 09:21

The files are similar enough that git status infers a rename. Under the hood, it makes no difference at all, but if you want to ensure that it doesn't happen, make separate commits where you add and delete the files:

    git add newfile
    git commit
    git rm oldfile
    git commit

answered Nov 14 '17 at 09:21

1615903

It is *probable/possible* in my case that when I created `newfile` prior to `git add` I had copied the old file's content into the new file, and so they were identical or very similar. To avoid the extra `commit`, if in future I _first_ make `newfile` empty, then `git add`, and only then copy the old content into it, would this avoid the "rename" inference and thus the need for the intermediate `commit`, or is the inference made not initially but dynamically depending on what the content is at a later date? – JonBrave Nov 14 '17 at 09:42
As Oliver Charlesworth said, Git does this rename detection dynamically, any time it is comparing two commits *and* you have enabled rename detection (`--find-renames[=<n>]`). The default in Git used to be to leave rename detection disabled except for `git merge` and `git status`. Since Git 2.12ish, the default is now to have rename detection enabled. The optional number is the *similarity index*, between 0 and 100. 100 means the files must be 100% identical. Note that you can set this value for `git diff` and `git merge`, but `git status` hard-codes it to 50%. – torek Nov 14 '17 at 09:45
@torek Oh, this `--find-renames` is interesting! Though probably not suitable as it must be enabled/disabled at `commit`/`push` time, and presumably acts across all files, where I wanted to get this done once & for all at the `add`/`rm` stage. You are also saying that `git status` might report differently (because of `--find-renames`) from what will *actually* be decided upon at `commit`/`push`-time, right? You *might* care to put in your own solution post explaining that aspect, as it could be helpful to others reading this question? – JonBrave Nov 14 '17 at 09:50
The rename detection changes nothing about what Git *stores*, only about what it *displays*. Try committing what you have, then run `git diff --find-renames=1 HEAD^ HEAD`, `git diff --find-renames=100 HEAD^ HEAD`, and `git diff --no-renames HEAD^ HEAD`. Note the different output, yet each commit is totally the same each time! (Remember, each commit is a totally independent snapshot, of *all* files. Commands like `git show` work by comparing the commit to its parent: they run `git diff`, and hence allow you to specify the rename detection enable/level.) – torek Nov 14 '17 at 09:52
@torek Ah, so the "rename" is not truly stored as such in the repository (I assumed it would be), it is purely a "dynamic reporting hint", and would not show up as such when viewing the git version history tree? Assuming so, I really think you might want to put in a post explaining this and the `--find-renames`, this is very useful yet non-obvious information. – JonBrave Nov 14 '17 at 09:56
It is indeed highly non-obvious, at least (or maybe especially) if you have used most other version control systems, most of which *do* track renames at the file level. – torek Nov 14 '17 at 09:57
This answer is wrong, at least with git 2.28.0. In fact, Git detects renames dynamically when two files at different paths are sufficiently similar, **even when the deletion and the addition are in different commits**. For Git, what matters is content similarity between a deletion and an addition, irrespective of commits, as long as the deletion and addition appear in some computed diff. The only way to avoid this is to turn off rename detection when running a Git command. – mljrg Oct 21 '20 at 22:39
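The points made in the comments above can be checked with a small experiment: commit the addition and the deletion separately, then diff across both commits with rename detection on and off. This is a rough sketch in a scratch repository. The names oldfile and newfile follow the answer, while the file contents and the committer identity are invented for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > oldfile
git add oldfile
git commit -q -m "add oldfile"

# newfile starts life as a near copy of oldfile, as in the question.
cp oldfile newfile
echo "one extra line" >> newfile
git add newfile
git commit -q -m "add newfile"

git rm -q oldfile
git commit -q -m "remove oldfile"

# The last commit compared with its parent contains only the deletion.
per_commit=$(git diff --name-status HEAD~1 HEAD)

# A diff spanning both commits sees a deletion and a similar addition.
with_renames=$(git diff --find-renames --name-status HEAD~2 HEAD)
without_renames=$(git diff --no-renames --name-status HEAD~2 HEAD)

printf 'last commit only:\n%s\n\n' "$per_commit"
printf 'both commits, --find-renames:\n%s\n\n' "$with_renames"
printf 'both commits, --no-renames:\n%s\n' "$without_renames"
```

The three reports differ even though the stored commits never change, which is consistent with the rename being a property of the comparison rather than of the history.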

score 1 · Answer 3 · answered Oct 21 '20 at 22:55

Git infers renames when it detects deletions and additions of similar content (by default 50% similar) whenever diffs are computed between any pair of commits in the history. That is, Git does not record a deletion and addition as a rename, so it is irrelevant if you do git rm ... and git add ..., or simply git mv ... which is basically an alias of the other two.

Git infers renames, except if you tell it to not do so, but beware: if you tell Git to not infer renames, then it will not infer any, even those deletions and additions that you want to be paired as renames.

For more details, I suggest reading the documentation of gitdiffcore.
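As a sketch of what telling Git not to infer renames can look like for the staged, uncommitted situation in the question, the commands below stage a near copy plus a deletion and then ask for the status with and without rename detection. It assumes a reasonably recent Git: as far as I know, the `--no-renames` option of `git status` and the `status.renames` setting only appeared around Git 2.18. The file contents and committer identity are invented.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old-file
git add old-file
git commit -q -m "add old-file"

# Stage a similar new file and the deletion, without committing.
cp old-file new-file
echo "one extra line" >> new-file
git add new-file
git rm -q old-file

# Default behaviour: status pairs the two staged changes as a rename.
status_default=$(git status --short)

# Per invocation: ask status not to look for renames.
status_plain=$(git status --short --no-renames)

# Per repository: switch the inference off for status only.
git config status.renames false
status_configured=$(git status --short)

printf 'default:\n%s\n\n' "$status_default"
printf -- '--no-renames:\n%s\n\n' "$status_plain"
printf 'status.renames=false:\n%s\n' "$status_configured"
```

Whichever report you ask for, the commit that results is the same snapshot. Setting `diff.renames` to false should have the corresponding effect on `git diff`, `git log` and `git show`, with the cost noted above that no renames are paired at all.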
