
On a maintenance branch, I moved the source folder for an entire website using git mv old new. At the time, git status showed many renames, so I committed and continued developing on top of that (including several further renames).

Looking more closely at that git mv commit, I can see that several renames were detected, but not all of them, so many of my files appear to have lost their history. This is odd, as I made no modifications: the commit was purely the directory rename.

I've since merged this branch into another one, and after a long merge process I got everything sorted out and checked in. When I then checked the history of a file, I noticed that it only goes as far back as the initial rename.

I've pushed all these changes to the remote. I'm working alone, so I can only conflict with myself. Losing this project's per-file history would be a pain, but going back and redoing all of this, including the merge, would be a huge pain too.

Is there a way to go back and correct this initial rename without having to go through the huge merge that I've already done?

Steven Sproat

2 Answers


There is nothing to correct.

Each Git commit holds a snapshot of all files. These files each have names. If you rename a file and commit, then compare the old and new snapshots, what you get is:

old snapshot       new snapshot
------------       ------------
README.md          README.md
file.ext           file.ext
oldname.ext
                   newname.ext
zfile.ext          zfile.ext

As you can see, the difference between "old snapshot" and "new snapshot" is that the old file oldname.ext has been deleted, and a totally different new file newname.ext has been added.

What Git does with such a pair of snapshots is to detect renames. For every file on the left that was deleted, and every file on the right that was newly added, Git puts the names into piles of "potential renames". Then it looks through the piles. Here, there's only one pair of names to maybe pair up. Git will look at the contents of oldname.ext in the old commit and newname.ext in the new commit. If the contents match, or are close enough, Git says: Aha, the file is renamed! And instead of "delete oldname.ext, add newname.ext" you get "rename oldname.ext to newname.ext".
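
One way to see this for yourself is to rebuild the example table in a throwaway repository. This is a minimal sketch, assuming a POSIX shell and a reasonably recent Git; the file contents and the demo_commit helper are made up for illustration. The two commits never change; only the options passed to git diff do:

```shell
# Throwaway repository matching the old/new snapshot table.
cd "$(mktemp -d)" && git init -q
demo_commit() {   # hypothetical helper: commit with a throwaway identity
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}
for f in README.md file.ext oldname.ext zfile.ext; do
    echo "contents of $f" > "$f"
done
git add . && demo_commit "old snapshot"

# Rename one file; the new commit simply stores newname.ext.
git mv oldname.ext newname.ext && demo_commit "new snapshot"

# Same two commits, rename detection off: a delete plus an add.
git diff --no-renames --name-status HEAD~1 HEAD
# prints (tab-separated): "A newname.ext" and "D oldname.ext"

# Same two commits, rename detection on: a single rename.
git diff -M --name-status HEAD~1 HEAD
# prints (tab-separated): "R100 oldname.ext newname.ext"
```

R100 means the two files are 100% similar. Nothing stored in either commit records a rename; the pairing is computed fresh each time git diff runs.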

Different versions of Git have slightly different rename detectors. The main difference is how many names can go into the piles. The defaults for extremely old versions of Git are:

  • don't detect renames at all unless told to;
  • use a maximum of 100 name pairs.

Modern Git has "detect renames" turned on by default and has a default maximum of 400 name pairs, with the limit last having been raised in Git 1.7.5.

You can raise the limit further yourself: there is a configuration variable, diff.renameLimit, and when it is unset Git uses its built-in default. Set it to any value you like in your configuration file. Setting it to zero removes the limit, which tells Git to try as hard as it can.
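
For example, the limit can be lifted for a single command, for one repository, or for all of your repositories. A minimal sketch, assuming a POSIX shell and a reasonably recent Git; the repository, file names and demo_commit helper are invented, and the --global form is left commented out so running the sketch does not touch your real settings:

```shell
# Throwaway repository with one renamed file.
cd "$(mktemp -d)" && git init -q
demo_commit() {   # hypothetical helper: commit with a throwaway identity
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}
echo "some content" > oldname.ext
git add oldname.ext && demo_commit "add"
git mv oldname.ext newname.ext && demo_commit "rename"

# One command only, no config change (0 = no limit):
git -c diff.renameLimit=0 log --follow --name-status --format=%s -- newname.ext
git diff -M -l0 --name-status HEAD~1 HEAD   # -l<num> is the per-diff limit

# This repository only (written to .git/config):
git config diff.renameLimit 0

# Every repository of yours (written to your global config):
#   git config --global diff.renameLimit 0
```

Because the limit only affects how diffs are computed, changing it takes effect immediately on existing history; nothing needs to be recommitted.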

What this means is that two git diff runs, on the same commits in the same repository, can give different answers about which files were renamed, and which were just deleted-and-added. You choose, at git diff time, which way you'd like the diff to be computed and displayed. The underlying commits just have the old name (in the old commit) and the new name (in the new commit): nothing here changes in any way, and redoing the commits will not help. What matters are the various control knobs:

  • Is the rename detector enabled at all? (diff.renames)
  • What's the limit on the number of file names? (diff.renameLimit)
  • What's the threshold for deciding that two files with different names, but similar (yet not 100% identical) content, are "the same file"? This last one is set by the -M argument to your git diff command, for example -M90%; the default is 50%. Passing -M also turns rename detection on for that command, even if your configuration has diff.renames set to false.
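
The threshold is easiest to see with a file that is renamed and edited in the same commit. A minimal sketch, assuming a POSIX shell; the file names, line contents and demo_commit helper are invented so that roughly two thirds of the file survives the edit:

```shell
cd "$(mktemp -d)" && git init -q
demo_commit() {   # hypothetical helper: commit with a throwaway identity
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}
# Ten lines of original content (printf repeats the format per argument).
printf 'line %02d of the original file\n' 1 2 3 4 5 6 7 8 9 10 > a.txt
git add a.txt && demo_commit "add a.txt"

# Rename a.txt to b.txt and rewrite its first three lines in the same commit.
git mv a.txt b.txt
{
    printf 'changed %02d line here, different\n' 1 2 3
    printf 'line %02d of the original file\n' 4 5 6 7 8 9 10
} > b.txt
git add b.txt && demo_commit "rename and edit"

# About two thirds similar: reported as a rename at a 50% threshold...
git diff -M50% --name-status HEAD~1 HEAD
# ...but only as a delete plus an add at a 90% threshold.
git diff -M90% --name-status HEAD~1 HEAD
```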

The git status command runs with rename detection enabled and the similarity threshold set to "50% similar". For a long time there was no configuration or command-line knob to change the similarity value or to disable rename detection here, but Git 2.18 added a status.renames configuration setting (along with status.renameLimit) and the --no-renames and --find-renames[=<n>] options, as described in the Git 2.18 release notes.
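
A minimal sketch of those git status knobs, assuming a POSIX shell and Git 2.18 or later (where status.renames and --no-renames exist); the repository, file names and demo_commit helper are invented for illustration:

```shell
cd "$(mktemp -d)" && git init -q
demo_commit() {   # hypothetical helper: commit with a throwaway identity
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}
echo "some content" > oldname.ext
git add oldname.ext && demo_commit "add"
git mv oldname.ext newname.ext

# Default: the staged delete/add pair is shown as a rename.
git status --short                  # R  oldname.ext -> newname.ext

# Rename detection off for this one run...
git status --short --no-renames     # A  newname.ext / D  oldname.ext

# ...or off for this repository via configuration.
git config status.renames false
git status --short                  # A  newname.ext / D  oldname.ext
```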

(There is also a fancier directory-rename detection algorithm, which was experimental for a while and which I think has been in Git since 2.18, but it is not described very well. From the documentation, it seems to be used only during merges, not diffs, which is odd since merges actually run diffs internally. See the first bullet point in the Git 2.18 release notes.)

torek
  • This 'diff.renameLimit' sounds like my problem - with my initial rename, this would have touched 5,000+ files, which explains the inconsistency I am seeing where some files are flagged as renamed, but others as added/deleted. I think at this point I will need to set this config value, go back in my branch to before the rename, and start again. – Steven Sproat Feb 12 '20 at 10:44
  • There's no need to change any of your existing commits! The *commits* are not affected by the rename limit. The rename detector runs *after* you make the commits. – torek Feb 12 '20 at 10:47
  • Oh my, I misunderstood! I see - I've increased my limits, and yup - I can see the history being tracked correctly. I was thinking that I'd have to increase my limit because our cloud Git host's (Bitbucket's) "diff viewer" was showing these additions/deletions, but I assume that's because of ITS renameLimit value. That's great, I'd be checking history locally anyway. @torek thank you so much for your answer, very helpful :) – Steven Sproat Feb 12 '20 at 11:14

It may not make much sense, but in my PhpStorm IDE, files renamed via the IDE itself are marked as "renamed", while files renamed via "git mv" are marked as deleted and re-added.

To bring all files into the same state, I just ran this in a terminal:

git add .
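
Why this can help: in the ordinary case, git status only pairs a deletion with an addition as a rename when both are staged. A minimal sketch, assuming a POSIX shell and Git 2.0 or later (where git add . also stages deletions); it uses a plain mv so that nothing is staged at first, and the file names and demo_commit helper are invented. It does not try to reproduce PhpStorm's display, only what git add . changes on the Git side:

```shell
cd "$(mktemp -d)" && git init -q
demo_commit() {   # hypothetical helper: commit with a throwaway identity
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}
echo "some content" > oldname.ext
git add oldname.ext && demo_commit "add"

# Rename behind Git's back: nothing is staged yet.
mv oldname.ext newname.ext
git status --short      # " D oldname.ext" and "?? newname.ext"

# Stage everything, deletions included; the pair now shows as a rename.
git add .
git status --short      # "R  oldname.ext -> newname.ext"
```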