Problem: I need to move entire directories within a repository. To do that I use `git mv` and, where needed, update the include paths to match the new locations. The problem arises when I do both in a single commit: the file history is lost, because Git treats this as deleting the old files and creating new ones.

Workaround: If I split those actions to separate commits the issue does not occur.
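
A minimal sketch of this two-commit workflow, run in a throwaway repository. The directory names (`old/dir`, `new/dir`), the file contents, and the identity settings are hypothetical and only serve the illustration; the check at the end assumes that a move-only commit shows up as exact (100%) renames.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical layout)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p old/dir
printf '#include "old/dir/util.h"\nint main(void) { return util(); }\n' > old/dir/main.c
printf 'int util(void);\n' > old/dir/util.h
git add old
git commit -q -m "Initial layout"

# Commit 1: move only, so every file stays byte-for-byte identical.
mkdir new
git mv old/dir new/dir
git commit -q -m "Move old/dir to new/dir"

# Commit 2: content change only (fix the include path).
sed 's#old/dir/#new/dir/#' new/dir/main.c > main.c.tmp
mv main.c.tmp new/dir/main.c
git commit -q -a -m "Update include paths"

# Compare the move commit with its parent and count the exact renames.
exact_renames=$(git diff --find-renames --name-status HEAD~2 HEAD~1 | grep -c '^R100')
echo "exact renames in the move commit: $exact_renames"
```

Keeping the move commit free of content changes means rename detection does not have to rely on a similarity threshold for that step.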

The problem reappears: Even with this workaround, the problem returns when I merge into master. I am required to use no-ff merges only, so the merge commit on master combines the changes from both commits, and the history is still not tracked properly.

Another ugly workaround: I can deliver those commits to master separately. I cannot deliver code that does not compile, but if I exclude it from the build process it could be doable... But it is ugly and so wrong...

I am wondering if there is a better solution to this problem.

  • Another uglier workaround: move the files/folders in the entire history. The caveats are: 1) you lose track of the original directory structure and 2) you need to force-push. Look at https://stackoverflow.com/questions/3142419/how-can-i-move-a-directory-in-a-git-repo-for-all-commits if that might be applicable. – Fabien Bouleau Nov 30 '17 at 10:45
  • When you'd like to see the history of a file by `git log -- <path>`, add `--follow`. – ElpieKay Nov 30 '17 at 10:49
  • @FabienBouleau What... How is that even possible? I can rewrite history for all commits, but I can't force Git to track the file history properly like in hg or svn... Wonderfully designed tool – guy_from_nowhere Nov 30 '17 at 12:46
  • I'm not saying it is not possible; the proof is in @ElpieKay's answer, though only partially, since it applies to a specific file, not to a path. Rewriting the history is just another solution to your problem. Git is an evolving tool. The information is in the database; this feature simply has not been implemented yet, imho. – Fabien Bouleau Nov 30 '17 at 12:50
  • @FabienBouleau To avoid being misunderstood: I'm very grateful for your suggestion. I'm just terrified by how many ways to rewrite history are implemented in Git (instead of features that would help with collaboration :P ). I wonder why `--follow` is not the default log option... – guy_from_nowhere Nov 30 '17 at 13:23
  • @guy_from_nowhere No offense taken, no worries. I think the reason `--follow` applies to a single file only is that, on one hand, Git works on files only and, on the other hand, how would you follow the history of a path whose files were moved from different folders? – Fabien Bouleau Nov 30 '17 at 14:05
  • @guy_from_nowhere: `--follow` is poorly implemented, it works on only one file name at a time (you cannot `--follow` an entire directory full of files for instance) and does not work well when tracking back through merges. That's probably why it's not the default. As for history rewriting, what this is really doing is making a *new set of commits* (not changing old ones): you essentially copy the original repository to a new "as if they were named that way all along" repository that you then switch to, abandoning the original repository. – torek Nov 30 '17 at 17:36

2 Answers

If you are familiar with almost any other Version Control System (VCS), it can be very difficult to understand what Git does with file history.

The fact is that Git doesn't have file history. It may be unique among VCSes here (though I don't have experience with many of the more arcane VCSes). Its closest cousin, Mercurial, does have file history: each file added to Mercurial is assigned a unique number in what Mercurial calls the manifest, and this determines the file's identity. If you change the name of a file—or an entire directory full of files—they retain their identities, because this information exists in the manifest.

Git does away with this notion entirely. Git has no file history at all. Git has only commits.

Each commit stores a complete snapshot of a source tree. Each commit also has some number of parent commits, usually just one. This is much more like traditional commit-based VCSes: one can trace through the various commits, or look at file history. But since Git doesn't have file history, the only thing it has is commit history.

In order to implement `git log --follow` and other useful items, what Git offers, instead of file history, is rename detection. Git can look at any one specific commit and compare that commit to its parent commit (or, for merge commits, to all of its parents). When it does this comparison, it offers the option of detecting files that were renamed via that commit: files that had one name in the parent, but a different name in the child. [1]

Git even offers this rename detection when comparing two arbitrary commits, that are not just parent-and-child. Running:

git diff --find-renames $hash1 $hash2

compares the two commits, and wherever there is a candidate for "file with path a/b/c.txt in $hash1 sure looks a lot like file with path d/e/f.log in $hash2", Git may claim that the file was renamed (and then perhaps modified as well). It's important to remember, though, that Git is merely synthesizing a way to transform the first file into the second. The two actual files in the two commits are stored that way permanently. They can never be changed: as long as those commits exist, those two files are stored that way in those two commits. Those two files are not actually related at all unless you want them to be. Git is "finding" a rename by comparing them for similarity. Give Git a different set of "similarity" criteria—e.g., -M75% instead of -M50%—and Git may choose a different set of "sufficiently similar" files.

Nothing has happened to any of the commits. They are all frozen in time. But with a different set of "rename threshold" values, "break thresholds", and so on, Git may pair up different path names. Given --no-renames, Git will never pair up different path names (though it will still pair up files with the same name).
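
The effect of the threshold can be seen in a small throwaway repository. This is only a sketch: the file names `old.txt` and `new.txt` and their contents are made up, chosen so that roughly 70% of the old file survives in the new one. The same two frozen commits are then compared three times with different settings.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical files)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "original line $i with some padding text"
done > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# New path: keep the first 7 lines, rewrite the last 3 (about 70% similar).
{ head -n 7 old.txt
  for i in 8 9 10; do echo "rewritten line $i using other words"; done
} > new.txt
git rm -q old.txt
git add new.txt
git commit -q -m "Replace old.txt with new.txt"

# Same commits, different similarity criteria: collect the status letters.
loose=$(git diff -M50% --name-status HEAD~1 HEAD | cut -c1 | sort | tr -d '\n')
strict=$(git diff -M90% --name-status HEAD~1 HEAD | cut -c1 | sort | tr -d '\n')
none=$(git diff --no-renames --name-status HEAD~1 HEAD | cut -c1 | sort | tr -d '\n')

echo "-M50%:        $loose"       # R  = reported as a rename
echo "-M90%:        $strict"      # AD = reported as an add plus a delete
echo "--no-renames: $none"
```

Nothing in the repository changes between the three `git diff` calls; only the pairing that Git synthesizes differs.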

(This dynamic rename detection matters, sometimes a great deal, when merging, because merge runs two `git diff --find-renames` operations, from the merge base commit to each of the two branch tip commits that are being merged. If Git finds a rename, it believes it. If it does not find a rename, it believes that the base file was deleted in the tip, and a different file was created in the tip. You can control the rename threshold, but you cannot set break or copy threshold values, at least in Git versions up to today, 2.15.)
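
A sketch of how this plays out in a merge, under some assumptions: the branch name `mover`, the file names, and the contents are hypothetical, the moved file is about 40% similar to the original, and the Git in use accepts the `-X find-renames=<n>` strategy option (older versions spell it `-X rename-threshold=<n>`).

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical files)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "original line $i with some padding text"
done > lib.txt
git add lib.txt
git commit -q -m "Add lib.txt"
trunk=$(git symbolic-ref --short HEAD)

# Branch "mover": rename the file and rewrite lines 5-10 in the same commit.
git checkout -q -b mover
{ head -n 4 lib.txt
  for i in 5 6 7 8 9 10; do echo "rewritten line $i using other words"; done
} > moved.txt
git rm -q lib.txt
git add moved.txt
git commit -q -m "Move and rewrite in one commit"

# Trunk: a small edit to line 1 of the file under its old name.
git checkout -q "$trunk"
sed '1s/.*/trunk edit on line 1/' lib.txt > lib.tmp
mv lib.tmp lib.txt
git commit -q -a -m "Edit lib.txt on trunk"

# Default threshold: no rename is found, so Git sees "modified here, deleted there".
if git merge --no-ff --no-edit mover >/dev/null 2>&1; then
  default_merge=clean
else
  default_merge=conflict
  git merge --abort
fi

# Lower threshold: the rename is found and the trunk edit follows the file.
if git merge --no-ff --no-edit -X find-renames=30% mover >/dev/null 2>&1; then
  lowered_merge=clean
else
  lowered_merge=conflict
fi
first_line=$(head -n 1 moved.txt)
echo "default: $default_merge, lowered threshold: $lowered_merge"
```

Lowering the threshold is a trade-off: it also makes Git more willing to pair up files that are not really related.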


[1] The meaning of this is less clear for merge commits, since there is more than one parent: what does it mean for file `child.txt` to have had name `p1.txt` in parent #1 and `p2.txt` in parent #2? Traditional VCSes, with their unique internal numbering systems that determine file identity, assign a clear meaning here, but in practice, this meaning is not always useful, and Linus Torvalds' choice here, to do away with this notion entirely, may have been in part a reaction to that.

torek

You can make `--follow` the default for `git log` (Git applies it only when a single path is given):

git config --global log.follow true
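
A quick way to see what the setting changes, sketched in a throwaway repository. The file names `a.txt` and `b.txt` are hypothetical, and the sketch sets `log.follow` for that one repository only (no `--global`) so it does not touch your own configuration.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical files)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first version" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git mv a.txt b.txt
git commit -q -m "Rename a.txt to b.txt"
echo "second version" >> b.txt
git commit -q -a -m "Edit b.txt"

# Without --follow, the log for b.txt stops at the commit that introduced that path.
plain=$(git -c log.follow=false log --oneline -- b.txt | wc -l | tr -d ' ')

# With --follow, Git continues past the rename to the history of a.txt.
followed=$(git log --oneline --follow -- b.txt | wc -l | tr -d ' ')

# With log.follow set, plain `git log -- <path>` behaves like --follow.
git config log.follow true
by_config=$(git log --oneline -- b.txt | wc -l | tr -d ' ')

echo "plain: $plain, --follow: $followed, log.follow=true: $by_config"
```

The limitations of `--follow` mentioned in the comments (one path at a time, weak handling of merges) apply to `log.follow` as well.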
Alexandre Bodi