31

Apparently, when you move a function from one source code file to another, the git revision log (for the new file) can show you where that code fragment was originally coming from (see for example the Viewing History section in this tutorial).

How does this work?

Thilo
  • 257,207
  • 101
  • 511
  • 656

2 Answers2

46

It doesn't track them. That's the beauty of it.

Git only records snapshots of the entire project tree: here's what all files looked like before the commit and here's how they look like after. How we got from here to there, Git doesn't care.

This allows intelligent tools to be written after a commit has already happened, to extract information from that commit. For example, rename detection in Git is done by comparing all deleted files against all new files and comparing pairwise similarity metrics. If the similarity metric is greater than x, they are considered renamed, if it is between y and x (y < x), it is considered to be a rename+edit, and if it is below y, they are considered independent. The cool thing is that you, as a "commit archaeologist", can specify after the fact, what x and y should be. This would not work if the commit simply recorded "this file is a rename of that file".

Detecting moved content works similar: you slice every file into pieces, compute similarity metrics between all the slices and can then deduce that this slice which was deleted over here and this very similar slice which was added over there are actually the same slice that was moved from here to there.

However, as tonfa mentioned in his answer, this is very expensive, so it is not normally done. But it could be done, and that's the point.

BTW: this is pretty much the exact opposite of the Operational Transformation model used by Google Wave, EtherPad, Gobby, SubEthaEdit, ACE and Co.

Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • 35
    I don't quite get *"That's the beauty of it"*. I mean, your explanation sounds like *"Git doesn't store what actually happens to the files so that you can guess it later yourself!"* Where's the beauty? – Kos Jul 26 '11 at 08:19
  • 3
    In my opinion the beauty of it is the realization that the tracking should not be part of the core version control itself. "Outsourcing" this feature avoids many of the complications and shortcomings of e.g. SVN. You gain simplicity and flexibility (SVN tools are typically restricted to using the tracking info that was originally recorded, even though this might not be a good representation of what actually happened to the codebase). – nikow May 08 '12 at 07:52
  • Kos, you do not have to guess it later. Git has heuristics to detect it when it matters (ie, when merging). – Frank Schwieterman Dec 21 '13 at 22:42
  • 2
    The beauty is that git doesn't presume that its current heuristics are the correct ones - it just saves the data and lets you interpret that data later, using whatever heuristics you want. This also means that if some heuristic is really expensive to compute, it can be done somewhere other than on the (potentially very busy!) shared repo. – pjz Jun 12 '14 at 16:11
  • FWIW, Linus explained the justification for this design choice in [this email](http://article.gmane.org/gmane.comp.version-control.git/217), in classic Linus fashion. – tavnab Dec 26 '15 at 12:36
  • 1
    @tavnab The Gmane-link has gone down since, but the mail [is still available through the Web Archive](http://web.archive.org/web/20090423040902/http://article.gmane.org/gmane.comp.version-control.git/217). – kdb Mar 14 '19 at 15:09
  • I think the problem is I cannot use git to *store what **actually** happens* in metadata, and can only store the information in commit message. – apple apple Jan 02 '21 at 15:01
  • IMHO the heuristics has nothing to do with save known truth. Even if it save the rename, heuristics can always be done (and maybe optionally ignore the said truth). But it can never tell the truth only using heuristics. – apple apple Jun 30 '21 at 14:37
3

It's purely a heuristic. It compares the distance between files and tries to find matching blocks. But this heuristic is only implemented when the code is copied or moved to a new file (otherwise it would be too costly, checking every pair of files).

tonfa
  • 24,151
  • 2
  • 35
  • 41