See history of a restored file from before delete in Git

Question

I have a file foo.bar with a long and interesting history over years. It has been modified. It has been renamed. With the right git log parameters, I can still trace its modification history. This is important to me.

In revision 33333 I deleted foo.bar because it seemed like we didn't need that feature. But four commits later, in revision 77777, it turns out we need that feature, so I wanted foo.bar back.

As most everyone knows I can issue:

git checkout 33333~1 -- path/to/foo.bar

But nobody seems to know definitively whether I'll still see the full history that I would see before when issuing git log, in revisions 88888 and beyond.

https://stackoverflow.com/a/42287548/421049 for example claims to answer this exact same question. But that answer is unclear as to whether the history before and after delete will be connected in the log, and questions in the comments for clarification have gone unanswered.

IMO you worry too much. You need the old file and its code so you just checkout it and the rest is history, pun intended. Why care at all if `git log` connects or not the history of the file? — phd, Feb 24 '19 at 23:50

torek · Answer 1 · 2019-02-25T11:15:25.633

The way to find out is to try it.

If you do try it, you'll find that git log -- file (with or without --follow) does indeed keep looking.
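One way to try it is in a throwaway repository. The following is a minimal sketch, not a definitive test: the file names, commit messages, and demo identity are made up for illustration, and HEAD~2 plays the role of "the commit just before the deletion" from the question.

```shell
set -e
# Scratch repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com   # placeholder identity for the demo
git config user.name Demo

echo one > foo.bar;  git add foo.bar;     git commit -q -m "add foo.bar"
echo two >> foo.bar;                      git commit -q -am "edit foo.bar"
git rm -q foo.bar;                        git commit -q -m "delete foo.bar"
echo x > other.txt;  git add other.txt;   git commit -q -m "unrelated"

# HEAD~1 is the deletion, so HEAD~2 is the last commit that still has the file.
git checkout -q HEAD~2 -- foo.bar
git commit -q -m "restore foo.bar"

# Lists the restore, the delete, the edit, and the add; "unrelated" is skipped.
git log --oneline -- foo.bar
```

The log output spans both sides of the deletion, which is the behavior the question asks about.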

Note:

With the right git log parameters, I can still trace [a particular file's] modification history ...

It's important to remember here that Git doesn't store file history. Git stores commits, and the commits—and their graph—are the history.

When you run git log -- file or git log --follow -- file, Git synthesizes a file history, by showing you some selected subset of the actual history. The actual history is, of course, just every commit starting at some ending point and working backwards, following all parents of merges unless --first-parent is used to follow only the first parents of merges.

The synthetic, limited history produced by omitting some commits and showing other commits will, of course, depend on which commits are shown and which commits are omitted. Here, it becomes important to distinguish between two separate processes:

How does Git decide which commits to traverse?
Having traversed those commits, how does Git decide which commits to show?

The unlimited, default git log action is to traverse all reachable commits and to show all commits traversed, which makes the two questions synonymous. You just need to understand the idea of reachability, which is why I provided the link to Think Like (a) Git here. But as soon as you enable what the git log documentation calls History Simplification, the two become separate. The documentation uses what I think is poor phrasing here:

one part is selecting the commits and the other is how to do it

What they mean here is:

one part is selecting the commits to be shown (added words mine, emphasis mine): This is my second bullet point above. After, or during, the traversal of some subset of commits reachable from the starting point, we show some subset of those commits.
the other is how to do [the traversal]: "it" is a particularly useless pronoun here, referring to the general idea of history simplification. This amounts to saying the method used to do history simplification is the method used to do history simplification, which, well ... okay then!

In any case, the main idea—which applies to everything except the --ancestry-path option—is to start with a concept they call TREESAME here. To apply the idea, we must also look at each commit's parent or parents.

We already know, of course, that to do any kind of commit-graph walk, we have Git grab a commit out of the object database and examine it and gather up its parent hash IDs. Most commits have just one parent, so it's obvious where to move next. A few commits—there is at least one—are root commits, which is defined as a commit with no parents, and it's obvious what to do here as well: just stop traversing. The remaining commits, neither root nor ordinary, are merge commits, and by default, without history simplification, Git would look at all parents of each merge commit.

When history simplification is turned on, though, Git doesn't look at all parents. Instead, it uses this TREESAME idea. To decide if commits C (child) and P_i (the i'th parent) are TREESAME, Git compares the contents of all the files that we're interested in, in that commit—and now the --follow and/or -- path part matters, because this declares which files we find interesting right now. So Git pretends that commits C and P_i have only those files in them, and says: are the files identical, or different?

If the files are different, the two commits are not TREESAME. If the files have the same contents, the two commits are TREESAME.
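The comparison can be approximated by hand with git diff --quiet, which exits with status 0 when the two commits agree on the given paths and 1 when they differ. This is only a sketch: the treesame function is a hypothetical helper invented here, not something Git provides, and Git's own check happens internally during the walk.

```shell
set -e
# Hypothetical helper: succeeds when child and parent agree on the given paths.
treesame() {
    child=$1; parent=$2; shift 2
    git diff --quiet "$parent" "$child" -- "$@"
}

repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com   # placeholder identity for the demo
git config user.name Demo

echo 1 > f; git add f; git commit -q -m "add f"
echo x > g; git add g; git commit -q -m "add g"   # does not touch f

if treesame HEAD HEAD~1 f; then echo "TREESAME with respect to f"; fi
if ! treesame HEAD HEAD~1 g; then echo "not TREESAME with respect to g"; fi
```

The same pair of commits gives different answers depending on which paths are considered interesting, which is the point of the "pretend only those files exist" step.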

Git will select commit C to be shown if it is not-TREESAME to any parent, i.e., if the file(s) in C, after any stripping-out, differ from those in any P_i. But if the commit is a merge commit—has more than one parent—the parent it will follow, by default, to find more commits to maybe-show maybe-follow, is one of the parents that is TREESAME to C.

There are a handful of options (five total, including the default) to change the way the commit-following works. One of them, --sparse, also changes the set of commits to be shown, otherwise this "not-TREESAME-to-any-parent" rule is the rule used for showing traversed commits.
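To watch the default parent-following rule at work, here is a small sketch in a scratch repository. The branch name side, the file names, and the commit labels A, B, C, D, M are all invented for the demo. The side branch changes f and then changes it back, so the merge ends up TREESAME to its first parent and the default walk never visits the side branch.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com   # placeholder identity for the demo
git config user.name Demo

echo 1 > f; git add f; git commit -q -m "A: add f"
trunk=$(git rev-parse --abbrev-ref HEAD)   # whatever the initial branch is called

git checkout -q -b side
echo 2 > f; git commit -q -am "B: change f"
echo 1 > f; git commit -q -am "C: change f back"

git checkout -q "$trunk"
echo x > g; git add g; git commit -q -m "D: unrelated"
git merge -q -m "M: merge side" side

# Default simplification follows only a TREESAME parent of M.
default_count=$(git log --oneline -- f | wc -l | tr -d ' ')
# --full-history walks every parent of the merge.
full_count=$(git log --oneline --full-history -- f | wc -l | tr -d ' ')
echo "default: $default_count, --full-history: $full_count"
```

With this setup the default log should show only A, while --full-history should also show B and C. M itself is omitted in both cases because it is TREESAME to both of its parents.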

Last, we can look at --follow itself. What this does is simple—probably too simple, since it fails with --full-history, for instance. As with any git log -- paths operation, it turns on history simplification, so that only commits that "touch" the given path get displayed. However, it also enables Git's rename detection, and simultaneously, demands that only one path be specified.

As Git traverses commits from child to parent, doing the simplification steps to determine (a) which parent(s) to follow and (b) which commits to display, Git also checks for a case where the diff, from parent-to-child, says that the file at path in the child is the result of a rename from a different path in the parent. If that's the case, then as soon as Git switches from examining commit C to examining parent commit P, it also switches which path name it is using for history simplification. Now instead of path/to/file.ext it might be looking for old/path/to/file.txt, for instance.
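The path switch is easy to observe in a scratch repository. This is a sketch with made-up file names (old.txt, new.txt); the rename here is exact, so rename detection has no trouble with it, which will not always be true in a real history.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com   # placeholder identity for the demo
git config user.name Demo

printf 'one\ntwo\nthree\n' > old.txt
git add old.txt;         git commit -q -m "add old.txt"
echo four >> old.txt;    git commit -q -am "edit old.txt"
git mv old.txt new.txt;  git commit -q -m "rename to new.txt"

# Without --follow, only the commit that introduced the name new.txt is shown.
echo "without --follow:"; git log --oneline -- new.txt
# With --follow, Git switches to old.txt once it walks past the rename.
echo "with --follow:";    git log --oneline --follow -- new.txt
```

The first listing has one commit and the second has three, because only the second keeps going under the old name.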

If you have commits in your commit graph in which there is no copy of some file under some path such as path/to/file.ext, git log still traverses those commits. It's just that those commits obviously don't change that file (they don't even have that file) and therefore each child commit is TREESAME to each of its parent commits. At some point during this traversal, if a commit appears in which path/to/file.ext is in a parent and not in the child, the traversal may, or may not, depending on your chosen history simplification, move to that parent. If the file is added or removed in an ordinary, single-parent commit, the behavior is easy to understand, as there's no special TREESAME weirdness to apply. If the file is added or removed in a commit C that is a merge and has multiple parents P_1, ..., P_n, the TREESAME rules will, by default, take Git down a commit-graph path that doesn't have the file!

To really figure this all out, study the git log documentation. The answers are in it—once you can get past the not-so-good phrasing, anyway.

Side note: `git blame` is different

Both git log and git blame will walk the commit graph, but git blame (or its less-charged synonym, git annotate) has rather different algorithms in it for deciding which file(s) to look at in which commit(s). Its goal is to locate the most recent commit that changed any given line of any given file. When you hit a point at which a file does not exist in the parent commit but does exist in the child, every line is added and so the tree-walking stops at this point.

To continue before this point, it's necessary to find some earlier commit that does have the file, which is easy-ish to do using git log, history simplification, and the parent that doesn't have the file as the starting point. You can then start the git blame operation at the point where the file existed, and work backwards from there.
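That recipe might look like the following in a scratch repository. It is a sketch under assumptions: the file names and the variable names added, restored, and deletion are invented for the demo, and it relies on the newest commit touching the path before the restore being the deletion, which holds here but should be checked in a real history.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com   # placeholder identity for the demo
git config user.name Demo

printf 'alpha\nbeta\n' > foo.bar; git add foo.bar; git commit -q -m "add foo.bar"
added=$(git rev-parse HEAD)
git rm -q foo.bar;                     git commit -q -m "delete foo.bar"
echo x > other.txt; git add other.txt; git commit -q -m "unrelated"
git checkout -q "$added" -- foo.bar;   git commit -q -m "restore foo.bar"
restored=$(git rev-parse HEAD)

# Start at the restore commit's parent, which lacks the file. The newest
# commit reachable from there that touches the path is the deletion.
deletion=$(git log -1 --format=%H "$restored^" -- foo.bar)

# The deletion's parent is the last commit that still has the file,
# so blame can be restarted from there.
git blame "$deletion^" -- foo.bar
```

Running git blame on HEAD in this repository attributes every line to the restore commit, while the blame started at the deletion's parent reaches back to the original add.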

_"and their graph—are the history"_ this sounds better, as the collection of graph and commits is plural. — evolutionxbox, Feb 25 '19 at 09:46
@evolutionxbox: The em-dash is almost an appositive, in which case the verb should agree (number-wise) with the word before the appositive phrase, but I can't find a proper authoritative citation here. I think I will go with plural though. — torek, Feb 25 '19 at 11:14

LeGEC · Answer 2 · 2019-02-24T23:28:08.897

Yes, the history is kept.

The question in the comment was about git blame foo.bar, not about git log foo.bar.

edited Feb 24 '19 at 23:28

answered Feb 24 '19 at 23:20

