The way to find out is to try it.
If you do try it, you'll find that git log -- file (with or without --follow) does indeed keep looking.
Note:
With the right git log parameters, I can still trace [a particular file's] modification history ...
It's important to remember here that Git doesn't store file history. Git stores commits, and the commits—and their graph—are the history.
When you run git log -- file or git log --follow -- file, Git synthesizes a file history by showing you some selected subset of the actual history. The actual history is, of course, just every commit starting at some ending point and working backwards, following all parents of merges unless --first-parent is used to follow only the first parents of merges.
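As a rough illustration of that "actual history" walk, here is a minimal Python sketch. It is a toy model, not Git's code: the graph, the commit names, and the reachable function are all made up for this example, and the visiting order is not the date-based order git log really uses.

```python
from collections import deque

# Hypothetical toy graph: commit name -> list of parent names (first parent first).
PARENTS = {
    "A": [],          # root commit: no parents
    "B": ["A"],
    "C": ["A"],       # commit on a side branch
    "M": ["B", "C"],  # merge: first parent B, second parent C
}

def reachable(start, parents, first_parent=False):
    """Return the commits reachable from `start`, in the order this toy walk visits them."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        ps = parents[commit]
        # Mimic the idea behind --first-parent: ignore every parent but the first.
        queue.extend(ps[:1] if first_parent else ps)
    return order

all_commits = reachable("M", PARENTS)
mainline = reachable("M", PARENTS, first_parent=True)
print(all_commits)  # ['M', 'B', 'C', 'A']
print(mainline)     # ['M', 'B', 'A']
```

With first_parent=True the side-branch commit C is never visited, which is the same effect --first-parent has on merges.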
The synthetic, limited history produced by omitting some commits and showing other commits will, of course, depend on which commits are shown and which commits are omitted. Here, it becomes important to distinguish between two separate processes:
- How does Git decide which commits to traverse?
- Having traversed those commits, how does Git decide which commits to show?
The unlimited, default git log action is to traverse all reachable commits and to show all commits traversed, which makes the two questions synonymous. You just need to understand the idea of reachability, which is why I provided the link to Think Like (a) Git here. But as soon as you enable what the git log documentation calls History Simplification, the two become separate. The documentation uses what I think is poor phrasing here:
one part is selecting the commits and the other is how to do it
What they mean here is:
one part is selecting the commits to be shown (added words mine, emphasis mine): This is my second bullet point above. After, or during, the traversal of some subset of commits reachable from the starting point, we show some subset of those commits.
the other is how to do [the traversal]: "it" is a particularly useless pronoun here, referring to the general idea of history simplification. This amounts to saying the method used to do history simplification is the method used to do history simplification, which, well ... okay then!
In any case, the main idea, which applies to everything except the --ancestry-path option, is to start with a concept they call TREESAME here. To apply the idea, we must also look at each commit's parent or parents.
We already know, of course, that to do any kind of commit-graph walk, we have Git grab a commit out of the object database, examine it, and gather up its parent hash IDs. Most commits have just one parent, so it's obvious where to move next. A few commits (there is at least one) are root commits, defined as commits with no parents, and it's obvious what to do here as well: just stop traversing. The remaining commits, neither root nor ordinary, are merge commits, and by default, without history simplification, Git would look at all parents of each merge commit.
When history simplification is turned on, though, Git doesn't look at all parents. Instead, it uses this TREESAME idea. To decide if commits C (child) and Pi (the i'th parent) are TREESAME, Git compares the contents of all the files that we're interested in, in that commit. This is where the --follow and/or -- path part matters, because it declares which files we find interesting right now. So Git pretends that commits C and Pi have only those files in them, and says: are the files identical, or different?
If the files are different, the two commits are not TREESAME. If the files have the same contents, the two commits are TREESAME.
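The TREESAME test can be sketched in a few lines of Python. This is only a toy: the restrict and treesame functions are names I made up, trees are plain dictionaries mapping path to content, and paths are matched by exact name, whereas real Git pathspecs can also name directories and patterns.

```python
def restrict(tree, paths):
    """Keep only the entries of `tree` (path -> content) that we are interested in."""
    return {path: content for path, content in tree.items() if path in paths}

def treesame(child_tree, parent_tree, paths):
    """True if child and parent agree on every path of interest."""
    return restrict(child_tree, paths) == restrict(parent_tree, paths)

# Hypothetical commit contents: the child changes README but not src/main.c.
parent = {"README": "v1", "src/main.c": "int main(){}"}
child = {"README": "v2", "src/main.c": "int main(){}"}

print(treesame(child, parent, {"src/main.c"}))  # True
print(treesame(child, parent, {"README"}))      # False
```

Note that two commits that both lack the file entirely also compare as TREESAME in this model, since both restricted trees are empty.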
Git will select commit C to be shown if it is not TREESAME to any parent, i.e., if the file(s) in C, after any stripping-out, differ from those in every Pi. But if the commit is a merge commit (has more than one parent), the parent it will follow, by default, to find more commits to maybe-show and maybe-follow, is one of the parents that is TREESAME to C. If no parent is TREESAME to C, it follows all of them.
There are a handful of options (five total, including the default) to change the way the commit-following works. One of them, --sparse, also changes the set of commits to be shown; otherwise this "not-TREESAME-to-any-parent" rule is the rule used for showing traversed commits.
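Here is a minimal Python sketch of that default per-commit decision: whether to show a commit, and which parent(s) to visit next. The graph, the trees, and the decide function are hypothetical. The root-commit handling follows the git log documentation's statement that root commits are compared to an empty tree.

```python
# Hypothetical graph and contents; paths of interest are matched by exact name.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}
TREES = {
    "A": {"f": "1"},
    "B": {"f": "2"},            # changes f
    "C": {"f": "1", "g": "x"},  # adds g, leaves f alone
    "M": {"f": "2", "g": "x"},  # merge of B and C
}

def restrict(tree, paths):
    """Keep only the paths we are interested in."""
    return {p: c for p, c in tree.items() if p in paths}

def decide(commit, paths):
    """Return (show, parents_to_follow) for one commit under the default rule."""
    mine = restrict(TREES[commit], paths)
    parents = PARENTS[commit]
    if not parents:
        # Root commit: compared against an empty tree, nothing left to follow.
        return (mine != {}, [])
    same = [p for p in parents if restrict(TREES[p], paths) == mine]
    show = not same  # shown only if it differs from every parent
    if len(parents) > 1 and same:
        return (show, same[:1])  # merge: follow just one TREESAME parent
    return (show, parents)

for name in ["M", "B", "C", "A"]:
    print(name, decide(name, {"f"}))
```

For the path f, the merge M is TREESAME to B, so it is not shown and only B is followed. B is shown because it differs from A, and C would not be shown even if it were visited.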
Last, we can look at --follow itself. What this does is simple (probably too simple, since it fails with --full-history, for instance). As with any git log -- paths operation, it turns on history simplification, so that only commits that "touch" the given path get displayed. However, it also enables Git's rename detection and, at the same time, demands that only one path be specified.
As Git traverses commits from child to parent, doing the simplification steps to determine (a) which parent(s) to follow and (b) which commits to display, Git also checks for a case where the diff, from parent-to-child, says that the file at path in the child is the result of a rename from a different path in the parent. If that's the case, then as soon as Git switches from examining commit C to examining parent commit P, it also switches which path name it is using for history simplification. Now instead of path/to/file.ext it might be looking for old/path/to/file.txt, for instance.
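The path-switching trick can be sketched in Python on a linear history. Everything here is hypothetical (the history, the path names, the follow and find_rename_source functions), and the toy detects only exact renames, where the content is identical under the old name. Git's real rename detection is similarity-based and more forgiving.

```python
# Hypothetical linear history, newest first: (commit name, tree as path -> content).
HISTORY = [
    ("D", {"new/name.txt": "v2"}),  # edits the file
    ("C", {"new/name.txt": "v1"}),  # renames old/name.txt -> new/name.txt
    ("B", {"old/name.txt": "v1"}),  # adds the file under its old path
    ("A", {}),                      # root commit, file not present yet
]

def find_rename_source(child_tree, parent_tree, path):
    """Return a parent path that is gone in the child and has identical content, else None."""
    for old_path, content in parent_tree.items():
        if old_path not in child_tree and content == child_tree[path]:
            return old_path
    return None

def follow(history, path, renames=True):
    """Walk child -> parent; return (commit, path name in use) for each commit shown."""
    shown = []
    for i, (name, tree) in enumerate(history):
        parent_tree = history[i + 1][1] if i + 1 < len(history) else {}
        next_path = path
        if renames and path in tree and path not in parent_tree:
            old = find_rename_source(tree, parent_tree, path)
            if old is not None:
                next_path = old  # from the parent onward, look for the old name
        if tree.get(path) != parent_tree.get(path):
            shown.append((name, path))  # the commit "touches" the path
        path = next_path
    return shown

with_follow = follow(HISTORY, "new/name.txt")
without_follow = follow(HISTORY, "new/name.txt", renames=False)
print(with_follow)     # [('D', 'new/name.txt'), ('C', 'new/name.txt'), ('B', 'old/name.txt')]
print(without_follow)  # [('D', 'new/name.txt'), ('C', 'new/name.txt')]
```

Without the name switch, the walk still visits B and A, but it never shows B, because it keeps looking for a path that does not exist there.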
If you have commits in your commit graph in which there is no copy of some file under some path such as path/to/file.ext, git log still traverses those commits. It's just that those commits obviously don't change that file (they don't even have that file) and therefore each child commit is TREESAME to each of its parent commits. At some point during this traversal, if a commit appears in which path/to/file.ext is in a parent and not in the child, the traversal may, or may not, depending on your chosen history simplification, move to that parent. If the file is added or removed in an ordinary, single-parent commit, the behavior is easy to understand, as there's no special TREESAME weirdness to apply. If the file is added or removed in a commit C that is a merge and has multiple parents P1…Pn, the TREESAME rules will, by default, take Git down a commit-graph path that doesn't have the file!
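To make the merge case concrete, here is a toy Python walk over a hypothetical four-commit graph in which the merge keeps a deletion. The graph, the FILE table, and the walk function are assumptions for illustration, and the walk models only the default rule (show a commit that differs from every parent, follow one TREESAME parent of a merge).

```python
# Hypothetical graph: O adds the file, E edits it, D deletes it,
# and merge M (parents E, D) keeps the deletion.
PARENTS = {"O": [], "E": ["O"], "D": ["O"], "M": ["E", "D"]}
# Content of the one path we care about in each commit; None means "absent".
FILE = {"O": "v1", "E": "v2", "D": None, "M": None}

def walk(start):
    """Default-style simplified walk; returns (traversed, shown)."""
    traversed, shown, todo = [], [], [start]
    while todo:
        commit = todo.pop()
        if commit in traversed:
            continue
        traversed.append(commit)
        parents = PARENTS[commit]
        same = [p for p in parents if FILE[p] == FILE[commit]]
        if parents:
            show = not same                  # differs from every parent
        else:
            show = FILE[commit] is not None  # root vs. empty tree
        if show:
            shown.append(commit)
        # A merge with a TREESAME parent follows only that one parent.
        todo.extend(same[:1] if len(parents) > 1 and same else parents)
    return traversed, shown

traversed, shown = walk("M")
print(traversed)  # ['M', 'D', 'O']
print(shown)      # ['D', 'O']
```

Because M is TREESAME to D (neither has the file), the walk goes down the side that lacks the file and never visits E, so the edit made in E does not appear in the synthesized history.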
To really figure this all out, study the git log documentation. The answers are in it—once you can get past the not-so-good phrasing, anyway.
Side note: git blame is different
Both git log and git blame will walk the commit graph, but git blame (or its less-charged synonym, git annotate) has rather different algorithms in it for deciding which file(s) to look at in which commit(s). Its goal is to locate which recent commit changed any given line of any given file. When you hit a point at which a file does not exist in the parent commit but does exist in the child, every line counts as added in that child, and so the walk stops at this point.
To continue before this point, it's necessary to find some earlier commit that does have the file, which is easy-ish to do using git log, history simplification, and the parent that doesn't have the file as the starting point. You can then start the git blame operation at the point where the file existed, and work backwards from there.