Can I have my cake and eat it too?
The short answer is no.
Long
Technically, the commit ID of each file in the HEAD commit is the hash ID you get with git rev-parse HEAD (or the longer but equivalent git rev-list command you're using). That's because each commit contains a full snapshot of every file that Git knows about.
What you are getting when you use a git rev-list or git log command (or, at a per-line-in-one-file level, a git blame command) to look backwards in history is not the commit hash ID of the file in question, because that's trivial. Instead, it's the hash ID of some earlier commit that contains the same version of the file or, for git blame, the same line.
That is, suppose we have, in our Git repository, a simple linear history with just five commits in it. We can draw these five commits like this:
A <-B <-C <-D <-E <--master
where each uppercase letter stands in for an actual commit hash ID. The branch name, in this case master, serves to let us find the actual hash ID of commit E, since that hash ID looks random and is difficult or sometimes impossible to find otherwise.
Commit E, of course, contains a full snapshot of every file, in the form it had when we (or whoever) made commit E. It also contains the hash ID of the earlier commit D. Git calls D the parent of commit E.
But commit D also has a full snapshot of every file, in the form it had when someone made D, and a link back to its parent C. This repeats for C and so on, back through history (which ends when we hit A, which has no parent commit).
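The chain described above can be modeled in a few lines. This is only a toy sketch, not how Git stores anything: the Commit class, the history helper, and the file contents are all made up for illustration, and single letters stand in for real hash IDs.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Commit:
    name: str                          # stands in for a real commit hash ID
    files: Dict[str, str]              # full snapshot: path -> content
    parent: Optional["Commit"] = None  # None for the first commit (A)


def history(tip: Optional[Commit]) -> Iterator[Commit]:
    """Yield commits from the tip back to the commit that has no parent."""
    commit = tip
    while commit is not None:
        yield commit
        commit = commit.parent


# Five commits, each with a full snapshot and a link to its parent.
a = Commit("A", {"README.md": "v1"})
b = Commit("B", {"README.md": "v1", "main.py": "print(1)"}, a)
c = Commit("C", {"README.md": "v2", "main.py": "print(1)"}, b)
d = Commit("D", {"README.md": "v2", "main.py": "print(1)"}, c)
e = Commit("E", {"README.md": "v2", "main.py": "print(2)"}, d)

# The branch name is what lets us find the last commit in the chain.
branches = {"master": e}

names = [commit.name for commit in history(branches["master"])]
print(names)  # ['E', 'D', 'C', 'B', 'A']
```

Note that the only way to reach D is through E, and so on: the walk always starts at the branch name and goes backwards.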
What we'd like, in this case, is to have Git compare the snapshot of some file (README.md, main.py, or whatever) that appears in commit E with the one that appears in its parent commit D. If these two snapshots are the same, we'd like to have Git compare D's with C's. If those are the same, Git should keep working backwards. It should do this until it either runs out of commits at A, or the comparison shows that the two files are different.[1]
In other words, we're repeatedly executing a simple comparison operation:
- Is file F the same or different in commits X and Y?
for each parent/child pair of commits. As soon as the answer is "different", we have Git stop going backwards through history and print the hash ID of the commit it has reached at this point, the child of that pair. (The internal storage format, which de-duplicates files across commits, makes this really easy: identical file contents are stored once, so comparing two snapshots of a file is just comparing two hash IDs. With git blame, the computation is considerably harder and fancier, but it amounts to the same thing, just on a line-by-line basis.)
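Here is a minimal sketch of that backwards walk, on a made-up five-commit history. The Commit class and the last_changed function are hypothetical names, not anything Git provides, and comparing content strings stands in for comparing the hash IDs of stored files. The sketch assumes that a commit with no parent is compared against an empty snapshot, so a file that has never changed is attributed to the first commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    name: str                          # stands in for a real commit hash ID
    files: Dict[str, str]              # full snapshot: path -> content
    parent: Optional["Commit"] = None  # None for the first commit


def last_changed(tip: Commit, path: str) -> Optional[str]:
    """Walk back from tip; return the first commit whose copy of path
    differs from its parent's copy, or None if path never existed."""
    commit: Optional[Commit] = tip
    while commit is not None:
        parent = commit.parent
        # Assumption: a missing parent behaves like an empty snapshot.
        parent_files = parent.files if parent is not None else {}
        if commit.files.get(path) != parent_files.get(path):
            return commit.name
        commit = parent
    return None


a = Commit("A", {"LICENSE": "MIT", "README.md": "v1"})
b = Commit("B", {"LICENSE": "MIT", "README.md": "v1", "main.py": "print(1)"}, a)
c = Commit("C", {"LICENSE": "MIT", "README.md": "v2", "main.py": "print(1)"}, b)
d = Commit("D", {"LICENSE": "MIT", "README.md": "v2", "main.py": "print(1)"}, c)
e = Commit("E", {"LICENSE": "MIT", "README.md": "v2", "main.py": "print(2)"}, d)

print(last_changed(e, "main.py"))    # E
print(last_changed(e, "README.md"))  # C
print(last_changed(e, "LICENSE"))    # A
```

Every file sits in the snapshot of E, yet the answers differ per file, because each answer depends on commits further back in the chain.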
In order to do this, though, Git must have access to each of the commits that it needs to traverse as it walks backwards through history. History, in Git, is the set of commits in the repository. Git must have the history to use the history.
[1] A simple and expedient trick, which Git actually does use, is that when we hit the parent-less commit A (Git calls this a root commit), it can simply pretend that there is a totally empty commit before A. Then every file in A is new, and therefore different from its virtual/fake parent. This is why every Git repository includes the empty tree.
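As a small aside on the empty tree: its hash ID is the same in every repository, because a Git object ID is a hash of a short header plus the object's content, and the empty tree has no content. The sketch below assumes a repository using SHA-1 object IDs (SHA-256 repositories produce a different value); git_object_id is a hypothetical helper name.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 Git repositories do:
    '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries has an empty body.
empty_tree = git_object_id("tree", b"")
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```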