Blob objects store only the file's data, not the path name (nor the mode).
What this means is that if we make one commit, or many commits, containing the same data, we get the same blob hash ID:
$ echo test data > file
$ git add file
$ git commit -m "add some test data"
[commit message here]
$ git rm file
$ git commit -m "remove the test data"
[commit message here]
$ echo test data > different-name
$ git add different-name
$ git commit -m "add the same data under another name"
[commit message here]
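One way to inspect these commits is the `<commit>:<path>` syntax of git rev-parse, which prints the hash ID of the object stored at that path in that commit. The sketch below repeats the three commits in a throwaway repository; the temporary directory, the `commit` helper, and the demo identity are assumptions made only so the example runs anywhere.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# Hypothetical helper: supplies an identity inline so that the commits
# succeed even on a machine with no Git configuration.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo test data > file
git add file && commit "add some test data"
git rm -q file && commit "remove the test data"
echo test data > different-name
git add different-name && commit "add the same data under another name"

# <commit>:<path> names the object stored at that path in that commit.
first=$(git rev-parse HEAD~2:file)
third=$(git rev-parse HEAD:different-name)
echo "$first"
echo "$third"
```

The two lines printed should be identical, since only the contents feed into the blob hash ID.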
If we inspect these commits, we will find that both files, file and different-name, have the same blob hash ID, even though they have different file names and do not coexist in adjacent commits. In fact, the blob hash ID of test data\n is:
$ echo test data | git hash-object -t blob --stdin
082b3465b6ac4b857f930b655c1cdb398aa6c465
This is the hash ID of any blob holding exactly that string. The hash ID of a blob holding hello world\n is equally predictable:
$ echo hello world | git hash-object -t blob --stdin
3b18e512dba79e4c8300dd08aeb37f8e728b8dad
What all of this means is that the file contents alone, not the file's name, determine the hash ID. If the same contents appear under several path names, all of those names refer to one and the same blob. This is how Git de-duplicates file content across commits (or even within a single commit).
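To see why the name cannot matter, it helps to compute a blob hash ID by hand. In a SHA-1 repository, the ID is the SHA-1 of a short header ("blob", a space, the size in bytes, and a NUL byte) followed by the raw contents. The sketch below assumes a sha1sum program is available (on some systems the equivalent is shasum); the variable names are just for illustration.

```shell
content='hello world'

# Size in bytes of the contents, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash the header "blob <size>" + NUL, followed by the contents themselves.
# No file name and no mode appear anywhere in the hashed data.
manual=$( { printf 'blob %s\000' "$size"; printf '%s\n' "$content"; } \
          | sha1sum | cut -d' ' -f1 )
echo "$manual"    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# If Git is installed, it should print the same ID for the same bytes.
if command -v git >/dev/null 2>&1; then
    printf '%s\n' "$content" | git hash-object -t blob --stdin
fi
```

Changing even one byte of the contents changes the ID; renaming the file changes nothing, because the name is never part of the input.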
As matt noted in a comment, the names are stored in tree objects. Technically, a tree object stores a (sorted) list of 3-tuples: mode, name-component, hash-ID. The git add command prepares a file for committing by using the equivalent of git hash-object -w on the file's contents, which stores the blob object into the repository database (or finds any existing blob object with that hash ID), and then writing the corresponding hash ID, along with the file's mode and path name, into Git's index. Git does not (yet) create any tree object for this.
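The two steps can be imitated with plumbing commands. This is only a rough sketch of what git add does, not its actual implementation; the throwaway repository and the variable name `blob` are assumptions for the demonstration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
echo test data > file

# Step 1: store the contents as a blob (-w writes it into the object database).
blob=$(git hash-object -w file)

# Step 2: record mode, hash ID, and path name in the index.
git update-index --add --cacheinfo 100644,"$blob",file

# The index now lists the path with its mode and blob hash ID,
# but no tree object has been written yet.
git ls-files --stage
```

At this point the only object in the repository is the blob; the name file exists solely in the index.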
Later, if and when you run git commit, the commit code uses the equivalent of git write-tree to turn Git's index contents into one or more tree objects, re-using or creating new tree objects as needed. The index contains the file's path name, including (forward) slashes, such as path/to/file.ext; git write-tree reads this and figures out that, in order to store the file, we'll need at least three internal tree objects:
- One tree object will contain path, with mode 040000 (though leading zeros are actually suppressed in the internal format), and a hash ID. That will be the hash ID of the next tree object:
- One tree object will contain to, with mode 040000 (again with leading zeros suppressed), and another hash ID:
- The last (or first, in some sense) tree object will contain file.ext, with mode 100644 or mode 100755 as seen in Git's index, and the hash ID as seen in Git's index.
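The three tree objects can be seen directly by running git write-tree by hand and walking down from the top-level tree. The sketch below uses a throwaway repository; the variable names `top`, `mid`, and `leaf` are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
mkdir -p path/to
echo test data > path/to/file.ext
git add path/to/file.ext

# Turn the index into tree objects; this prints the ID of the top-level tree.
top=$(git write-tree)
git ls-tree "$top"       # one entry: mode 040000, type tree, name "path"

mid=$(git rev-parse "$top:path")
git ls-tree "$mid"       # one entry: mode 040000, type tree, name "to"

leaf=$(git rev-parse "$top:path/to")
git ls-tree "$leaf"      # one entry: mode 100644, type blob, name "file.ext"
```

Each tree holds only one name component, which is why the full path has to be reassembled by walking all three.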
By using these three tree objects, Git will later be able to re-create, in a new index file, the path/to/file.ext string with the mode 100644 or mode 100755 part and the correct blob hash ID. From there, Git will create or update the file path/to/file.ext, perhaps by creating a folder path, then a folder path\to, and finally a file file.ext in the to folder in the path folder.
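The reverse trip can be sketched with git read-tree and git checkout-index. Here the tree is loaded into a separate, scratch index file (selected through the GIT_INDEX_FILE environment variable) and then checked out into an empty directory; the file name demo-index and the variables `idx`, `out`, and `restored` are assumptions for the example, not anything Git itself uses.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
mkdir -p path/to
echo test data > path/to/file.ext
git add path/to/file.ext
top=$(git write-tree)

# Load the tree objects into a brand-new index file.
idx="$repo/.git/demo-index"
GIT_INDEX_FILE="$idx" git read-tree "$top"

# The full path, mode, and blob hash ID are back in one index entry.
restored=$(GIT_INDEX_FILE="$idx" git ls-files --stage)
echo "$restored"

# Write the files named by that index into an empty directory;
# the intermediate folders are created as needed.
out=$(mktemp -d)
GIT_INDEX_FILE="$idx" git checkout-index -a --prefix="$out/"
cat "$out/path/to/file.ext"
```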
So, as noted in comments, if the contents are unique, you'll be able to find this dangling blob (using git fsck as you did), but Git never got around to storing the file's name anywhere except its own index, which it has since overwritten. While it seems to be partly broken in current Git releases, git fsck --lost-found followed by "grep"-ing for contents in the resurrected dangling blobs is usually the way to go here.
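If the files written by --lost-found are not usable, a similar search can be done straight from the fsck output. The following is a sketch under a few assumptions: a throwaway repository, a staged-then-unstaged file standing in for the lost one, and "unique text" standing in for whatever string you remember from the file.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Simulate the loss: stage a file, then drop it from the index and disk.
echo "some unique text" > lost.txt
git add lost.txt
git rm -q --cached lost.txt
rm lost.txt

# Print the hash ID of every dangling blob whose contents match.
found=$(git fsck 2>/dev/null | awk '/^dangling blob/ {print $3}' |
    while read -r id; do
        git cat-file -p "$id" | grep -q "unique text" && echo "$id"
    done)
echo "$found"
```

Once the right hash ID turns up, git cat-file -p followed by that ID, redirected into a file, recovers the contents; the original name still has to come from memory.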