First, let me quote from the gitattributes documentation:
ident
When the attribute ident is set for a path, Git replaces $Id$ in the blob object with $Id:, followed by the 40-character hexadecimal blob object name, followed by a dollar sign $ upon checkout. Any byte sequence that begins with $Id: and ends with $ in the worktree file is replaced with $Id$ upon check-in.
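To make the two directions concrete, here is a minimal Python sketch of what the quoted paragraph describes. It is a simplified model, not Git's actual implementation: the function names are made up for illustration, and the assumption that an expanded keyword never spans a newline is mine.

```python
import re

# Simplified model of the ident filter described above. Real Git handles
# more edge cases; the "no newline inside the keyword" rule is an assumption.
_EXPANDED = re.compile(rb"\$Id:[^$\n]*\$")


def expand_ident(data: bytes, blob_hex: str) -> bytes:
    """Checkout direction: turn $Id$ into $Id: <blob hash> $."""
    return data.replace(b"$Id$", b"$Id: " + blob_hex.encode("ascii") + b" $")


def collapse_ident(data: bytes) -> bytes:
    """Check-in direction: turn any $Id: ... $ back into plain $Id$."""
    return _EXPANDED.sub(b"$Id$", data)
```

Collapsing an expanded file gives back the stored form, which is why the hash that gets inserted is always the hash of the unexpanded data.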
So the IDs in the checked-out files are blob hash IDs, not commit hash IDs. The blob hash ID is specifically (currently) a SHA-1 checksum of the contents of the data file preceded by the literal text blob, a space, the size of the data in bytes written as a decimal ASCII number, and a NUL byte '\0'.
(That's the data with $Id$ in it, not the data with the hash ID inserted, of course. So if the source file consists of $Id$\nhello\n, with \n representing newlines, we want to compute the SHA-1 of the output of:
printf 'blob 11\0$Id$\nhello\n'
since $Id$\nhello\n is 11 bytes long. This blob's hash ID is therefore 173cbef4e466bed5350cae075633cb81d1e01743.)
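The same computation can be sketched in Python using only hashlib. The function name git_blob_hash is hypothetical, and the sketch assumes a SHA-1 repository, as the text above does.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Hash `data` the way Git hashes a blob in a SHA-1 repository."""
    # Header: the word "blob", a space, the decimal size, then a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

Calling git_blob_hash(b"$Id$\nhello\n") hashes the unexpanded file contents from the example above. You can cross-check any result by feeding the same bytes to git hash-object --stdin.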
These IDs are not guaranteed to be invertible, because the identity information you can get from the binary may be insufficient to identify one particular commit. For a classic example, consider a program built from a single main.c with:
#ident "$Id$"
but where the Makefile itself has -D options that select something, and main.c has #ifdef FEATURE1 and so on.
Build #1 is made with a Makefile that says -DFEATURE1. Build #2 is made with a Makefile that does not have this -D. These two different builds are from different commits, but they have the same blob hash ID for file main.c, and therefore the two different binaries, each produced by linking the compiled main.o with libc, contain the same ident string.
The closest you can get is to:
- collect all the IDs you can get;
- examine each potential build's source tree to identify the blob hash IDs of the corresponding inputs; and
- list out all matching commits.
If you're lucky, there's just one matching commit.
The remaining issue is how to do the above. Presumably you will use whatever program you already use to extract the ident info from the binary, for the first bullet point. For the second and third, you must write a script.
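If you have no such program handy, a rough Python sketch like the one below can pull the expanded ident strings out of a binary. It assumes the strings survive in the binary exactly as $Id: <40 hex digits> $, and the name extract_idents is made up for illustration.

```python
import re

# Matches an expanded ident keyword carrying a 40-digit hex blob hash.
IDENT_RE = re.compile(rb"\$Id: ([0-9a-f]{40}) \$")


def extract_idents(binary: bytes) -> set:
    """Return the set of blob hashes found in expanded $Id: ... $ strings."""
    return {m.group(1).decode("ascii") for m in IDENT_RE.finditer(binary)}
```

Note that this gives you only the hashes, not the names of the source files they came from.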
The script itself is pretty short: you just need to look through each potential build and extract the corresponding blob hashes. So, find a commit that could be a build, then run git ls-tree -r $commithash to list every file in that commit along with its blob hash ID. (Run git ls-tree -r once, on one commit, to see the output; note the blob hash IDs for each mode 100644 or mode 100755 file.)
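As a sketch of that step, the Python below runs git ls-tree -r and keeps only the regular-file entries. Each output line has the form mode, type, hash, then a tab and the path. The function names and the repo parameter are hypothetical, and paths with unusual characters (which Git quotes unless you pass -z) are not handled.

```python
import subprocess


def parse_ls_tree(text: str) -> dict:
    """Map path -> blob hash for regular files in `git ls-tree -r` output."""
    blobs = {}
    for line in text.splitlines():
        if not line:
            continue
        # Each line is "<mode> <type> <hash>\t<path>".
        meta, _, path = line.partition("\t")
        mode, objtype, objhash = meta.split()
        # Skip symlinks (120000) and submodules (160000 commit entries).
        if objtype == "blob" and mode in ("100644", "100755"):
            blobs[path] = objhash
    return blobs


def blobs_for_commit(commit: str, repo: str = ".") -> dict:
    """Run git ls-tree -r on `commit` and parse the result."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", commit],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_tree(result.stdout)
```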
Now, match up the known object file "ident"s against the corresponding source file blob hash IDs. How to do this mapping is up to you and depends on your tools and languages used. If all known ident values match all the right sources, $commithash is a candidate hash, so print it.
Repeat for all candidate commits and you will get the best answers you can here.
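Putting the loop together, one possible shape for the script is sketched below. It makes two simplifying assumptions that are mine, not part of the procedure above: the candidate commits come from git rev-list --all, and a commit matches when every known ident hash appears somewhere in its tree, without checking which path it belongs to. All function names are hypothetical.

```python
import subprocess


def git_lines(repo, *args):
    """Run a git command in `repo` and return its output as a list of lines."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def tree_blob_hashes(ls_tree_lines):
    """Collect the blob hashes from `git ls-tree -r` output lines."""
    hashes = set()
    for line in ls_tree_lines:
        fields = line.split("\t", 1)[0].split()
        if len(fields) == 3 and fields[1] == "blob":
            hashes.add(fields[2])
    return hashes


def commit_matches(known_idents, blob_hashes):
    """True if every known ident hash is a blob in the commit's tree."""
    return bool(known_idents) and set(known_idents) <= blob_hashes


def candidate_commits(repo, known_idents, revs=("--all",)):
    """Return every commit reachable from `revs` whose tree matches."""
    matches = []
    for commit in git_lines(repo, "rev-list", *revs):
        hashes = tree_blob_hashes(git_lines(repo, "ls-tree", "-r", commit))
        if commit_matches(known_idents, hashes):
            matches.append(commit)
    return matches
```

Calling candidate_commits(".", idents), with idents being the set of hashes extracted from the binary, returns the list of matching commits. This runs one git ls-tree per commit, so it is slow on large histories; narrowing revs to the branches or tags you actually build from helps.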
(And, as you can see now, the ident filter is not really very useful: it's much better to use git describe to get a usable identity and stick it into the build output, during the build process.)