5

How can I get SHA hash of a file in specified commit? I can get all commits that touched the file using git log file, but how can I get SHA hash of a file in each particular commit?

I think I can do it by checking out the commit and then using git-hash-object, but there must be an easier way.

graywolf
  • Do you need the verbatim SHA1 of the file contents only, or do you want the SHA1 of the blob object used to store the file in Git's history? – knittl Apr 26 '16 at 18:43
  • Do I understand correctly that the blob object includes file mode and stuff? If that's the difference, I need the file contents only, like if I would run `sha1sum file`. – graywolf Apr 26 '16 at 18:44
  • The blob contains additional type and length information and some null bytes – knittl Apr 26 '16 at 20:14

3 Answers

7

There is a very quick way to get the Git hash for a file within some commit (the path is relative to the top level of the repository, with no leading slash):

git rev-parse <commit-ID>:path/to/file

Git's hash is a SHA-1 of the word blob followed by a space, followed by a decimal ASCII string giving the size of the file in bytes, followed by a NUL byte, followed by the file's contents:

# redirect into wc so it prints only the byte count, not the file name
size=$(wc -c < "$file")
(printf 'blob %d\0' "$size"; cat "$file") | sha1sum -
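As a rough sanity check of the format described above, here is a minimal sketch that wraps the same pipeline in a function and runs it on a scratch file. The name git_blob_sha1 is made up for illustration and is not a Git command; the sketch assumes sha1sum and mktemp are available.

```shell
#!/bin/sh
# Compute a Git blob ID by hand: SHA-1 over "blob <size>\0<contents>".
# git_blob_sha1 is a hypothetical helper name, not part of Git.
git_blob_sha1() {
    file=$1
    # Redirect into wc so it prints only the byte count, not the file name.
    size=$(wc -c < "$file")
    { printf 'blob %d\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}

# Demo on a scratch file; no repository is needed for this.
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
git_blob_sha1 "$tmp"
rm -f "$tmp"
```

If Git is installed, running git hash-object on the same file should print the same ID, which is a quick way to confirm the header format.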

It looks from comments, though, like you want an actual SHA-1 of the file's contents (as someone else would get by extracting the file and running sha1sum on it), and not the git hash:

git show <commit-ID>:path | sha1sum -

is the general (non-bash-specific) method. Bash's <(...) process substitution works as well, provided /dev/fd is available (on some BSD systems this means having the fdesc file system mounted).

torek
  • The commit ID can also be a branch name, or a hash, or any of the other ways of naming as given in `git help revisions`. You can diff the files using the same naming method (ideal for looking at deviation between forks) – Philip Oakley May 29 '21 at 12:15
6

git show and git log are close cousins and share options. Your question asks for the SHA-1 object name of a file in a particular commit, and then for the same information for each commit in the file's history.

The --raw option gives the information you’re after. The examples below will use git’s own repository.

To show the files that changed with a particular commit, use git show or git log -1. When given a tag, the latter shows only the tagged commit and not the tag object itself.

$ git log -1 --raw v2.8.1
commit d95553a6b8c5153f541adcfc3346004e8249b0e6
Author: Junio C Hamano <gitster@pobox.com>
Date:   Sun Apr 3 10:11:35 2016 -0700

    Git 2.8.1

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:000000 100644 0000000... ef6d80b... A  Documentation/RelNotes/2.8.1.txt
:100644 100644 adc940b... 8afe349... M  Documentation/git.txt
:100755 100755 4e9450b... 46595da... M  GIT-VERSION-GEN
:120000 120000 7db3040... d40c3e1... M  RelNotes

Each change line contains

  • beginning or source mode (000000 indicates created or unmerged)
  • resulting or destination mode (000000 indicates deleted or unmerged)
  • source SHA-1 (all zeros for creation)
  • destination SHA-1 (all zeros for deletion)
  • status code plus optional numeric score (above, A is addition and M is modification)
  • path

See “Raw output format” in git diff’s documentation for the full details.
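For scripting, the fields listed above can be pulled apart with awk. The following is only a sketch: raw_dst_hashes is a made-up helper name, and it assumes paths without whitespace and no rename or copy entries (those carry two paths on the line).

```shell
#!/bin/sh
# Print "<destination-hash> <path>" for each change line of --raw output on stdin.
# raw_dst_hashes is a hypothetical helper name. Assumes paths contain no
# whitespace and that there are no rename/copy entries (which list two paths).
raw_dst_hashes() {
    awk '/^:/ {
        hash = $4
        sub(/\.+$/, "", hash)   # drop the trailing "..." that some Git versions print
        print hash, $6
    }'
}

# Example: feed it one of the change lines shown above.
printf ':120000 120000 7db3040... d40c3e1... M  RelNotes\n' | raw_dst_hashes
```

In practice the input would come from something like git log --raw piped into the function; lines that do not start with a colon (commit headers, messages) are ignored.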

The SHA-1 object name for the file RelNotes associated with the v2.8.1 tag is d40c3e1, which we can verify and expand to all forty digits with

$ git rev-parse v2.8.1:RelNotes
d40c3e126c03b0e4bd9c6162f63a35a45f5e9020

To show hashes for RelNotes, which is a symbolic link that points to the file under Documentation/RelNotes that corresponds to a given version, along the way in version 2.8.1’s history:

$ git log --raw v2.8.1 -- RelNotes
commit d95553a6b8c5153f541adcfc3346004e8249b0e6
Author: Junio C Hamano <gitster@pobox.com>
Date:   Sun Apr 3 10:11:35 2016 -0700

    Git 2.8.1

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:120000 120000 7db3040... d40c3e1... M  RelNotes

commit c9906e47c065940bfe1a9992da494a8f437a49ac
Author: Junio C Hamano <gitster@pobox.com>
Date:   Tue Jan 12 15:20:51 2016 -0800

    First batch for post 2.7 cycle

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:120000 120000 3ba13ce... 7db3040... M  RelNotes

commit 24a00ef646974be49ef7138239c3803805400797
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Oct 5 12:58:10 2015 -0700

    Start cycle toward 2.7

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:120000 120000 def6ebd... 3ba13ce... M  RelNotes
[...]

Use the --abbrev=40 option to get all forty hex digits of each hash. Here the output will appear extra chatty because the output of git show covers both the v2.8.1 tag and the commit to which v2.8.1 points.

$ git show --raw --abbrev=40 v2.8.1
tag v2.8.1
Tagger: Junio C Hamano <gitster@pobox.com>
Date:   Sun Apr 3 10:14:32 2016 -0700

Git 2.8.1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAABAgAGBQJXAU94AAoJELC16IaWr+bLopQQAONTo52BGPCr7exw757SKY90
gYsHDxTaNpPtGZS7ltdOiEESPG3Mx3w1OYk7CBPtxjBLM+JvEdcZsCKrs/RlTrKL
lTc53WHC1tUa8EYjEyHNq4z0E2y4tCTNsj5eD2n/lAdTn2SK59bL4DEouDP2mYJU
3pUkujD9tu/ATw1s77VNiHxcrg9V9TdltaP2+lkHPzXXx8fb8kkabFRkzqvQdgfe
Qe0mZEHKRZY4nEO16dKukalxyWW0iMfoSVeRTjJiQU4HEcMyEnG3lfKeI1ddKVTQ
+XfAM6QianXqdfHRt5ol9MwCm9HAcGWu82caIBOTsc3L7bDrbJTTkDOvwpmVUDJi
WcqgocDGr/x7RA0/E8bqoIv40UXx07DzBTv3mKBo2CMvkow6pgQjsKKfPrvoNKyC
qFqp07A3UXgLWeWLF2iaYJklkq2jEeLPKOCJ1lJcPUg+Kk20+FQEo1XPERnrosoz
xHDDMBy7Vnvd0ij8Ipaxj2XHfIVYHC/WcrfsjiRYa1sHMjdTw/6I0tdtdUkDiY2W
70AsYQUWPtU52tSuK7divMoym3g583bNtu5X+6STDtLZc5XbVAtMEg5PYadTuwci
tTmXTUrti2qLsDp2XZI7rKbKVo5JyW8BYC8BeLUwgVnkj9svG5+6rlTKtgXa+hCo
L9gDU1Iie03IlIHnL+/s
=NLvn
-----END PGP SIGNATURE-----

commit d95553a6b8c5153f541adcfc3346004e8249b0e6
Author: Junio C Hamano <gitster@pobox.com>
Date:   Sun Apr 3 10:11:35 2016 -0700

    Git 2.8.1

    Signed-off-by: Junio C Hamano <gitster@pobox.com>

:000000 100644 0000000000000000000000000000000000000000 ef6d80b008a0a7970238404b034593be27e933c3 A      Documentation/RelNotes/2.8.1.txt
:100644 100644 adc940bf7591069c74c9b47aa5e5686e0438d606 8afe349781d57527083fdb75511959fd25a4239b M      Documentation/git.txt
:100755 100755 4e9450b3ae0c403820f0166435c52c4ea74e7451 46595dad2234f861198347ef8f4f60d167061709 M      GIT-VERSION-GEN
:120000 120000 7db30403c3471e15f4f15a5e68016d7926b3e3de d40c3e126c03b0e4bd9c6162f63a35a45f5e9020 M      RelNotes

The SHA-1 object name of a blob (how git represents file contents) is not identical to running sha1sum on the file because git adds metadata to the front: the literal string blob, followed by a space, followed by the length of the content in decimal, and terminated with a NUL byte. To compute a SHA-1 hash of the contents of successive versions of a file going backward in time, use a command along the lines of

$ for commit in $(git log --pretty=%H v2.8.1 -- RelNotes | head -3) ; \
    do git show ${commit}:RelNotes | sha1sum ; \
  done
ce5501f9daadf110a20a4e4eccdfed63ef4b27e3  -
bd4d920214c4a48d8820292e24f020690595858d  -
5d47b511d86abd490fa4f2c2a8d4ef3589e1aecf  -

With --pretty=%H and -- RelNotes we tell git that we want only the SHA-1 hashes of commits that touch RelNotes (limited to the three most recent with head -3). Then for each of those commits, we feed the tracked content to sha1sum.
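To see both kinds of hash side by side, a variant of the loop can print the commit, the Git blob ID, and the plain SHA-1 of the contents on one line. This is a sketch under assumptions: file_hash_history is a made-up name, it is meant to be run from the top level of a repository, and it skips commits in which the path no longer exists.

```shell
#!/bin/sh
# For each commit touching a path, print: <commit> <blob-id> <sha1-of-contents>.
# file_hash_history is a hypothetical helper name; run it from the top level
# of a repository so the path means the same thing to git log and rev-parse.
file_hash_history() {
    path=$1
    rev=${2:-HEAD}
    git log --pretty=%H "$rev" -- "$path" |
    while read -r commit; do
        # Skip commits in which the path does not exist (for example, deletions).
        blob=$(git rev-parse --verify --quiet "$commit:$path") || continue
        content=$(git cat-file blob "$blob" | sha1sum | cut -d ' ' -f 1)
        printf '%s %s %s\n' "$commit" "$blob" "$content"
    done
}

# Usage, inside a clone of git's own repository:
#   file_hash_history RelNotes v2.8.1 | head -3
```

The second column is what git rev-parse reports; the third is what sha1sum would report on the extracted file.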

If you prefer xargs, it looks like

$ git log --pretty=%H v2.8.1 -- RelNotes | head -3 |
    xargs -I {} sh -c 'git show {}:RelNotes | sha1sum'
ce5501f9daadf110a20a4e4eccdfed63ef4b27e3  -
bd4d920214c4a48d8820292e24f020690595858d  -
5d47b511d86abd490fa4f2c2a8d4ef3589e1aecf  -
Greg Bacon
1

At a minimum, you don't need to check out the commit. git show can directly show you an object, including a blob. You can send that into git hash-object without ever checking it out.

I think there should be a more efficient way, but you can do

git hash-object <(git show [commit]:[path])

So, for example,

$ git hash-object <(git show master:Makefile)
3fb4e1cbe0019c691a504e3419ece252db6f60ab
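If process substitution is not available (plain sh, for instance), the same thing can probably be done with a pipe and the --stdin option of git hash-object. A minimal sketch, where blob_id_from_show is a made-up helper name:

```shell
#!/bin/sh
# Hash a file as stored in a commit, without bash process substitution.
# blob_id_from_show is a hypothetical helper name; run it inside a repository.
blob_id_from_show() {
    commit=$1
    path=$2
    # --stdin makes hash-object read the content from the pipe.
    git show "$commit:$path" | git hash-object --stdin
}

# Usage (should print the same ID as: git rev-parse master:Makefile):
#   blob_id_from_show master Makefile
```

Since this recomputes the ID Git already stores, git rev-parse with the same commit:path argument is the cheaper route when only the blob ID is needed.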
zebediah49
  • The tree of the commit already has a pointer to the blob object (the file). Depending on the OPs requirements, this might suffice. – knittl Apr 26 '16 at 18:42
  • That's awesome, thanks a lot. And it's even reasonably fast, under 3 seconds for whole history of more changed files in the project. This will help me a lot with some scripting :) Thanks again – graywolf Apr 26 '16 at 18:52
  • @zebediah49: one question though, why does it show different string than `sha1sum file`? My working tree doesn't have any changes, so I would expect the first hash to be the same as output of `sha1sum file`.. I'm using this: `for commit in $(git log --oneline path/to/file | cut -d\ -f1); do git hash-object <(git show $commit:path/to/file); done` – graywolf Apr 26 '16 at 18:55
  • I can replicate that behaviour without using the `git-show` part: `git hash-object ` and `sha1sum ` produce different results. Presumably, they are doing different things -- for example, the git command says that it "computes the object ID", which apparently is not just the SHA1 sum of its content. E: http://stackoverflow.com/questions/5290444/why-does-git-hash-object-return-a-different-hash-than-openssl-sha1 – zebediah49 Apr 26 '16 at 19:01
  • well I guess I could just use `sha1sum - <(git show $commit:path/to/file)`? – graywolf Apr 26 '16 at 19:04
  • You've combined the two ways of piping input into the command -- `sha1sum -` is asking for input on stdin, so you can use `git show [thing] | sha1sum -`, or you can use the anonymous pipe to pretend to give it a file `sha1sum <(git show [thing])`. In this case, `sha1sum` sees that you told it to run on a file like `/dev/fd/XX`, which is a pipe created by the shell, with the other end attached to the `git show` command. Using both is telling `sha1sum` to first hash standard in, and then to hash this named pipe. – zebediah49 Apr 26 '16 at 19:10