I want to get the SHA256 of every version of a file in a git repo. (In this case there are binary blobs in one repo, and I want to find which commit in a different repo they came from.)

We can use `git show $hash:$file | sha256sum` to get this for one commit.

Bash example that I want to avoid:

for h in $(git log --pretty=format:"%H" -- "$file")
do
  git show "$h:$file" | sha256sum
done

Is there any way to do this for all commits with "git only"? (avoid running from bash)

Preferably inside git log formatting to get more data.

Clarification:

Running bash "inside git" is fine, but I would like to avoid having to run the command from bash, so that it is usable from both bash and PowerShell.

I also can't believe that my hack is the best way to get checksums of each version of a file in a git repo.

There have been recommendations to look at this recent question, but it uses the internal SHA-1 and its answers rely on bash.

NiKiZe
  • Does it have to be SHA256, or would SHA1 be ok? Because git itself stores everything based on a sha1 hash, so you will probably be able to display that without actually fetching the file contents. – IMSoP Aug 30 '21 at 09:30
  • This seems impressively similar [to this recently asked question](https://stackoverflow.com/questions/68974632/how-to-list-all-sha1-hash-a-specific-file-already-had). – Joachim Sauer Aug 30 '21 at 09:47
  • @IMSoP I would prefer sha256 to keep it the same as some other checksum files that already exists. – NiKiZe Aug 30 '21 at 10:02
  • @Joachim indeed very similar, but specific to SHA1/internal hash. – NiKiZe Aug 30 '21 at 10:02
  • @NiKiZe: I'm aware, which is why I didn't vote to close as duplicate. But you could take the result of that question and feed it into further hash calculation: it'll already give you the details you need to grab the actual file content for each change. – Joachim Sauer Aug 30 '21 at 10:58
  • @Joachim Don't seem to be able to find how it can help at all, I already have relevant commit ids and looping over them, grabbing contents and checksum, What I wanted to avoid was the outer bash - I would be perfectly fine with having `git log` execute bash. – NiKiZe Aug 30 '21 at 11:30
  • What exactly are you trying to avoid here? I mean, "I don't want to type the letters b, a, s and h in sequence" isn't going to get a lot of sympathy and you're asking people to put in a lot of effort to avoid a problem you haven't specified in any more detail than pretty much exactly that. – jthill Aug 31 '21 at 00:21
  • @jthill I want to run this in a portable way (work on both win and nix) I'm fine with git starting its shell, but I tried to avoid starting bash from powershell and then run the command. – NiKiZe Aug 31 '21 at 06:24
  • You want to script applying sha256sum to a selection of files from a vcs history, you have to pick a scripting language and you have to pick your vcs. The tools are portable: shells, vcs's, la la, but at some point you have to choose specific tools and use them. – jthill Aug 31 '21 at 17:06

1 Answer

Git does not use SHA256 (not yet, at least not by default). The hash IDs that Git does use for blob objects are SHA-1 hashes of the blob objects themselves, that is, of a `blob <size>` header, a NUL byte, and then the contents, not of the straight file contents. (This is why the answer to How does the newly found SHA-1 collision affect Git? is "it doesn't".)
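To illustrate the difference, here is a small sketch that recomputes a blob ID by hand and compares it with plain checksums of the same bytes. The helper name `git_blob_id` is made up for this example, and it assumes a repository using the default SHA-1 object format and that `sha1sum` and `sha256sum` are available.

```shell
# Recompute a Git blob ID by hand: SHA-1 over "blob <size>\0" + contents.
# git_blob_id is a hypothetical helper, not a Git command.
git_blob_id() {
  # $1: path to a file on disk
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

git_blob_id "$tmp"                  # the ID Git would store for this content
sha1sum   "$tmp" | cut -d' ' -f1    # plain SHA-1 of the contents: a different value
sha256sum "$tmp" | cut -d' ' -f1    # the checksum the question actually asks for
rm -f "$tmp"
```

The first value should match what `git hash-object` prints for the same file, and none of the three values can be derived from another without reading the contents again.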

Hence, if you want to compute SHA256 values for specific files, you must extract each file's contents and run a SHA256 computation on them. Your bash example is more or less a minimal way to achieve what you want; you can't get much shorter without writing your own program.
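If the goal is also to see more `git log` data next to each checksum, one possible variation on that loop is sketched below. The function name `sha256_history` is invented for this example, and it assumes you run it from the top level of the repository (the `commit:path` syntax resolves paths from there) and that the path does not change name across history.

```shell
# Print "<sha256> <commit> <date> <subject>" for each commit that touched a path.
# sha256_history is a hypothetical helper name, not part of Git.
sha256_history() {
  file=$1
  git log --format='%H %ad %s' --date=short -- "$file" |
  while read -r commit rest; do
    # Skip commits where the path does not exist (for example, a deletion);
    # otherwise sha256sum would silently hash empty input.
    git cat-file -e "$commit:$file" 2>/dev/null || continue
    sum=$(git show "$commit:$file" | sha256sum | cut -d' ' -f1)
    printf '%s %s %s\n' "$sum" "$commit" "$rest"
  done
}

# Usage (inside a repository): sha256_history path/to/blob.bin
```

There is still a loop, as stated above; it is only moved into a function. Git runs aliases whose definition starts with `!` through its own shell, so wrapping the same loop in such an alias might let it be invoked the same way from bash and PowerShell, provided `sha256sum` is on the path of the shell Git starts.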

torek
  • I fully expect `sha256` to be needed externally, I was mostly hoping to avoid having to do external looping. – NiKiZe Aug 30 '21 at 11:21
  • If your sha256 program takes multiple input arguments and you can extract each committed file to a separate in-file-system file, you can invert the loop: `sha256 $(scan-for-files)` for instance. But either way there's still a loop. – torek Aug 30 '21 at 11:32