Are the hashes for the tip of every branch sufficient to prove the integrity of my entire repository? For the sake of discussion, assume you had to give your whole repository to someone, let them do anything they want to it, and determine whether or not they changed even 1 bit of data. How would you do it?

If I'm pushing to an upstream, bare repository, is this all the data I need to guarantee I can verify the integrity of the whole repository at a later date?

git ls-remote --heads origin

fcce961b46784fae13be8a30c2622ddd34d970ec        refs/heads/develop
9da7bb692a72235451706f24790a3f7a100a64e2        refs/heads/feature-netty-testing
86020c50d86691caecff4a55d3b1f2f588f6291d        refs/heads/javafx-testing
871d715e5c072b1fbfacecc986f678214fa0b585        refs/heads/master
7ed641c96d910542edeced5fc470d63b8b4734f0        refs/heads/orphan-branch

That's from a sandbox repository I use to play around with. The orphan-branch is a branch I intentionally orphaned as described here. Everything seems right to me. All the branches I expect are listed, but I'm not positive if the SHA of every branch tip is all I need. Am I missing anything?

What about tags? What about branches that were deleted without being merged into anything?

Updated

As pointed out in some comments, there may be other refs besides heads that may need to be considered. For example, tags and notes may be useful depending on whether or not they are important to you or whether or not you are signing your tags. For myself I am mainly interested in the content of commits which is why I accepted VonC's answer.
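A minimal sketch of recording every ref the remote advertises (heads, tags, notes and so on) instead of only the branch heads, and comparing against that record later. The function names `snapshot_refs` and `verify_refs` are hypothetical helpers, not git commands, and `origin` is just the remote name from the example above.

```shell
# Print every ref the remote advertises, one "<sha>\t<refname>" line per ref,
# sorted by ref name so that two snapshots can be compared line by line.
snapshot_refs() {
    remote="${1:-origin}"
    git ls-remote "$remote" | LC_ALL=C sort -k2
}

# Compare a previously saved snapshot with the remote's current refs.
# Exit status is 0 when nothing changed and non-zero when any ref was
# added, removed, or now points at a different object.
verify_refs() {
    snapshot_file="$1"
    remote="${2:-origin}"
    snapshot_refs "$remote" | diff -u "$snapshot_file" -
}
```

For example, run `snapshot_refs origin > refs.snapshot` before handing the repository over and `verify_refs refs.snapshot origin` afterwards. Commits that are no longer reachable from any ref (such as a deleted, unmerged branch) are not covered by a snapshot taken after the deletion.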

Ryan J

1 Answer

That seems sufficient in terms of integrity.
Tags reference commits, so if a commit changes, git fsck will detect the inconsistency between the tag and the commit it points to, which no longer exists.
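As a rough illustration, the check could be scripted along these lines: run git fsck, then confirm that a ref still resolves to the hash recorded earlier. `check_tip` is a hypothetical helper name, and the expected hash is whatever was saved beforehand (for instance from the ls-remote output in the question).

```shell
# Verify the object database, then confirm that REF still resolves to the
# commit hash that was recorded earlier.
# usage: check_tip REF EXPECTED_SHA
check_tip() {
    ref="$1"
    expected="$2"

    # fsck exits non-zero on corrupt or missing objects.
    git fsck --full >/dev/null || {
        echo "git fsck reported problems" >&2
        return 1
    }

    # Resolve the ref to a commit; fails if the ref no longer exists.
    actual=$(git rev-parse --verify --quiet "$ref^{commit}") || {
        echo "$ref cannot be resolved" >&2
        return 1
    }

    [ "$actual" = "$expected" ] || {
        echo "$ref now points at $actual, expected $expected" >&2
        return 1
    }
}
```

Usage would look like `check_tip refs/heads/master 871d715e5c072b1fbfacecc986f678214fa0b585`, run inside the clone being checked.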

Note that integrity is different from trust (i.e., vouching for the content).
For that, "A Git Horror Story: Repository Integrity With Signed Commits" is instructive.

First, its section "Commit History" details the theory behind SHA-1 integrity (also presented in "Git and Data integrity") and concludes with:

That said, it is important to understand that the integrity of your repository is guaranteed only if a hash collision cannot be created; that is, if an attacker were able to create the same SHA-1 hash with different data, then the child commit(s) would still be valid and the repository would have been successfully compromised.
Vulnerabilities have been known in SHA-1 since 2005 that allow hashes to be computed faster than brute force, although they are not cheap to exploit.
Given that, while your repository may be safe for now, there will come some point in the future where SHA-1 will be considered as crippled as MD5 is today.
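To make the dependence on SHA-1 concrete, here is a small sketch of how a blob's object ID is derived from its content: git hashes the header `blob <size>` followed by a NUL byte and then the file's bytes. `blob_id_by_hand` is a hypothetical helper, and the sketch assumes a repository using the default SHA-1 object format and a system that provides `sha1sum`.

```shell
# Recompute a blob's object ID without git:
# SHA-1 over "blob <size-in-bytes>\0" followed by the raw file content.
# usage: blob_id_by_hand FILE
blob_id_by_hand() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

The result should match `git hash-object --no-filters FILE`. Because trees hash the IDs of their blobs and commits hash their tree and parent IDs, changing a single bit anywhere changes every hash up to the branch tip, unless a collision can be constructed.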

VonC
  • Surely the tag can be deleted and a new one with the same name created pointing to a different, but valid, commit? Shouldn't the OP collect all the refs/ in the repo, not just the heads, or at least include the /tags? – Philip Oakley Aug 13 '12 at 22:03
  • @PhilipOakley sure, tags would help too. Signed tags would make any change harder though. – VonC Aug 14 '12 at 07:10
  • After playing around with Git notes last night I agree it may be better to err on the side of caution and assume all refs are needed. It's always better to have too much data than too little IMO. – Ryan J Aug 14 '12 at 11:42