Linus was referring to the fact that git commits are identifiable by their hash.
A Git tree is an object that lists other trees and blobs (read: blob = file, roughly).
The cryptographic hash of a parent node is computed over the hashes of its children, and so, recursively, over all underlying trees and blobs. Such structures are known as Merkle (hash) trees,
and they have the interesting property that the top-level hash is a cryptographically strong hash that identifies the whole tree (uniquely, for all practical purposes, as long as the hash function resists collisions).
Note that a commit's hash also covers the commit attributes, and these include the parent ids. So if any file in any revision ever changes, the hash of its blob changes, therefore the hashes of the containing trees change, the hash of the snapshot (root tree) changes, the hash of the commit changes, and with it the hashes of all descendant commits, and so on. All subsequent history is altered.
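To make the cascade concrete, here is a minimal Python sketch of how object ids are derived. It assumes SHA-1 object ids and handles only regular files (mode 100644) in a single flat tree. The helper names, the fixed author identity and the zero timestamp are made up for illustration and are not part of git.

```python
import hashlib


def git_object_id(kind, body):
    # Git hashes "<kind> <size>\0" followed by the raw object body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    return git_object_id("blob", content)


def tree_id(entries):
    # entries maps file name -> blob id (hex). Simplification: flat tree,
    # regular files only, ASCII names sorted as plain strings.
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(oid)
        for name, oid in sorted(entries.items())
    )
    return git_object_id("tree", body)


def commit_id(tree, parents, message):
    # Hypothetical fixed identity and timestamp, so the ids are reproducible.
    who = "A U Thor <author@example.com> 0 +0000"
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


def history(readme):
    # Two commits; the second one does not touch README at all.
    t1 = tree_id({"README": blob_id(readme)})
    c1 = commit_id(t1, [], "first")
    t2 = tree_id({"README": blob_id(readme), "notes.txt": blob_id(b"notes\n")})
    c2 = commit_id(t2, [c1], "second")
    return c1, c2


original = history(b"hello world\n")
tampered = history(b"hello w0rld\n")  # one byte changed in the first revision

print(original)
print(tampered)
```

Changing a single byte of README in the first revision changes both commit ids, including the second commit whose own change never touched that file.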
If any of these invariants is violated, the violation is trivially detectable:
- the hash of a single tree is deterministically verifiable in O(n) where n is the number of objects in the root tree
- the integrity of a full branch history is deterministically verifiable in O(n) where n is the number of nodes (commits) in the revision chain.
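The second point can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how git implements it: the object store is a plain dict, the commit bodies are stripped down (no author or committer lines), and only the commit chain is checked. The names put_commit and verify_chain are hypothetical. Trees and blobs would be checked the same way, by recomputing each id from the stored content.

```python
import hashlib


def object_id(kind, body):
    # Same scheme git uses: hash "<kind> <size>\0" plus the body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def put_commit(store, tree, parent, message):
    # Simplified commit body: tree line, optional parent line, message.
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += ["", message]
    body = ("\n".join(lines) + "\n").encode()
    oid = object_id("commit", body)
    store[oid] = body
    return oid


def verify_chain(store, head):
    # Visit every reachable commit once: O(n) in the number of commits.
    pending, seen = [head], set()
    while pending:
        oid = pending.pop()
        if oid in seen:
            continue
        seen.add(oid)
        body = store.get(oid)
        # A missing object or a body that no longer hashes to its id
        # means the history is corrupt.
        if body is None or object_id("commit", body) != oid:
            return False
        for line in body.decode().split("\n"):
            if not line:  # a blank line ends the header section
                break
            if line.startswith("parent "):
                pending.append(line[len("parent "):])
    return True


store = {}
empty_tree = object_id("tree", b"")
c1 = put_commit(store, empty_tree, None, "first")
c2 = put_commit(store, empty_tree, c1, "second")
c3 = put_commit(store, empty_tree, c2, "third")

ok_before = verify_chain(store, c3)
# Tamper with the oldest commit's content while keeping its old id.
store[c1] = store[c1].replace(b"first", b"f1rst")
ok_after = verify_chain(store, c3)
print(ok_before, ok_after)
```

The tampered commit is caught because its stored content no longer hashes to the id that its child commit records as parent.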
In fact, git-verify-tag and git fsck are useful commands to do the checking explicitly. Besides that, verification automatically occurs on git subcommands (send-pack, receive-pack, read-tree, write-tree etc.)
Re: Replace the offending commit thread
In this first post, Linus already defuses the bomb:
Hmm. Scary. That should not have been successful with a corrupt repo.
Unless you have done a .grafts file to hide the corruption, or something
like that?
This is immediately confirmed by Denis Bueno in the response.