I find the internal plumbing of git really fascinating. At least on a conceptual level, it is simple and very elegant. Many sources on the topic (Git Magic and Git Internals, for example) describe the blob object in detail, and some even include short Ruby scripts that write a blob object in a few lines, like this one from Pro Git:
require 'zlib'
require 'fileutils'
require 'digest/sha1'
content = "StackOverflow"
header = "blob #{content.bytesize}\0"
data = header + content
sha1 = Digest::SHA1.hexdigest(data)
zlib_content = Zlib::Deflate.deflate(data)
path = '.git/objects/' + sha1[0,2] + '/' + sha1[2,38]
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'wb') { |f| f.write zlib_content }
They usually conclude that the other storage objects (trees, commits and tags) are exactly the same, but with a different header. There seems to be some difference in the internal formats, though, since modifying only the header and text content in the script leads to corrupted tree or commit entries and/or non-matching checksums. Are the other objects stored differently from blobs, and if so, in which way?
The pretty-printed output that cat-file gives for the other objects doesn't seem to bear much resemblance to the actual storage format.
According to Git Magic, the tree object format is
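One way to compare the two is to inflate a stored object and look at the bytes directly. The following is only a minimal sketch: `parse_object` and `read_loose_object` are hypothetical helper names, and it assumes a loose (unpacked) object file laid out the way the blob script above writes it.

```ruby
require 'zlib'

# Split a raw (already inflated) object into its type, declared size and body.
# Assumes the "<type> <size>\0<body>" layout used by the blob script.
def parse_object(raw)
  header, body = raw.b.split("\0", 2)
  type, size = header.split(' ', 2)
  [type, size.to_i, body]
end

# Read and inflate one loose object file, e.g. a path under .git/objects/.
def read_loose_object(path)
  parse_object(Zlib::Inflate.inflate(File.binread(path)))
end

# Round trip on an in-memory blob, so no repository is needed to try it.
stored = Zlib::Deflate.deflate("blob 13\0StackOverflow")
type, size, body = parse_object(Zlib::Inflate.inflate(stored))
p [type, size, body]
```

Printing the body of a tree with `p body` or `body.unpack('H*')` shows the binary parts that cat-file hides behind its formatted output.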
"tree" SP "<content length>" NUL "<permissions> <filename>" NUL <checksum>
but I'm unable to generate the correct checksum for this with my measly Ruby skills. Is it possible to generate tree and commit objects as easily as blob objects? Could someone provide short code snippets for this?
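For what it's worth, here is a rough sketch of how a tree and a commit could be assembled along the same lines as the blob script. It rests on one detail that is easy to miss: inside a tree entry the checksum is stored as 20 raw bytes, not as 40 hex characters, and the content length in the header counts those raw bytes. The helper names (`git_object`, `tree_entry`, `write_object`), the file name `file.txt`, and the author line are made up for illustration.

```ruby
require 'zlib'
require 'fileutils'
require 'digest/sha1'

# Wrap a body in the "<type> <byte length>\0" header.
# Returns [hex sha1, raw uncompressed bytes].
def git_object(type, body)
  body = body.b
  data = "#{type} #{body.bytesize}\0".b + body
  [Digest::SHA1.hexdigest(data), data]
end

# One tree entry: "<mode> <name>\0" followed by the 20 raw bytes of the
# sha1 (pack('H*') turns the 40-character hex string into binary).
def tree_entry(mode, name, hex_sha)
  "#{mode} #{name}\0".b + [hex_sha].pack('H*')
end

# Write a loose object under git_dir, same layout as the blob script.
def write_object(git_dir, sha, data)
  path = File.join(git_dir, 'objects', sha[0, 2], sha[2, 38])
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Zlib::Deflate.deflate(data))
end

blob_sha, blob = git_object('blob', 'StackOverflow')
tree_sha, tree = git_object('tree', tree_entry('100644', 'file.txt', blob_sha))

# A commit body is plain text, and here the tree sha is written in hex.
# Name, e-mail and timestamp below are placeholder values.
who = 'A U Thor <author@example.com> 1300000000 +0000'
commit_body = "tree #{tree_sha}\nauthor #{who}\ncommitter #{who}\n\nfirst commit\n"
commit_sha, commit = git_object('commit', commit_body)

puts blob_sha, tree_sha, commit_sha
# To store them: write_object('.git', blob_sha, blob), and likewise
# for the tree and the commit.
```

A tree with several entries is just the entries concatenated, sorted by name; subdirectories use mode `40000` and point at another tree's sha. Running `git cat-file -p` on the printed hashes after writing the objects is a quick way to check that git accepts them.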