Checksum calculation for Git Tree and Commit objects

Question

I find the internal plumbing of Git really fascinating. At least on a conceptual level it is simple and very elegant. Many sources on the topic describe the blob object in detail (Git Magic and Git Internals), and Pro Git even shows how to write a blob object with a few lines of Ruby:

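require 'zlib'
require 'fileutils'
require 'digest/sha1'

content = "StackOverflow"

# Header is "<type> <size in bytes>\0"; bytesize stays correct for non-ASCII content
header = "blob #{content.bytesize}\0"
data = header + content

# The object ID is the SHA-1 of header + content, taken before compression
sha1 = Digest::SHA1.hexdigest(data)

# Store the zlib-deflated data under .git/objects/<first 2 hex>/<remaining 38 hex>
zlib_content = Zlib::Deflate.deflate(data)
path = '.git/objects/' + sha1[0,2] + '/' + sha1[2,38]
FileUtils.mkdir_p(File.dirname(path))
# Binary mode avoids newline translation of the compressed bytes
File.open(path, 'wb') { |f| f.write zlib_content }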

They usually conclude that the other storage objects (trees, commits and tags) are stored in exactly the same way, but with a different header. There seems to be some difference in the internal formats, though, since changing only the header and the text content in the script leads to corrupted tree or commit entries and/or non-matching checksums. Are the other objects stored differently from blobs, and if so, how?
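For commits, a minimal sketch of the idea follows. It assumes the usual text layout of a commit body (a `tree` line, optional `parent` lines, `author` and `committer` lines, a blank line, then the message) and reuses the same `<type> <size>\0` envelope as the blob script. The helper names `git_object_id` and `commit_body`, the author string and the timestamp are made up for illustration, and the resulting ID will only match what git produces if every byte of the body is identical.

```ruby
require 'digest/sha1'

# Hypothetical helper: prepend git's "<type> <size>\0" header and hash the result.
def git_object_id(type, body)
  body = body.b                                   # treat the body as raw bytes
  Digest::SHA1.hexdigest("#{type} #{body.bytesize}\0".b + body)
end

# Hypothetical helper: assemble the text body of a commit object.
# Assumption: author and committer are the same identity and timestamp.
def commit_body(tree:, author:, time:, message:, parents: [])
  lines = ["tree #{tree}"]                        # tree ID is written as 40 hex chars here
  parents.each { |parent| lines << "parent #{parent}" }
  lines << "author #{author} #{time}"
  lines << "committer #{author} #{time}"
  lines.join("\n") + "\n\n" + message + "\n"     # blank line separates headers from message
end

# Well-known ID of the empty tree, used so the example needs no other objects.
EMPTY_TREE = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'

body = commit_body(tree: EMPTY_TREE,
                   author: 'A U Thor <author@example.com>',
                   time: '1307700000 +0000',
                   message: 'Initial commit')
puts git_object_id('commit', body)
```

Writing the object to disk would work as in the blob script: deflate header plus body and store it under the path derived from the ID.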

The pretty-printed output of cat-file for the other object types doesn't seem to bear much resemblance to the actual storage format.
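One way to compare the two is to inflate an object file directly and look at the raw bytes. The sketch below assumes the object is stored loose under `.git/objects` (objects that have been packed are not readable this way); `read_loose_object` is a hypothetical helper name.

```ruby
require 'zlib'

# Hypothetical helper: inflate a loose object and split it into type, size and body.
def read_loose_object(git_dir, sha1)
  path = File.join(git_dir, 'objects', sha1[0, 2], sha1[2, 38])
  raw = Zlib::Inflate.inflate(File.binread(path))
  header, body = raw.split("\0", 2)               # header ends at the first NUL byte
  type, size = header.split(' ')
  [type, size.to_i, body]
end

# Usage: ruby read_object.rb <40-char object id>, run from the repository root.
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  type, size, body = read_loose_object('.git', ARGV[0])
  puts "#{type} (#{size} bytes)"
  p body                                          # inspect form shows NULs and binary bytes as escapes
end
```

Running this on a tree object shows binary data after each file name, which is why the printed form from cat-file looks different.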

According to Git Magic, the tree object format is

"tree" SP "<content length>" NUL "<permissions> <filename>" NUL <checksum>

but I'm unable to generate the correct checksum for this with my measly Ruby skills. Is it possible to generate tree and commit objects as easily as blob objects? Could someone provide short code snippets for this?
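A minimal sketch of the tree case, under these assumptions: the checksum after each file name is stored as 20 raw bytes rather than 40 hex characters, the permissions field is an octal mode string such as `100644` for a regular file, and entries are sorted by name (git's real ordering treats directory names specially, which this sketch ignores). `tree_entry` and `tree_object` are hypothetical helper names.

```ruby
require 'digest/sha1'

# Hypothetical helper: one entry is "<mode> <name>\0" followed by the 20-byte binary SHA-1.
def tree_entry(mode, name, sha1_hex)
  "#{mode} #{name}\0".b + [sha1_hex].pack('H*')   # pack('H*') turns 40 hex chars into 20 bytes
end

# Hypothetical helper: build "tree <size>\0<entries>" and return [id, raw object data].
# Assumption: plain sort by name, which is a simplification of git's ordering.
def tree_object(entries)
  body = entries.sort_by { |_mode, name, _sha| name }
                .map { |mode, name, sha| tree_entry(mode, name, sha) }
                .join
  data = "tree #{body.bytesize}\0".b + body
  [Digest::SHA1.hexdigest(data), data]
end

# ID of the blob written by the script above.
blob_sha = Digest::SHA1.hexdigest("blob 13\0StackOverflow")

sha, data = tree_object([['100644', 'so.txt', blob_sha]])
puts sha
```

The returned `data` can be deflated and written to `.git/objects` in the same way as the blob. Hashing the hex string instead of the packed bytes is one likely cause of a non-matching checksum.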

Here's a similar question, but it's again mostly about the blob objects: http://stackoverflow.com/questions/5290444/why-does-git-hash-object-return-a-different-hash-than-openssl-sha1 — Kai Inkinen, Jun 10 '11 at 11:23

score 4 · Accepted Answer · answered Jun 10 '11 at 08:13

Did you have a look at grit, which was developed to power GitHub?

Someone has probably already implemented this in Ruby, most likely there.

Hope that helps.

Vincent Guerci

14,379
4
50
56

http://programmers.stackexchange.com/questions/62843/best-ruby-git-library, https://github.com/libgit2/rugged, http://git.rubyforge.org/, http://www.rubyinside.com/git-and-ruby-git-tutorials-articles-and-links-for-rubyists-860.html – sehe Jun 10 '11 at 09:25
I found grit when trying to figure this out. The thing is, I'm not looking for a Ruby git client, but rather would like to learn how git works internally. Ruby seems to be a fairly good language for communicating this sort of information, even if my Ruby skills are next to non-existent. – Kai Inkinen Jun 10 '11 at 09:59
@sehe, thanks for the interesting links! @Kai I understand, sorry if that didn't help much. Rather than just using a library, reading its code (also look at sehe's recommendations) would certainly answer all of your questions and teach you a bit more Ruby. That's the way I would probably do it. For more complex and less documented details, I would also look at the git source directly: `git clone git://git.kernel.org/pub/scm/git/git.git` – Vincent Guerci Jun 10 '11 at 10:07
