I'd like to improve on @Fred Foo's answer by providing a modified version of his script that does not store the files and directories in the repository as a side effect of computing their hashes: http://pastebin.com/BSNGqsqC
Unfortunately, I am not aware of any way to force `git mktree` to not create a tree object in the repository, so the code has to generate a binary representation of the tree itself and pass it to `git hash-object -t tree`.
This script is also based on the answers to the question "What is the internal format of a git tree object?"
The general idea is to use `git hash-object -- data.txt` to get the hash of a file, and `git hash-object --stdin -t tree < TreeDescription` to get the hash of a directory, where:
- `TreeDescription` is a concatenation of entries of the form `"mode name\0hash"`, where:
  - `mode` is `"100644"` for regular files (Git uses `"100755"` for executables) and `"40000"` for directories (note the lack of a leading zero for directories)
  - `mode` and `name` are separated by a single space
  - `name` and `hash` are separated by a single `\0` byte
  - `hash` is the 20-byte binary representation of the object hash
- entries are sorted by `name`. This does not seem entirely necessary to create a tree object, but it is what lets you determine whether two directories are equivalent by comparing their hashes. Git itself orders entries by comparing the names byte by byte, with a directory name compared as if it ended in `/`; there is no locale-aware collation, so non-ASCII characters are simply compared as their raw bytes.
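The entry format above can be sketched in Python. This is only an illustration under a few assumptions: names are encoded as UTF-8, every hash is given as a 40-character hex string, and `Entry`, `sort_key`, and `tree_description` are hypothetical names of mine rather than anything from Git or from the linked script. The ordering used is raw byte comparison with directory names treated as if they ended in `/`, which is Git's tree ordering as far as I understand it.

```python
from typing import List, Tuple

# Hypothetical entry type for this sketch: (mode, name, hex_hash)
Entry = Tuple[str, str, str]


def sort_key(entry: Entry) -> bytes:
    # Raw bytes of the name; directories sort as if the name ended in "/".
    mode, name, _ = entry
    raw = name.encode("utf-8")  # assumption: names are UTF-8
    return raw + b"/" if mode == "40000" else raw


def tree_description(entries: List[Entry]) -> bytes:
    # Concatenate one "mode name<NUL>hash" record per entry, in sorted order.
    out = bytearray()
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        out += bytes.fromhex(hex_hash)  # 20 raw bytes, not 40 hex characters
    return bytes(out)
```

The resulting bytes are what would be fed to `git hash-object --stdin -t tree`.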
Also note that this binary format differs slightly from the way a tree object is stored in the repository: it lacks the `"tree SIZE\0"` header.
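As far as I understand, `git hash-object` prepends that header itself before hashing, so the same IDs can be reproduced without calling Git at all. Here is a minimal sketch, assuming a repository that uses SHA-1 object IDs; `git_object_hash` is a hypothetical helper name, not a Git API.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    # The object ID is the SHA-1 of "<type> <size>\0" followed by the body.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Same result as: echo hello | git hash-object --stdin
print(git_object_hash("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a

# The well-known hash of the empty tree
print(git_object_hash("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Note that this swaps `git hash-object` for a direct `hashlib` computation, so nothing is written to the repository either way.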
Obviously you have to compute this bottom-up, starting from the deepest files, because you need the hashes of all children before you can compute the hash of their parent.
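The bottom-up traversal can be sketched with a simple recursion. This is not the linked script, just an illustration with several simplifying assumptions: it computes the hashes with `hashlib` instead of calling `git hash-object`, treats every file as mode `100644`, follows symlinks, does not skip `.git` or ignored files, and hashes an empty directory as an empty tree (Git itself would not record it). `git_object_hash` and `hash_path` are hypothetical names.

```python
import hashlib
import os


def git_object_hash(obj_type: str, body: bytes) -> str:
    # SHA-1 of "<type> <size>\0" followed by the body (SHA-1 repositories assumed).
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def hash_path(path: str) -> str:
    """Return the blob hash of a file or the tree hash of a directory."""
    if not os.path.isdir(path):
        with open(path, "rb") as f:
            return git_object_hash("blob", f.read())

    entries = []
    for name in os.listdir(path):
        child = os.path.join(path, name)
        is_dir = os.path.isdir(child)
        mode = b"40000" if is_dir else b"100644"  # simplification: no 100755
        raw_name = os.fsencode(name)
        # Directories sort as if their name ended in "/".
        key = raw_name + b"/" if is_dir else raw_name
        # Recurse first: the child's hash is needed for the parent's entry.
        digest = bytes.fromhex(hash_path(child))
        entries.append((key, mode, raw_name, digest))

    entries.sort(key=lambda e: e[0])
    body = b"".join(m + b" " + n + b"\0" + d for _, m, n, d in entries)
    return git_object_hash("tree", body)
```

Calling `hash_path(".")` on a checked-out directory should then give the tree hash without touching the object store, provided the assumptions above hold for its contents.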