How to compute the git hash-object of a directory?

Question

Does anyone have an example of using git hash-object on a directory? It works easily enough on a file* but doesn't work as I'd expect for a directory**

*:  git hash-object c:\somefile.txt
**: git hash-object -t tree c:\somedirectory

When I try to use hash-object with the directory, it complains "fatal: Cannot open 'C:\someDirectory': Permission denied"

Andy Pryke · Answer 1 · 2017-12-08T21:26:54.400

30

Depending why you wish to do this, the following git command might be useful:

git ls-files -s somedirectory | git hash-object --stdin

This gives a single hash that takes into account both the filenames and the contents.

It works like this: git ls-files -s ... writes a list of files and their hashes to stdout as text, and git hash-object then generates a hash of whatever it receives on stdin.
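To make the second half of that pipeline concrete, here is a minimal Python sketch of what git hash-object --stdin computes for its input, assuming the default SHA-1 object format: the SHA-1 of a "blob <size>\0" header followed by the raw bytes. The function name git_blob_hash and the sample listing are illustrative assumptions, not part of Git.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to `data` when stored as a blob."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical `git ls-files -s` output: "<mode> <object> <stage>\t<path>" per line.
listing = (
    b"100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tsomedirectory/a.txt\n"
    b"100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tsomedirectory/b.txt\n"
)

# One fingerprint that changes whenever a listed mode, blob hash, or path changes.
directory_fingerprint = git_blob_hash(listing)
print(directory_fingerprint)
```

Note that the result is the hash of the listing text treated as a blob, not the ID of the tree object Git stores for the directory, so it is only comparable with other fingerprints produced the same way.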

My use case for this is the following: I want to know whether the (git-managed) files in a directory in one branch exactly(*) match those in another branch. Specifically, I compare the "directory hashes" to decide whether I need to re-generate derived files which are cached.

By default git ls-files will list files in sub-directories too. If you don't want that, try looking at the answers to "how to git ls-file for just one directory level". There are also various other options to git ls-files, including the ability to specify a list of files to include.

(*) excluding hash-collisions

edited Dec 08 '17 at 21:26

answered Dec 08 '17 at 20:20

Andy Pryke



I think `git ls-tree HEAD somedirectory` is sufficient, since git has already hashed the dir. There is no need to `ls-files` the entire directory and rehash it with `git hash-object`. – akhy Jan 19 '18 at 11:13

I guess my longer solution above might be handy if you needed to restrict the files included in the hash calculation e.g. to particular file extensions, or to exclude sub-directories. – Andy Pryke Jan 24 '18 at 15:26
I wonder, can this be made to work outside of a git repo? See also https://stackoverflow.com/questions/69730660/compute-git-hash-of-file-or-directory-outside-of-git-repository – donquixote Oct 26 '21 at 22:28
Note that this may result in a subtle bug: because `git ls-files -s` prints each path relative to the current directory, you will get a different result for the same file(s) whenever you call the command from a different directory. In my opinion, a much better way is calling `git hash-object $(git ls-files somedirectory)` *(note no `-s` option)*. This does not include the file mode, but I think you're more likely to stumble upon a bug due to different hashes of the same file than due to a mode change not being included in the hash. You can of course further improve the command to include the mode. – Hi-Angel Oct 18 '22 at 13:59

score 19 · Accepted Answer · edited Mar 03 '23 at 23:57


git hash-object -t tree is expecting the file parameter to be a file that describes the entries in the tree, rather than a directory in the filesystem. I understand from the comment here that this command is expecting a file that describes the tree in a binary format, and that it would be easier to use git mktree for you to create the tree object.

git mktree understands input of the format you get from (for example) git ls-tree HEAD. There is a nice example of constructing a tree from scratch using git hash-object and git mktree in the Git Community Book.

edited Mar 03 '23 at 23:57

larsks


answered May 15 '11 at 21:19

Mark Longair



The git book does not contain an `mktree` example anymore, and the source on GitHub doesn't go back far enough to find it in older versions. – Suzanne Soy Mar 25 '21 at 16:37
The link text references the "Git Community Book", but when the question was edited back in 2016 it was replaced with a link to the "Pro Git" book, which is a different text. The "Git Community Book" is hosted elsewhere, and the originally referenced documentation can be found [here](https://shafiul.github.io/gitbook/7_raw_git.html). – larsks Mar 03 '23 at 23:56

ErikE · Answer 3 · 2021-10-15T17:27:21.847

13

I'm not sure about getting the hash for a directory (and all of its contents) outside of a git repository, but for a directory inside of a repository, try this to print only the hash:

git rev-parse HEAD:some/directory

There is no need to use other commands that require additional processing.

This will also work but provides additional information you may not want (such as the file mode and other data):

git ls-tree HEAD some/directory

edited Oct 15 '21 at 17:27

answered Oct 29 '19 at 00:41

ErikE


yes! `git rev-parse` prints only the tree sha, `git ls-tree` prints only the tree body – milahu Oct 15 '21 at 13:38
@MilaNautikus Thanks for pointing out that distinction! – ErikE Oct 15 '21 at 17:26

score 4 · Answer 4 · answered Jun 24 '14 at 13:15

I had the same problem and hacked up a Python script to hash a complete directory. It's limited in the sense that it doesn't take the .gitignore file into account, but it has served its purpose so far (hash directory, make commit object, store it on the gh-pages branch).

score 2 · Answer 5 · edited May 23 '17 at 10:30

I'd like to improve on @Fred Foo's answer by providing a modified version of his script, which differs in that it does not store the files and directories in the repository as a side effect of computing their hashes: http://pastebin.com/BSNGqsqC

Unfortunately I am not aware of any way to force git mktree to not create a tree object in the repository, so the code has to generate a binary representation of the tree and pass it to git hash-object -t tree.

This script is also based on answers from "What is the internal format of a git tree object?"

The general idea is to use git hash-object -- data.txt to get hash of a file, and to use git hash-object --stdin -t tree < TreeDescription for a directory, where:

TreeDescription is a concatenation of "mode name\0hash"
mode is "100644" for files, and "40000" for directories (note the lack of leading zero in case of directory)
mode and name are separated by a single space,
name and hash are separated by a single byte \0
hash is a 20-bytes long binary representation of object hash
entries are sorted by name, which seems not entirely necessary to create a tree object, but helps to determine if two directories are equivalent by comparing their hashes - unfortunately I am not aware which sorting algorithm should be used here (in particular: what to do in case of non-ascii characters)

Also note that this binary format differs a little bit from the way a tree object is stored in the repository in that it lacks the "tree SIZE\0" header.

Obviously you have to compute this bottom-up, starting from deepest files, as you need hashes of all children before computing the hash of a parent.
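The bottom-up procedure described above can be sketched in pure Python without writing anything to a repository. This is a minimal illustration under several assumptions, not the script linked in this answer: it assumes SHA-1 object IDs, skips .git and empty directories (Git does not record the latter), ignores .gitignore, the index, and any line-ending or filter conversion, and orders entries bytewise by name with directories compared as if their name ended in "/" (the ordering Git is understood to use for trees). The names object_id, tree_id, directory_entries, and hash_directory are hypothetical helpers.

```python
import hashlib
import os
import stat


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a Git object: '<kind> <size>\\0' header plus body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(entries) -> bytes:
    """Hash a tree given (mode, name, raw_hash) tuples of bytes."""

    def sort_key(entry):
        # Assumed Git ordering: bytewise, directories compared as "name/".
        mode, name, _ = entry
        return name + b"/" if mode == b"40000" else name

    body = b"".join(
        mode + b" " + name + b"\0" + raw_hash
        for mode, name, raw_hash in sorted(entries, key=sort_key)
    )
    return object_id(b"tree", body)


def directory_entries(path: str) -> list:
    """Collect tree entries for `path`, hashing children before parents."""
    entries = []
    for name in os.listdir(path):
        if name == ".git":
            continue
        full = os.path.join(path, name)
        info = os.lstat(full)
        raw_name = os.fsencode(name)
        if stat.S_ISDIR(info.st_mode):
            children = directory_entries(full)
            if children:  # empty directories are skipped entirely
                entries.append((b"40000", raw_name, tree_id(children)))
        elif stat.S_ISLNK(info.st_mode):
            # A symlink is stored as a blob holding the link target.
            target = os.fsencode(os.readlink(full))
            entries.append((b"120000", raw_name, object_id(b"blob", target)))
        elif stat.S_ISREG(info.st_mode):
            mode = b"100755" if info.st_mode & stat.S_IXUSR else b"100644"
            with open(full, "rb") as handle:
                entries.append((mode, raw_name, object_id(b"blob", handle.read())))
    return entries


def hash_directory(path: str) -> str:
    """Hex tree hash of a directory on disk."""
    return tree_id(directory_entries(path)).hex()


print(tree_id([]).hex())  # the empty tree: 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Calling hash_directory("some/directory") should match git rev-parse HEAD:some/directory only when the working tree contains exactly the committed files, with no ignored or untracked extras and no content conversion on checkout.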

score 0 · Answer 6 · answered Mar 18 '14 at 19:33

As Mark Longair said, mktree is the way to go.

I had the same problem and had to struggle a lot to fix it. This is what I did:

git ls-files -s directory_path

This will give you a list of the contents of the directory along with their hashes.

You can then turn this list into ls-tree format in a text editor and pipe it to git mktree:

echo -e "{ls-tree format list}" | git mktree
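Instead of editing the list by hand, the conversion can be scripted. The sketch below is a hypothetical helper (ls_files_to_ls_tree is not a Git command) that assumes each git ls-files -s line has the form "<mode> <object> <stage>\t<path>" and rewrites it to the "<mode> <type> <object>\t<path>" form that git ls-tree prints. It also assumes the listing covers a single directory level, since git mktree builds one tree at a time.

```python
def ls_files_to_ls_tree(listing: str) -> str:
    """Convert `git ls-files -s` lines into `git ls-tree` style lines."""
    out = []
    for line in listing.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, object_id, _stage = meta.split(" ")
        # Mode 160000 marks a submodule entry, which ls-tree reports as "commit".
        kind = "commit" if mode == "160000" else "blob"
        out.append(f"{mode} {kind} {object_id}\t{path}")
    return "\n".join(out) + "\n" if out else ""


# Hypothetical listing for a directory holding one empty file.
sample = "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\ta.txt\n"
print(ls_files_to_ls_tree(sample), end="")
```

The returned text can be fed to git mktree on stdin. Keep in mind that git mktree writes the resulting tree object into the repository.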

score 0 · Answer 7 · answered Apr 30 '15 at 03:30

After long searching I found the following command:

git write-tree

Source: http://git-scm.com/docs/git-write-tree

I used it to recover the missing directory:

git write-tree --prefix=path/to/missing/folder/

And my missing tree object got created. From here you can continue using:

git hash-object -w path/to/missing/folder/file.txt

As explained in: https://git.wiki.kernel.org/index.php/GitFaq#How_to_fix_a_broken_repository.3F
