Git: get a hash of the current state of the working tree?

Question

I would like to ensure that my executable is built with the most up-to-date version of the code.

For example, I can take the current git commit at compile time and bake it into the executable; then, when the executable is run, it compares this with the current git commit and, if they don't match, complains that the code has been modified and is out of date.
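A minimal sketch of the build-time half of that idea, assuming a C build that includes a generated header. The file name `version.h` and the macro `BUILD_GIT_COMMIT` are hypothetical names chosen for illustration, not a Git convention.

```sh
#!/bin/sh
# Hypothetical build step: record the commit HEAD points to in a generated header.
# version.h and BUILD_GIT_COMMIT are illustrative names only.
write_version_header() {
    out=${1:-version.h}
    commit=$(git rev-parse HEAD) || return 1
    printf '#define BUILD_GIT_COMMIT "%s"\n' "$commit" > "$out"
}
```

Running `write_version_header` before the compiler lets the program print or compare `BUILD_GIT_COMMIT` at run time. On its own it cannot see uncommitted edits, which is exactly the gap described next.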

However, sometimes I recompile without making a commit, after making small changes to the code. Then this method doesn't work, as it only accounts for committed changes.

Is there any convenient way to programmatically get a hash of the current commit PLUS the state of the working directory, using git or otherwise?

Also, is there a name for this practice?

Why don't you just remember to commit before you compile? You should be able to programmatically check if the working copy is dirty/modified in a script before you start compiling. — , May 29 '14 at 18:53
I'm using this to generate a cache name in a JavaScript service worker. In that case, there is no compile step. I can have the web server dynamically generate a JavaScript file with a cache name based on the hash of the web app. While developing, it is convenient to have the cache name include local changes that have not yet been checked in. — Stephen Ostermiller, Nov 01 '22 at 12:38

score 6 · Answer 1 · edited May 23 '17 at 12:13


It is possible to create and store most changes in the current working tree, including all staged, unstaged and untracked files, while respecting .gitignore. Roughly, one needs to run:

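#!/bin/sh
# Paths from diff-index/ls-files are relative to the top level, so run from there
cd "$(git rev-parse --show-toplevel)" || exit 1
# Refresh stat info so touched-but-unmodified files are not listed as changed
git update-index -q --refresh
{   git diff-index --name-only HEAD         # tracked files that differ from HEAD
    git ls-files -o --exclude-standard      # untracked files, minus ignored ones
} \
| while IFS= read -r path; do
    # Regular file: store its contents as a blob (-w) and emit a tree entry
    test -f "$path" && printf '100644 blob %s\t%s\n' "$(git hash-object -w "$path")" "$path"
    # Directory (e.g. a submodule): record the commit it has checked out
    test -d "$path" && printf '160000 commit %s\t%s\n' "$(cd "$path" && git rev-parse HEAD)" "$path"
done | sed 's,/,\\,g' | git mktree --missing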

The first command, `git diff-index --name-only HEAD`, lists all tracked files that differ from HEAD.
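A small sketch of what `git diff-index --name-only HEAD` reports. Modified, staged, newly added and deleted tracked files all show up; deleted ones are later skipped by the loop because `test -f` fails for them. The `git update-index -q --refresh` call is an addition of this sketch: it stops files whose timestamps changed but whose content did not from being listed.

```sh
# List tracked paths whose index or working-tree state differs from HEAD:
# modified, staged, newly added and deleted files all appear.
changed_tracked_paths() {
    # Refresh cached stat data so touched-but-unchanged files are not reported
    git update-index -q --refresh
    git diff-index --name-only HEAD
}
```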

Then `git ls-files -o --exclude-standard` finds the untracked files, excluding ignored ones.
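To see what `--exclude-standard` filters out, here is a sketch with two helper functions (hypothetical names). `-i` asks for the ignored files instead, i.e. exactly what the script leaves out.

```sh
# Untracked files that are NOT ignored (.gitignore, .git/info/exclude, core.excludesFile)
untracked_paths() {
    git ls-files -o --exclude-standard
}
# For comparison: untracked files that ARE ignored, i.e. what the script skips
ignored_paths() {
    git ls-files -o -i --exclude-standard
}
```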

We then pipe the output of these two commands into a loop that constructs `git mktree` input for all the files.
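The body of that loop can be read as one function per file. This sketch (the name `tree_entry` is hypothetical) prints a single `git mktree` input line for a regular file. It passes the path through `%s` rather than splicing it into the `printf` format string, so names containing `%` or `\` are printed verbatim.

```sh
# Print one git mktree input line ("<mode> SP <type> SP <object> TAB <path>")
# for a regular file, storing the blob with -w as the original script does.
tree_entry() {
    printf '100644 blob %s\t%s\n' "$(git hash-object -w "$1")" "$1"
}
```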

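The output of that goes through sed because `git mktree` doesn't construct trees recursively and rejects names containing `/`; replacing each `/` with `\` turns every path into a single flat name. The resulting names are meaningless as paths, but that doesn't matter here: we just want a hash code, and the tree is never meant to be checked out.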
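The `sed 's,/,\\,g'` step on its own, as a sketch (the function name is hypothetical). It also shows the one blind spot of the trick: a real file literally named `a\b` and a file `b` inside directory `a` flatten to the same name.

```sh
# Flatten a path for git mktree by turning every '/' into a single '\'.
flatten_path() {
    printf '%s\n' "$1" | sed 's,/,\\,g'
}
```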

Finally, we pass this ls-tree-formatted output to mktree, which constructs the specified tree and stores it in Git, outputting the hash to us.
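A self-contained illustration of the `mktree` step, with made-up contents and names. It feeds two entries in the wrong order: `git mktree` normalizes entry order itself, so the hash does not depend on the order in which the loop emits files. Without `--missing`, mktree checks that every object exists, which is why the blobs are written with `-w` here.

```sh
# Build a flat tree from two blobs to show the input format git mktree expects:
# "<mode> SP <type> SP <object id> TAB <name>", one entry per line.
# Must run inside a Git repository; the contents and names are made up.
demo_mktree() {
    a=$(printf 'alpha\n' | git hash-object -w --stdin) || return 1
    b=$(printf 'beta\n' | git hash-object -w --stdin) || return 1
    # Entries are deliberately out of order; mktree sorts them itself.
    # '\\' in the format prints one backslash, i.e. a flattened "src/b.txt".
    printf '100644 blob %s\tsrc\\b.txt\n100644 blob %s\tsrc\\a.txt\n' "$b" "$a" | git mktree
}
```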

With a bit of extra effort one can also keep information about permissions and possibly even file deletions. After all, this is what Git does when you do an actual commit.
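For the permissions part, one possible sketch (the helper name is hypothetical) picks the mode Git would use for a regular file. It is an approximation: `test -x` checks the current user's permission, whereas Git looks at the owner's execute bit, and symlinks (mode `120000`) are not handled.

```sh
# Choose the git mktree mode for a regular file: 100755 if executable, else 100644.
# Approximation: test -x checks the current user's permission, not the owner bit
# that Git records; symlinks (mode 120000) are out of scope for this sketch.
tree_mode() {
    if [ -x "$1" ]; then
        echo 100755
    else
        echo 100644
    fi
}
```

In the loop, the regular-file branch could then print `printf '%s blob %s\t%s\n' "$(tree_mode "$path")" "$(git hash-object -w "$path")" "$path"`.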

One can argue that all these hoops are useful when you do want to store your changes for future reference but don't want to pollute the history with unnecessary commits for every little change. As such, it may be useful for internal testing with micro-releases: you can log the local hash as the actual version of your code, instead of just the non-descriptive `-dirty` flag, to see exactly where your code failed when you forgot to tag or commit each working version. Some may consider this a bad habit and argue that you should instead commit every successful build, however small. It's hard to argue with that, but then again, it's all about convenience.


answered Jul 03 '15 at 01:15

dan


I think you can yank the `-w` option, on the theory that there's no need to put things you're never going to commit into the repo. – jthill Jul 03 '15 at 01:51
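A quick way to see what dropping `-w` changes, as a sketch with a hypothetical helper: the object id is identical either way, and only `-w` puts the blob into the object database. Without stored blobs, the script's `git mktree` must keep `--missing`, since mktree otherwise verifies that every entry exists.

```sh
# Hash a file WITHOUT -w and report whether the object database already holds
# that blob. The printed id is the same one `git hash-object -w` would produce.
hash_without_storing() {
    id=$(git hash-object "$1") || return 1
    if git cat-file -e "$id" 2>/dev/null; then state=stored; else state=not-stored; fi
    printf '%s %s\n' "$id" "$state"
}
```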
This is really clever. Paths in subdirectories aren't valid for `git mktree`, but there's no reason to care, because you're only doing it for a hash code. Subbing in the `\\` makes an acceptable path for `git mktree` and everything comes out right. Kudos for this. It does bypass submodules and symlinks; handling those would take some fairly straightforward if tedious work, but that's probably okay too. I've taken the liberty of subbing in a cleaned-up version with some minor errors corrected rather than adding a separate answer for it – jthill Jul 03 '15 at 05:38
Thanks for the edits. However, I would keep the note about using diff for comparing individual files (not the entire trees, thanks for pointing that out). I.e. we first use `git diff-tree -r HEAD *hash*` to spot differences between this tree and HEAD (or any other commit, for that matter), then find hashes for the *file* of interest that changed (others will show '0000' in the new tree), then we can directly `git diff *hash_old* *hash_new*` to display only changes. A bit involved, but may be worth it. – dan Sep 18 '15 at 23:35
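The last step of that workflow, diffing two blobs directly by id, might look like this sketch (hypothetical helper name). `HEAD:<path>` resolves the committed blob and expects a path relative to the top level of the repository.

```sh
# Diff the committed version of a file against its current contents, stored
# as a blob with `git hash-object -w`. Both arguments to git diff are object ids.
diff_against_head() {
    old=$(git rev-parse "HEAD:$1") || return 1
    new=$(git hash-object -w "$1") || return 1
    git diff --no-color "$old" "$new"
}
```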
Also, your new version is missing `-w` switch for `hash-object`, so we cannot use diff as described above. It needs to be there to enable this functionality. – dan Sep 18 '15 at 23:37
Gaak. `git hash-object -w`. – jthill Sep 18 '15 at 23:40

To avoid polluting the repository's object database, we can create a temporary one: `t=$(mktemp -d); GIT_DIR=$t git init`, then prefix the `git hash-object` and `git mktree` commands with `GIT_DIR=$t` – James Z.M. Gao Jun 29 '21 at 03:49
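A sketch of the throwaway-repository idea, with a hypothetical function name. It uses `mktemp -d`, because `git init` needs a directory. Since every blob is written into the scratch repository, `--missing` is not needed, and the scratch repository is deleted afterwards.

```sh
# Build the tree inside a throwaway repository so neither the blobs nor the tree
# are written to the real repository. Arguments: paths relative to the current
# directory. Prints the tree hash, then deletes the scratch repository.
hash_files_in_scratch_repo() {
    scratch=$(mktemp -d) || return 1
    GIT_DIR=$scratch git init -q || { rm -rf "$scratch"; return 1; }
    for f in "$@"; do
        printf '100644 blob %s\t%s\n' "$(GIT_DIR=$scratch git hash-object -w "$f")" "$f"
    done | sed 's,/,\\,g' | GIT_DIR=$scratch git mktree
    rc=$?
    rm -rf "$scratch"
    return "$rc"
}
```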
This only works in the top level of the git repository. `git diff-index --name-only HEAD` produces paths relative to the git top level and `git hash-object` expects paths relative to the current working directory. This can be fixed by adding this as the second line: `cd "$(git rev-parse --show-toplevel)"` – Stephen Ostermiller Nov 01 '22 at 12:33
What is the canonical representation of a deleted file to be put into the resultant tree listing? Is it like this: mode is `100000`, hash is the hash of the file path: `100000 blob 17e7dcc30992f78c14878dbb7c31ba58892ec397 some\path\to\deleted\file` – zerox Jul 19 '23 at 08:22

score 3 · Answer 2 · answered May 29 '14 at 21:24

If all you want to do is determine whether there are any uncommitted modifications, that's easy; just run `git diff --quiet HEAD` and check whether the return code is non-zero.
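As a sketch (hypothetical function name), the check can be wrapped so that a Git error is not mistaken for "has changes": `git diff --quiet` exits 1 when there are differences and 0 when there are none.

```sh
# Return 0 if tracked files (staged or unstaged) differ from HEAD, 1 if not,
# 2 if git itself failed. Untracked files are not considered.
has_tracked_changes() {
    git diff --quiet HEAD
    case $? in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}
```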

If you actually need a hash of the changes, so that two users with the same starting commit and the same local modifications will get the same hash, that's trickier. My first thought is to pipe the output of `git diff HEAD` into `sha1sum`, and concatenate it to the commit hash, but the output of `git diff` might vary for different Git versions and config options.
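A sketch of that first thought, with a hypothetical function name. It swaps `git hash-object --stdin` in for `sha1sum` because the latter is not installed everywhere. The flags pin down a few diff options (external diff drivers, color, binary content), but the caveat above still applies, and untracked files are not included.

```sh
# Fingerprint = hash of (HEAD commit id + diff of the working tree against HEAD).
# git hash-object --stdin stands in for sha1sum. Untracked files are ignored, and
# the result may still differ across Git versions or diff-related config.
worktree_diff_hash() {
    {
        git rev-parse HEAD
        git diff --no-ext-diff --no-color --binary HEAD
    } | git hash-object --stdin
}
```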

Alternatively, you could use `git add -u . && git write-tree` to get an honest-to-goodness Git tree object for the current working tree (tracked files only, since `git add -u` skips untracked files). But that's a destructive operation; it clobbers any partially-staged changes that were already in your index.
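One way to avoid the destructive part (a swapped-in technique, not the one described above) is to stage into a temporary copy of the index through the `GIT_INDEX_FILE` environment variable. This sketch assumes Git 2.5 or later for `git rev-parse --git-path`, and uses `git add -A` so untracked, non-ignored files are included too. Blobs are still written to the object database; only the real index is left alone.

```sh
# Hash the whole working tree (tracked changes plus untracked, non-ignored files)
# by staging into a temporary copy of the index; the real index is not modified.
worktree_tree_hash() {
    tmp_index=$(mktemp) || return 1
    # Start from the real index so unchanged files don't need re-hashing
    if ! cp "$(git rev-parse --git-path index)" "$tmp_index"; then
        rm -f "$tmp_index"
        return 1
    fi
    # With Git 2.x, `git add -A` without a pathspec covers the whole tree
    GIT_INDEX_FILE=$tmp_index git add -A && GIT_INDEX_FILE=$tmp_index git write-tree
    rc=$?
    rm -f "$tmp_index"
    return "$rc"
}
```

When nothing has changed, the result equals `git rev-parse 'HEAD^{tree}'`.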

You can chain `git diff --quiet && ...` so that you don't compile if `git diff` returns non-zero. However, `git diff --quiet` still returns 0 if you only have untracked files. But really, how hard is it to run `git status` before kicking off a build/compilation of your source code? — , May 29 '14 at 22:28
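A sketch of a pre-build guard that also catches untracked files (the function name and message are hypothetical). `git status --porcelain` prints one line per modified, staged, deleted or untracked path and nothing for ignored files, so empty output means a clean tree.

```sh
# Refuse to continue when anything is modified, staged, deleted or untracked
# (ignored files excluded). `git status --porcelain` prints one line per such path.
require_clean_worktree() {
    if [ -n "$(git status --porcelain)" ]; then
        echo "working tree is dirty; commit before building" >&2
        return 1
    fi
}
```

Usage: `require_clean_worktree && make` (or whatever the build command is).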
