Recently I read about git internals and found that under the hood git hashes its objects:

$ echo 'test content' | git hash-object -w --stdin

d670460b4b4aece5915caf5c68d12f560a9fe3e4

How does it "unhash" its hash objects and the content of it?

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4

test content
jonrsharpe
mykyta
  • Why don't you run `strace git cat-file -p `. – Kaz Aug 17 '22 at 16:52
  • Note that two different contents can *theoretically* produce the same hash, and if and when they do, Git would break (sort of). See [How does the newly found SHA-1 collision affect Git?](https://stackoverflow.com/q/42433126/1256452) – torek Aug 18 '22 at 10:41

3 Answers

Git does not unhash its objects. It uses the hash as a lookup key, just like a hash table.
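To make the hash-table analogy concrete, here is a minimal Python sketch. It assumes the usual blob ID scheme (SHA-1 over a "blob <size>" header, a NUL byte, and then the content). The in-memory dict and the write_object/read_object helpers are hypothetical stand-ins for the object store, not Git's actual code.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Blob ID: SHA-1 over "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical toy object store: the hash is only ever used as a key.
store = {}

def write_object(content: bytes) -> str:
    oid = git_blob_id(content)
    store[oid] = content
    return oid

def read_object(oid: str) -> bytes:
    # Plain lookup; raises KeyError if the object was never stored.
    return store[oid]

oid = write_object(b"test content\n")  # what echo 'test content' sends to stdin
print(oid)
print(read_object(oid))
```

Nothing here derives the content from the key. If the key was never stored, the lookup simply fails.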

git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4

Git uses d670460b4b4aece5915caf5c68d12f560a9fe3e4 to look up the content. It can be in two places, .git/objects/ (aka "loose objects") or a packfile.

In the case above, Git would look for .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4. If that file exists, then it decompresses it and voilà, there's your content.

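You can see this yourself by decompressing the file with openssl zlib -d < .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 (this requires an OpenSSL build with zlib support). The output starts with a short header, blob 13 followed by a NUL byte, and then the content.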
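If your openssl lacks zlib support, the same lookup can be sketched in Python. This is an illustration of the loose-object case only, and it does not handle packfiles. The function names are made up for this example, and write_loose_object exists only so the demo can run in a temporary directory without a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def loose_object_path(git_dir, oid: str) -> Path:
    # First two hex digits name the directory, the remaining 38 the file.
    return Path(git_dir) / "objects" / oid[:2] / oid[2:]

def write_loose_object(git_dir, obj_type: str, body: bytes) -> str:
    # Demo helper: store "<type> <size>\0<body>", zlib-compressed.
    raw = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(git_dir, oid)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid

def read_loose_object(git_dir, oid: str):
    # Lookup by hash: build the path, decompress, split header from body.
    raw = zlib.decompress(loose_object_path(git_dir, oid).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body

with tempfile.TemporaryDirectory() as git_dir:
    oid = write_loose_object(git_dir, "blob", b"test content\n")
    print(oid)
    print(read_loose_object(git_dir, oid))
```

Against a real repository you would pass the path of its .git directory to read_loose_object. An object that has already been packed will not be found this way.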


Periodically, Git packs the loose objects into "packfiles" (for example when git gc runs). These are binary files which hold the data for many objects, which is more efficient than individual files. The details get a bit complicated, but again the SHA-1 hash is used to look up the content in the file. An entry for your example, in the style of git verify-pack -v output (hash, type, size, size in the packfile, offset in the packfile), might look something like this.

d670460b4b4aece5915caf5c68d12f560a9fe3e4 blob   61 60 285843732

Git uses this information to locate the content; exactly how that works is complicated. You can read about the gory details if you like.

Schwern

How does it "unhash" its hash objects and the content of it?

It looks the object up in its object database. If you look in the .git/objects directory, I suspect you'll find a directory called "d6" with a file called 70460b4b4aece5915caf5c68d12f560a9fe3e4. My understanding is that that file contains the content of the object - although probably compressed in some way.

But there's no magic going on of conjuring information out of nothing. In particular, if you use that same git cat-file command in a repo which doesn't have that object, it will fail with something like:

fatal: Not a valid object name d670460b4b4aece5915caf5c68d12f560a9fe3e4
Jon Skeet

Git translates some things into SHA-1 values, and that tracking can be static (e.g. for commits) or dynamic (e.g. for branches). Either way, those SHA-1 references are stored as complete values; there is no "unhashing" involved.

Example:

$ mkdir /tmp/test
$ cd /tmp/test
$ git init
$ touch .gitignore
$ git add .gitignore
$ git commit -m .gitignore

# One commit is now created, so how many objects has git created?
$ find .git/objects -type f
.git/objects/82/e3a754b6a0fcb238b03c0e47d05219fbf9cf89
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects/8e/816c8b0c098993d0b018cb4d16ce45a43c7ab0

# One commit
$ cat .git/refs/heads/main 
8e816c8b0c098993d0b018cb4d16ce45a43c7ab0
$

# which references one tree object
$ git ls-tree main
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    .gitignore
$

# which references one (empty) file
$ git cat-file blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$

So the commit object stores a hard-coded, full tree reference, and it can never be changed - the reference value is part of the input from which the commit id is calculated, so if you rebase or amend a commit you end up with a different commit id.
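A rough Python sketch of why that is. It assumes the usual commit layout (a tree line, optional parent lines, author and committer lines, a blank line, then the message) hashed with the same "<type> <size>" header as other objects. The identity string, timestamp, and helper names are hypothetical, so the ids it prints will not match the ones in the transcript above.

```python
import hashlib

def object_id(obj_type: str, body: bytes) -> str:
    # Same scheme for every object type: SHA-1 over "<type> <size>\0" + body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree: str, message: str, parents=(),
                who="A U Thor <author@example.com> 1660000000 +0000") -> bytes:
    # The full 40-character tree id is written into the commit text itself.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()

tree_a = "82e3a754b6a0fcb238b03c0e47d05219fbf9cf89"  # from the listing above
tree_b = object_id("tree", b"")                      # id of an empty tree

id_a = object_id("commit", commit_body(tree_a, ".gitignore"))
id_b = object_id("commit", commit_body(tree_b, ".gitignore"))
print(id_a)
print(id_b)
print(id_a != id_b)
```

Same message, same author, same timestamp, but a different tree reference gives a different commit id.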

A branch on the other hand is constantly updated by git as commits are added/changed/removed:

$ echo '*.bak' >> .gitignore
$ git add .gitignore 
$ git commit -m "Ignore backup files"
$ cat .git/refs/heads/main 
351ac7498b2eeb73d91a01e5e3270b2bb8ae47a3
$ git log --oneline
351ac74 (HEAD -> main) Ignore backup files
8e816c8 .gitignore
$

However, the SHA-1 reference stored here is again the complete value; there is no need to calculate it in any way.
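The branch case can be sketched the same way. This hypothetical read_branch helper only handles a branch stored as its own file under refs/heads, as in the cat commands above; refs that Git has moved into the packed-refs file are not covered. The demo fakes the directory layout in a temporary directory rather than touching a real repository.

```python
import tempfile
from pathlib import Path

def read_branch(git_dir, branch: str) -> str:
    # A loose branch ref is a small text file holding the full commit id.
    return (Path(git_dir) / "refs" / "heads" / branch).read_text().strip()

with tempfile.TemporaryDirectory() as git_dir:
    ref = Path(git_dir) / "refs" / "heads" / "main"
    ref.parent.mkdir(parents=True)

    # After the first commit the file holds that commit's id...
    ref.write_text("8e816c8b0c098993d0b018cb4d16ce45a43c7ab0\n")
    first = read_branch(git_dir, "main")

    # ...and a new commit just overwrites it with the new id.
    ref.write_text("351ac7498b2eeb73d91a01e5e3270b2bb8ae47a3\n")
    second = read_branch(git_dir, "main")

    print(first)
    print(second)
```

Resolving the branch is a file read, with nothing to compute.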

hlovdal