I've always wondered how Git stores directories. Does Git follow Linux's "everything is a file" philosophy and store a directory as just another file?
A directory is mapped to a tree object. https://git-scm.com/book/en/v2/Git-Internals-Git-Objects – ElpieKay Sep 04 '20 at 03:30
2 Answers
While AtnNn's answer is correct in terms of how the internal storage works, it's worth noting that Git builds these tree objects from the thing that Git calls its index or staging area or (rarely now) cache. The index is not capable of holding directories: it holds only files. The files in the index simply have long path names with embedded slashes, such as path/to/file.txt.
The git write-tree command reads through the index and splits this up:
- It creates a tree object that will contain an entry for a blob object, held under the component name file.txt. This tree object will acquire a hash ID once it is created. Let's call this hash ID H2.
- It creates another tree object that will contain an entry named to. The entry for to will store hash ID H2. (It may contain more entries: one for each other file or subdirectory directly under path/.) When git write-tree writes out this tree object, it will obtain a hash ID; let's call this hash ID H1.
- It then creates another tree object that will contain an entry named path, which will store hash ID H1. (As before, it may contain more entries, such as one named README.md that will hold the hash ID of the blob containing the README.md file's content.) When git write-tree writes out this tree object, it will obtain a hash ID, which we can call H0.
The git write-tree command reports this hash ID H0 to its standard output.
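The splitting that git write-tree performs can be sketched in Python. This is a simplified model, not Git's implementation: the index is assumed to be a plain dict mapping slash-separated paths to (mode, blob ID) pairs, and the helper names object_id, tree_data and write_tree are hypothetical. It writes the innermost trees first, so each parent can record its children's hash IDs, just as H2 has to exist before H1 can refer to it.

```python
import hashlib
from collections import defaultdict

def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 of "<type> <size>\\0" followed by the object's bytes."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def tree_data(entries):
    """Serialize (mode, name, hex_id, is_tree) tuples into raw tree bytes."""
    # Git orders entries by name, comparing a subtree as if its name ended in "/".
    ordered = sorted(entries, key=lambda e: (e[1] + "/" if e[3] else e[1]).encode())
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id, _ in ordered
    )

def write_tree(index, store):
    """Turn a flat {path: (mode, blob_id)} index into trees; return the root tree ID.

    Every tree written is also saved in `store` as {tree_id: raw_bytes}.
    """
    entries = []
    subdirs = defaultdict(dict)
    for path, (mode, blob_id) in index.items():
        first, sep, rest = path.partition("/")
        if sep:
            # A slash means the file lives in a subdirectory: hand it to a subtree.
            subdirs[first][rest] = (mode, blob_id)
        else:
            entries.append((mode, first, blob_id, False))
    for name, sub_index in subdirs.items():
        # Write the subtree first so its hash ID is known (H2 before H1, H1 before H0).
        entries.append(("40000", name, write_tree(sub_index, store), True))
    data = tree_data(entries)
    tree_id = object_id("tree", data)
    store[tree_id] = data
    return tree_id

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # blob ID of an empty file
store = {}
root = write_tree({"path/to/file.txt": ("100644", EMPTY_BLOB),
                   "README.md": ("100644", EMPTY_BLOB)}, store)
print(len(store))  # 3 trees: path/to (H2), path (H1), and the root (H0)
```

Note that the index itself never contains an entry for path or path/to; those trees exist only because some file path passes through them.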
The git commit-tree command uses this hash ID, plus additional information such as the parent commit(s), the author and committer, and the log message, to create a commit object. The commit object will have H0 as its tree. Hence the commit will refer to tree H0.
To read the commit into Git's index, git read-tree notes that there is a sub-tree named path inside H0, so it reads that sub-tree (hash H1) and finds that there's an entry named to, giving H2. It therefore reads that sub-sub-tree and finds the entry named file.txt, giving the blob hash ID for the file. It then writes path/to/file.txt into the index, storing the hash ID for the blob object.
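The reverse walk that git read-tree does can be modelled with a short recursive function. In this sketch the trees are already parsed into lists of (mode, name, ID) entries; H0, H1 and H2 follow the naming in the text, while B0 and B1 are made-up blob labels rather than real hashes, and read_tree is a hypothetical name. The result contains only file paths, never directory entries.

```python
def read_tree(trees, tree_id, prefix=""):
    """Flatten tree `tree_id` into an index-like {path: (mode, blob_id)} dict."""
    index = {}
    for mode, name, obj_id in trees[tree_id]:
        path = prefix + name
        if mode == "40000":
            # A sub-tree: descend into it, extending the path with a slash.
            index.update(read_tree(trees, obj_id, path + "/"))
        else:
            # A blob: record the full slash-separated path, as the index does.
            index[path] = (mode, obj_id)
    return index

# Placeholder IDs: H0/H1/H2 as in the text above, B0/B1 invented blob labels.
trees = {
    "H0": [("100644", "README.md", "B0"), ("40000", "path", "H1")],
    "H1": [("40000", "to", "H2")],
    "H2": [("100644", "file.txt", "B1")],
}
print(read_tree(trees, "H0"))
# {'README.md': ('100644', 'B0'), 'path/to/file.txt': ('100644', 'B1')}
```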
While git commit and git checkout now have all of these steps built into them, you can still use git write-tree followed by git commit-tree to make a new commit. You can still use git read-tree to read a tree into Git's index, and then use git checkout-index to extract the files into a work area.
The index has no directory names in it! It has only file names. The checkout code simply creates new directories when needed: if Git needs to create a file named path/to/file.txt and there is no path yet, Git will make it. Now that there is a path, Git will make path/to as well if needed, and now that path/to/ exists, Git can create a file named file.txt within path/to/.
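A rough model of that checkout step, assuming the index is a flat {path: (mode, blob_id)} dict and that a hypothetical `blobs` dict maps blob IDs to file contents. It ignores file modes and everything else git checkout-index really handles; the point is only that parent directories are derived from each file's path and created on demand.

```python
import os
import tempfile

def checkout_index(index, blobs, work_dir):
    """Write each indexed file under work_dir, creating directories as needed."""
    for path, (_mode, blob_id) in sorted(index.items()):
        full_path = os.path.join(work_dir, *path.split("/"))
        # The index has no directory entries, so make any missing parents now:
        # for path/to/file.txt this creates path/ and then path/to/.
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as f:
            f.write(blobs[blob_id])

with tempfile.TemporaryDirectory() as work:
    checkout_index({"path/to/file.txt": ("100644", "B1")}, {"B1": b"hello\n"}, work)
    print(os.path.isdir(os.path.join(work, "path", "to")))  # True
```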
The fact that Git doesn't store directories in the index means that:
- you have no way to store permissions for directories;¹ and
- there is no proper way to store an empty directory either.
There is a submodule trick that works for empty directories: see this answer to "How can I add an empty directory to a Git repository?"
¹ Since the only modes allowed today for regular files are 100755 (executable) and 100644 (not executable), there's no place to store group-write permission anyway. In the early days of Git, you could store a file as mode 100664, for instance, so it would have made more sense then. Note that on Linux, a directory must be executable (searchable) before you can use it, so while tree entries for subdirectories are stored with mode 40000, the actual on-disk inode has mode 040777 & ~umask, where 040000 is the S_IFDIR bit. See, e.g., https://docs.huihoo.com/doxygen/linux/kernel/3.7/include_2uapi_2linux_2stat_8h.html
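The footnote's mode arithmetic can be checked with Python's stat module. The umask of 022 is only an assumed, common default; with it, a directory created with mode 0777 ends up with permission bits 755.

```python
import stat

UMASK = 0o022  # assumption: a common default umask

# Git tree entries record 40000 for a subdirectory, which is exactly S_IFDIR.
print(stat.S_IFDIR == 0o40000)             # True
# The two regular-file modes are S_IFREG plus the permission bits.
print((stat.S_IFREG | 0o644) == 0o100644)  # True
print((stat.S_IFREG | 0o755) == 0o100755)  # True

# On disk, a new directory gets 0777 masked by the umask, plus the S_IFDIR bit.
disk_mode = (stat.S_IFDIR | 0o777) & ~UMASK
print(oct(disk_mode))                                         # 0o40755
print(stat.S_ISDIR(disk_mode), oct(stat.S_IMODE(disk_mode)))  # True 0o755
```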

Git stores directories as tree objects which contain, for each entry in the directory, the entry's mode, name and hash (the object type that git cat-file -p displays is implied by the mode). For example, in a Git repository with a file and a folder at the root:
$ ls
example.txt
src/
$ git cat-file -p HEAD:
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 example.txt
040000 tree 87a2294c8c0351121cefbaef16cbe88dd2b64b80 src
The cat-file command shows the pretty-printed (-p) version of the given object, HEAD:. The trailing colon with no path after it refers to the root directory (tree) of the commit that HEAD points to; HEAD:src would refer to the src subfolder.
We can examine the raw directory data by passing the object type tree instead of -p:
$ git cat-file tree HEAD: | hexdump -C
00000000 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2e 74 |100644 example.t|
00000010 78 74 00 e6 9d e2 9b b2 d1 d6 43 4b 8b 29 ae 77 |xt........CK.).w|
00000020 5a d8 c2 e4 8c 53 91 34 30 30 30 30 20 73 72 63 |Z....S.40000 src|
00000030 00 87 a2 29 4c 8c 03 51 12 1c ef ba ef 16 cb e8 |...)L..Q........|
00000040 8d d2 b6 4b 80 |...K.|
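Reading that dump, each entry is the ASCII mode, a space, the name, a NUL byte, and then the 20-byte binary SHA-1 (not 40 hex digits). Note also that the raw data stores 40000, while cat-file -p pads it to 040000 and adds the type. A minimal parser under that reading; parse_tree is a hypothetical helper, and the sample bytes are re-typed from the dump above.

```python
def parse_tree(data: bytes):
    """Split raw tree bytes into (mode, name, hex_id) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)    # the mode ends at the next space
        nul = data.index(b"\0", space)   # the name ends at the NUL byte
        mode = data[pos:space].decode("ascii")
        # Names are raw bytes in Git; surrogateescape keeps non-UTF-8 names intact.
        name = data[space + 1:nul].decode("utf-8", "surrogateescape")
        obj_id = data[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex digits
        entries.append((mode, name, obj_id))
        pos = nul + 21
    return entries

raw = (b"100644 example.txt\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
       + b"40000 src\0" + bytes.fromhex("87a2294c8c0351121cefbaef16cbe88dd2b64b80"))
for mode, name, obj_id in parse_tree(raw):
    print(mode, obj_id, name)
# 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 example.txt
# 40000 87a2294c8c0351121cefbaef16cbe88dd2b64b80 src
```

The two entries add up to 69 bytes, which is the size Git records in the object's header.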
If the object hasn't been packed, this tree object is stored as a loose object under .git/objects, in a file named after its hash. We can use rev-parse to find that hash:
$ git rev-parse HEAD:
cb8fd5fa2bf22ffa242d4e3fa520849551bbfa98
The file is zlib-compressed; decompressed, it holds the same data as above with a small prefix (the object type, a space, the size in bytes, and a NUL byte):
$ cat .git/objects/cb/8fd5fa2bf22ffa242d4e3fa520849551bbfa98 | zlib-flate -uncompress | hexdump -C
00000000 74 72 65 65 20 36 39 00 31 30 30 36 34 34 20 65 |tree 69.100644 e|
00000010 78 61 6d 70 6c 65 2e 74 78 74 00 e6 9d e2 9b b2 |xample.txt......|
00000020 d1 d6 43 4b 8b 29 ae 77 5a d8 c2 e4 8c 53 91 34 |..CK.).wZ....S.4|
00000030 30 30 30 30 20 73 72 63 00 87 a2 29 4c 8c 03 51 |0000 src...)L..Q|
00000040 12 1c ef ba ef 16 cb e8 8d d2 b6 4b 80 |...........K.|
And we can confirm that the hash is the SHA-1 of this decompressed data, prefix included:
$ cat .git/objects/cb/8fd5fa2bf22ffa242d4e3fa520849551bbfa98 | zlib-flate -uncompress | sha1sum
cb8fd5fa2bf22ffa242d4e3fa520849551bbfa98 -
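The checks above (decompression, the type-and-size prefix, the SHA-1) can be reproduced in Python without zlib-flate or sha1sum. This sketch, with hypothetical function names, prepends the "<type> <size>\0" header, hashes the result with SHA-1, derives the objects/xx/... path from the first two hex digits, and zlib-compresses it. Git may use a different compression level, so the compressed bytes need not match the file on disk byte for byte, but they decompress to the same data.

```python
import hashlib
import zlib

def encode_loose_object(obj_type: str, data: bytes):
    """Return (object ID, path relative to .git, compressed bytes) for a loose object."""
    store = f"{obj_type} {len(data)}".encode() + b"\0" + data
    obj_id = hashlib.sha1(store).hexdigest()  # the hash covers the header too
    rel_path = f"objects/{obj_id[:2]}/{obj_id[2:]}"
    return obj_id, rel_path, zlib.compress(store)

def decode_loose_object(compressed: bytes):
    """Inverse of encode_loose_object: return (type, data) after checking the size."""
    store = zlib.decompress(compressed)
    header, _, data = store.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match payload")
    return obj_type, data

# The empty blob has a well-known ID, which makes a handy self-check.
obj_id, rel_path, compressed = encode_loose_object("blob", b"")
print(obj_id)    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(rel_path)  # objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(decode_loose_object(compressed))  # ('blob', b'')
```

Passing the 69 bytes of tree data from the earlier dump with type "tree" produces the b"tree 69\0" prefix seen in the decompressed output.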
See the "Tree Objects" section of the documentation for more information.
