Except for the user-facing documentation's insistence on using the word tree-ish (if that is even a word), the term tree is internal to Git, so it shouldn't matter what they call it: tree, or marplot, or gripsack, or whatever you like.
That said, a tree object, inside Git, is simply one of the four object types. What it contains is a series of entries, with each entry holding three items:
- a mode: an octal number, terminated with an ASCII space, with no leading zeros, that describes the type of the entry and gives the x bit for regular files;
- a name: a byte sequence terminated with an ASCII NUL ('\0' in C, b'\0' in Python); and
- a raw hash ID: 20 unencoded bytes.1
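To make the entry layout concrete, here is a minimal Python sketch that splits the body of a tree object (everything after its "tree <size>\0" header, which this sketch assumes has already been stripped) into entries. The names TreeEntry and parse_tree are hypothetical, not part of any Git library. The hash length is a parameter defaulting to 20, the raw SHA-1 size described above.

```python
from __future__ import annotations

from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str    # octal digits as text, e.g. "100644" or "40000"
    name: bytes  # a single name component, not a full path
    oid: bytes   # raw, unencoded hash ID


def parse_tree(body: bytes, hash_len: int = 20) -> list[TreeEntry]:
    """Split a tree object's body into (mode, name, raw hash) entries."""
    entries = []
    pos = 0
    while pos < len(body):
        # The mode never contains a space, so the first space ends it.
        space = body.index(b" ", pos)
        mode = body[pos:space].decode("ascii")
        # The name runs up to the terminating NUL byte.
        nul = body.index(b"\0", space + 1)
        name = body[space + 1:nul]
        # The hash is fixed-length raw bytes; it may itself contain
        # 0x00 or 0x20, so it is sliced by length, never searched.
        oid = body[nul + 1:nul + 1 + hash_len]
        if len(oid) != hash_len:
            raise ValueError("truncated tree entry")
        entries.append(TreeEntry(mode, name, oid))
        pos = nul + 1 + hash_len
    return entries


if __name__ == "__main__":
    raw = b"100644 hello.txt\0" + bytes.fromhex(
        "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    for e in parse_tree(raw):
        print(e.mode, e.name.decode(), e.oid.hex())
```

Running it prints the single entry as 100644 hello.txt e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, with the raw 20 bytes shown in hex only for display.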
The name in a tree object is really just a name component. If the mode is 40000, the hash ID must be that of another tree object. If the mode is 120000, 100644, or 100755, the hash ID must be that of a blob object. If the mode is 160000, the hash ID is expected to be that of a commit object stored in some other Git repository, i.e., a gitlink. Other modes are generally not allowed, though git fsck allows 100664, as this mode appears in some existing (very old) repositories.
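The mode rules above amount to a small lookup table. This is a minimal sketch of that table; MODE_TYPES, LEGACY_MODES, and expected_type are hypothetical names, and the allow_legacy switch is an assumption standing in for the tolerance git fsck shows toward 100664.

```python
from __future__ import annotations

# Object type that a tree entry's hash ID must name, keyed by mode.
MODE_TYPES = {
    "40000": "tree",     # subdirectory
    "100644": "blob",    # regular file
    "100755": "blob",    # regular file with the x bit set
    "120000": "blob",    # symbolic link (the blob holds the link target)
    "160000": "commit",  # gitlink: a commit in another repository
}

# Mode found in some very old repositories; tolerated, not produced.
LEGACY_MODES = {"100664": "blob"}


def expected_type(mode: str, allow_legacy: bool = True) -> str:
    """Return the object type a tree entry with this mode must point to."""
    if mode in MODE_TYPES:
        return MODE_TYPES[mode]
    if allow_legacy and mode in LEGACY_MODES:
        return LEGACY_MODES[mode]
    raise ValueError(f"unexpected tree entry mode {mode!r}")
```

Note that only the 160000 case names an object that is not expected to exist in this repository.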
The file name of a blob or of a symbolic link (mode 120000) is constructed by joining the names of the tree entries that led to it, each followed by a slash, and then adding the last component from the final tree object. That is, if the top-level tree object for some commit is T0, and the blob or symlink appears directly in T0, then the entry's name is the complete name of the file that will hold the blob or symlink.
But if T0 has an entry foo with mode 40000 and hash T1, Git will go on to read tree object T1. If that has an entry bar with mode 100xxx or 120000, the blob object will be a file or symlink whose name is foo/bar. Hence the file's path name is produced by traversing tree objects until reaching a leaf.
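The traversal can be sketched as a short recursive generator. To stay self-contained, this sketch assumes a hypothetical in-memory store mapping tree IDs to lists of (mode, name, hash ID) tuples; the IDs "T0", "T1", "B0" and so on are placeholders rather than real hashes, and the function name walk is likewise made up.

```python
from __future__ import annotations

from typing import Dict, Iterator, List, Tuple

# Hypothetical object store: tree ID -> list of (mode, name, hash ID).
Tree = List[Tuple[str, str, str]]


def walk(store: Dict[str, Tree], tree_id: str,
         prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (path, mode, hash ID) for every leaf reachable from tree_id."""
    for mode, name, oid in store[tree_id]:
        path = prefix + name
        if mode == "40000":
            # Subtree: its name becomes a directory component plus "/".
            yield from walk(store, oid, path + "/")
        else:
            # Leaf (blob, symlink, or gitlink): the path is complete.
            yield path, mode, oid


store = {
    "T0": [("100644", "README", "B0"), ("40000", "foo", "T1")],
    "T1": [("100644", "bar", "B1"), ("120000", "link", "B2")],
}

for path, mode, oid in walk(store, "T0"):
    print(mode, path)
# 100644 README
# 100644 foo/bar
# 120000 foo/link
```

Real code would fetch each tree from the repository instead of a dict, but the path construction is the same: prefix plus name, with a slash added whenever the entry is another tree.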
For a gitlink (a tree entry with mode 160000), the constructed path name gives the submodule path that Git will look up in .gitmodules if we must clone the submodule, and the hash ID is the commit we'll git checkout, as a detached HEAD, in that other Git repository. For all other entries, the hash ID should be that of an object in this Git repository; otherwise the tree object is incorrect or the repository is inconsistent (or both).
As someone using Git, you do not have to care about any of this: just put files in the index as usual, and use git write-tree to write everything. Use git read-tree with a commit's hash ID (or the hash ID of its tree) to fill the index2 from that tree. Use git show or git cat-file to obtain a single file's contents using either a hash ID (the blob hash) or a path name (commit-hash:path, which git rev-parse can translate, and which git cat-file has also been able to handle for a long time now).
1This is kind of a mistake, because when Git moves to longer hash IDs in the future, either the tree objects may have to store truncated hashes, or we'll need a new flavor of tree object. Note that Mercurial's internal tree data structures left more room. Git probably should have used an ASCII-ized hex digest terminated by another NUL. But there are enough other thorny issues here to be resolved that this one is kind of minor.
2If you set GIT_INDEX_FILE, git read-tree will read the tree into the alternate index whose path name you provided.