Ignore for the moment exactly how a repo creates file trees and points to individual files. Let's just stick to the fact that it does create references to these objects. The important point is that if an object is exactly the same (for a folder, that means having exactly the same folders and files in it), then Git will just point to the same object in a future commit.
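As a minimal sketch of why identical content ends up as the same object: Git names a file's contents (a blob) by hashing a short header plus the bytes themselves. The helper name git_blob_id is made up for this illustration, and the sketch assumes a repo using the default SHA-1 object format.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>", a NUL byte, then the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The file name is not part of the hash, so the same bytes committed
# again later (or under another name) resolve to the same object.
id_in_first_commit = git_blob_id(b"test content\n")
id_in_later_commit = git_blob_id(b"test content\n")
print(id_in_first_commit == id_in_later_commit)
```

If you have Git installed, git hash-object <file> should print the same ID for a file with those bytes.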
Assume commit c1 is the initial commit and contains just file1.txt:
c1 -> file1
Then commit c2 is made. It has the same file1, so it just creates a reference to the existing object for that (the same object c1 points to). It also adds a folder dir1 with a file2 inside it, so it creates links to those:
c2 ----> dir1 -> file2
    \
c1 -> file1
Now add a commit c3. Again file1 is unchanged, so c3 can still point to the same object. file2 is also unchanged, but a new file3 is added to dir1. This means dir1 has to change (I show the new version as dir1*), but it can still point to the old file2 object. dir1* also gets a link to the new file3:
c3 -> dir1* ------> new file3
   \        \
c2 -\ -> dir1 -> file2
     \
c1 -> file1
The point is, you don't need to know anything about c1, c2, or even dir1 in order to recreate the working directory for c3. It points to file1, dir1*, file2, and file3, and can find them in the object store without needing to know about the other objects.
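Here is a toy model of the c1, c2, c3 example, as a rough sketch rather than Git's real format. The store dict, the JSON encoding, and the helpers put, blob, tree, commit, and checkout are all invented for illustration. The only thing it borrows from Git is the idea that every object is stored under a hash of its content.

```python
import hashlib
import json

store = {}  # object ID -> object; a toy stand-in for .git/objects

def put(obj):
    """Store obj under a hash of its content; identical content gets the same ID."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = obj
    return oid

def blob(text):
    return put({"type": "blob", "data": text})

def tree(entries):
    # entries maps a name to the ID of a blob or of another tree
    return put({"type": "tree", "entries": entries})

def commit(tree_id, parent=None):
    return put({"type": "commit", "tree": tree_id, "parent": parent})

def checkout(commit_id):
    """Rebuild {path: text} by following only the IDs reachable from the commit's tree."""
    files = {}

    def walk(tree_id, prefix):
        for name, oid in store[tree_id]["entries"].items():
            obj = store[oid]
            if obj["type"] == "tree":
                walk(oid, prefix + name + "/")
            else:
                files[prefix + name] = obj["data"]

    walk(store[commit_id]["tree"], "")  # the parent commit is never read
    return files

file1 = blob("one\n")
c1 = commit(tree({"file1.txt": file1}))

file2 = blob("two\n")
dir1 = tree({"file2": file2})
c2 = commit(tree({"file1.txt": file1, "dir1": dir1}), parent=c1)

file3 = blob("three\n")
dir1_star = tree({"file2": file2, "file3": file3})  # dir1*: new tree, old file2 blob
c3 = commit(tree({"file1.txt": file1, "dir1": dir1_star}), parent=c2)

print(sorted(checkout(c3)))
print(len(store))  # 3 blobs + 5 trees + 3 commits; file1 and file2 are stored once
```

Note that checkout(c3) never looks at c1, c2, or the old dir1 tree; it only follows the IDs that c3's own tree refers to.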
Now, there is more to it, of course: as an optimization (among others), Git sometimes stores only the differences between similar files, for example when the files are big and the diff is small. But this high-level picture covers the basic idea.
As for the lower-level plumbing commands: yes, they do exist, and Git actually uses them when it does its thing. They are outlined in the link that Chris gave in his comment: Git Internals: Git Objects. It shows you how to follow a commit hash into the objects stored in the repo and display the text of each one, both the hash pointing to each object and the actual object itself.
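To give a feel for what those commands read, here is a small sketch of the loose object format: a zlib-compressed byte string made of the object type, a space, the size, a NUL byte, and then the body. The function names are made up for this example, and it only covers loose objects, not objects stored in packfiles.

```python
import zlib

def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Build the compressed bytes of a loose object of the given type."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return zlib.compress(header + body)

def decode_loose_object(raw: bytes):
    """Inflate a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")  # split at the first NUL only
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body

raw = encode_loose_object("blob", b"test content\n")
print(decode_loose_object(raw))
```

In a real repo you would read the bytes from .git/objects/<first two hex digits>/<remaining digits> and pass them to decode_loose_object. git cat-file -p <hash> does the equivalent for you, handles packed objects too, and pretty-prints tree objects, whose bodies are binary.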