GIT, How is the working directory populated

Question

I am wondering how the Git working directory (the "working tree") is populated.

Are the files somehow extrapolated through the tree-like relationship that exists, starting at the commit that HEAD refers to and working "backwards" towards the root of the tree?

Maybe someone could describe the high-level process that occurs, e.g.:

1.) Add all the files contained in the commit referred to by HEAD to the working tree.

2.) Recursively, for each file referenced by the parent commit of HEAD, add those to the working tree as well.

I'm curious how this works. Is there a verbose mode for something like git checkout, where a hypothetical function called build_working_tree() would output its actions?

There's no need to follow commits backwards to populate the working tree. See https://git-scm.com/book/en/v2/Git-Internals-Git-Objects — ChrisGPT was on strike, Oct 01 '17 at 02:35
At the lowest level, the work tree is updated as a side effect of updating the index: `git checkout` does the equivalent of `git read-tree -u`, but in a more user friendly fashion. For details, see [my answer to a related question](https://stackoverflow.com/a/45800673/1256452). — torek, Oct 01 '17 at 15:56

score 0 · Accepted Answer · answered Oct 01 '17 at 04:51

Ignore for the moment exactly how a repo builds file trees and points to individual files; just keep in mind that each commit refers to these objects. The important point is that if an object is exactly the same (for a folder, that means it contains exactly the same folders and files), then Git will just point to the same object in a future commit.
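To make the "same object" idea concrete, here is a minimal Python sketch (not Git's actual code) of how a file's object ID is derived from its content in a SHA-1 repository: Git hashes a short `blob <size>` header plus the file bytes, so identical content always gets the identical ID. The function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID a blob with this content gets in Git."""
    # Git hashes "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = git_blob_id(b"same contents\n")
second = git_blob_id(b"same contents\n")
changed = git_blob_id(b"different contents\n")

print(first == second)   # True: same content, same object
print(first == changed)  # False: different content, new object
```

Because the ID depends only on the content, a later commit that contains an unchanged file ends up referring to the object that is already stored.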

Assume c1 is the initial commit and contains just file1.txt:

c1 -> file1

Then commit c2 is made. It has the same file1, so it just refers to the existing object for it (the same object c1 refers to). It also adds a folder dir1 with a file2 inside it, so it creates links to those.

c2 ----> dir1 -> file2
     \         
c1 -> file1

Now add a commit c3. Again file1 is the same, so c3 can still point to the same object. file2 is also the same, but a new file3 is added to dir1. This means dir1 has to change (I show the new version as dir1*), but dir1* can still point to the old file2 object, and it also points to the new file3.

c3 -> dir1* ------> new file3
   \            \
c2 -\ -> dir1 -> file2
     \         
c1 -> file1

The point is, you don't need to know anything about c1, c2, or even dir1 in order to recreate the working directory for c3. c3 points to file1 and dir1*, and dir1* points to file2 and file3, so Git can find them all in the object store without needing to know about the other objects.
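Here is a toy Python model of the c1/c2/c3 example, as a rough sketch only. It is not how Git is implemented: the `store` dict stands in for the object database, objects are serialized as JSON rather than in Git's real format, and every name here (`put`, `blob`, `tree`, `commit`, `checkout`) is made up for illustration.

```python
import hashlib
import json

store = {}  # object ID -> object; a toy stand-in for the object database


def put(obj):
    """Store an object under the hash of its serialized form; return the ID."""
    data = json.dumps(obj, sort_keys=True).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = obj
    return oid


def blob(text):
    return put({"type": "blob", "data": text})


def tree(entries):
    # entries maps a name to the ID of a blob or of another tree
    return put({"type": "tree", "entries": entries})


def commit(tree_id, parent=None):
    return put({"type": "commit", "tree": tree_id, "parent": parent})


def checkout(commit_id):
    """Return {path: contents} for a commit, reading only its own tree."""
    files = {}

    def walk(tree_id, prefix):
        for name, oid in store[tree_id]["entries"].items():
            obj = store[oid]
            if obj["type"] == "tree":
                walk(oid, prefix + name + "/")
            else:
                files[prefix + name] = obj["data"]

    walk(store[commit_id]["tree"], "")
    return files


file1, file2, file3 = blob("one\n"), blob("two\n"), blob("three\n")

c1 = commit(tree({"file1": file1}))
dir1 = tree({"file2": file2})
c2 = commit(tree({"file1": file1, "dir1": dir1}), parent=c1)
dir1_star = tree({"file2": file2, "file3": file3})
c3 = commit(tree({"file1": file1, "dir1": dir1_star}), parent=c2)

# Throw away c1, c2 and the old dir1: c3 can still be checked out.
for oid in (c1, c2, dir1):
    del store[oid]

print(sorted(checkout(c3)))  # ['dir1/file2', 'dir1/file3', 'file1']
```

Note that `checkout` never looks at the `parent` field. In this model, as in the explanation above, the parent link is only history, and the files come entirely from the commit's own tree.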

Now, there is more to it, of course. For example, when Git packs objects it may store some of them as deltas against similar objects (useful when the files are big and the difference is small), among other optimizations. But this high-level picture covers the basic idea.

As for the lower-level plumbing commands, yes, they do exist, and Git actually uses them when it does its thing. These are outlined in the link that Chris gave in his comment: Git Internals: Git Objects. It shows you how to follow a commit hash into the objects stored in the repo and display the text in each one: both the hash pointing to each object and the actual object itself.

LightCC, you said "The point is, you don't need to know anything about c1, c2, or even dir1 in order to recreate the working directory for c3. It is pointing to file1, dir1*, file2, and file3," ---- Are you saying c3 is pointing to file1, dir1, etc.. directly OR indirectly via c2? — , Oct 01 '17 at 19:23
Directly. The directory tree hash that is stored in `c3` will contain just `file1` and `dir1*`, then the hash within the `dir1*` will have `file2` and `file3` pointers in it. The commit itself also points at the parent commit (if there is one), but that's just for log history information, it doesn't use it to find any of the files/directories. — LightCC, Oct 02 '17 at 03:47
