I suppose the git directory is the .git folder
For a typical clone, yes. There are exceptions (like bare repos), but they aren't important to your question; when you look at the local repo you get by cloning something, you can expect .git to be what progit calls "the git directory".
So, where are the different snapshots stored there? It seems the folder is way too small to have them there.
The .git/objects directory contains the repo's content. Files are represented as BLOB objects; directories as TREE objects; and there are also COMMIT objects and various other types of object used for various git features. It's not easy to inspect these files by hand, but the data is there and you can use lower-level git commands to navigate it if you want (e.g. git cat-file).
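To make the object types concrete, here is a minimal sketch that builds a throwaway repo and asks git cat-file what each object is. The directory name objdemo, the file name, and the demo identity are placeholders chosen for illustration, not anything from your repo.

```shell
# Work in a throwaway directory so nothing existing is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q objdemo
cd objdemo

echo hi > file1
git add file1
# Identity is passed inline so the commit works without any global git config.
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# -t prints the type of the object that a name resolves to.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:file1)
echo "$commit_type $tree_type $blob_type"

# -p pretty-prints the object: first the tree listing, then the file's bytes.
git cat-file -p 'HEAD^{tree}'
blob_content=$(git cat-file -p HEAD:file1)
echo "$blob_content"
```

The tree listing shows file1 next to the hash of its blob, which is how a directory snapshot points at file contents.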
An object can be in "loose" storage, in which case it's a file in one of the subdirectories whose names are two hex digits. Or - as would be expected in a fresh clone - it can be in "packed" storage (under .git/objects/pack). A couple of forms of compression - including deltas for older versions of files - are used to control the size of this data on disk as the repo history grows. That is why the directory may not seem like it takes "enough" space to hold everything.
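If you want to see the two storage forms, the sketch below counts loose and packed objects before and after asking git to pack them. It assumes a fresh throwaway repo (the name packdemo and the demo identity are placeholders); on a real clone the numbers will differ.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q packdemo
cd packdemo
echo hi > file1
git add file1
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# "count" is the number of loose objects; "in-pack" is the number of packed ones.
git count-objects -v
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# gc moves the loose objects into a pack file under .git/objects/pack.
git gc -q
git count-objects -v
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

# The pack and its index now live here.
ls .git/objects/pack
```

One commit of one file yields three objects (a blob, a tree, and a commit), so the loose count should drop to zero while the in-pack count rises by the same amount.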
(As an aside, certain types of file do not "play nice" with the compression methods git uses; this is one reason why large binary files should be managed with a tool like LFS.)
it says that when you clone a git repository it copies the .git folder, but doesn't it also copy the file contents in the working tree? or does it take it out of the .git folder?
clone only copies the git directory. Unless given options to the contrary, it then does a checkout of the default branch (usually master), which populates the working tree from the objects in the database. It depends on how your remote is hosted, but the odds are you wouldn't actually find a working tree on the remote, so clone couldn't copy it directly from the remote even if it wanted to. It has to extract it from the database.
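The two steps can be separated to watch this happen. This is a rough sketch: --no-checkout is a real clone option, but the repo names src and dst and the demo identity are made up for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q src
cd src
echo hi > file1
git add file1
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
cd ..

# Clone but skip the checkout step: only the git directory is created.
git clone -q --no-checkout src dst
cd dst
before=$(ls)   # plain ls skips .git, so this is empty: no working-tree files yet

# Restore every path from HEAD into the index and the working tree.
git checkout -q HEAD -- .
after=$(ls)
echo "before: '$before' after: '$after'"
cat file1
```

The file appears only after the checkout, which shows that its content came out of the object database inside dst/.git rather than being copied from the source's working tree.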
One corollary of all this is that only committed data can be shared by clone, fetch, or push. That is to say, suppose you create a local repo:
mkdir repo1
cd repo1
git init
touch file1
git add .
git commit -m1
echo hi > file1
touch file2
Now there are reasons why you typically don't use a repo that has a working tree (a non-bare repo) as a remote... but you could.
cd ..
git clone repo1 repo2
Now if you look at repo2, you'll see that it only has an empty file1; nothing that wasn't committed in repo1 is visible - unlike if the working directory had been copied by clone.
cd repo1