So it always is larger than the file in the working directory.
Nope.
To understand why, you need to know how git stores its data.
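Git addresses content by its hash, so when git finds identical content (a whole file) it doesn't store it twice; it stores it once and every reference points to that single object. In addition, when git packs its objects it uses heuristics to find similar objects, and stores one of them as a delta that points to the other instead of storing both in full. This is known as delta compression. (Hunks, explained below, are a different thing: they are the pieces of a diff.)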
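The "store identical content once" idea can be sketched with a toy content-addressed store. This is only an illustration, not git's implementation: the class name ContentStore and its put method are made up for this example, and the key here is a plain SHA-1 of the bytes rather than git's real object ID.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is a hash of the content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Storing the same bytes again maps to the same key, so nothing new is kept.
        self.objects.setdefault(key, content)
        return key


store = ContentStore()
first = store.put(b"print('hello')\n")
second = store.put(b"print('hello')\n")  # same content, e.g. a copied file
third = store.put(b"print('bye')\n")

print(first == second)     # identical content gives the identical key
print(len(store.objects))  # number of distinct contents actually stored
```

Because the key is derived from the content, two files with the same bytes cost the same storage as one, no matter how many paths refer to them.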
Whenever you execute git add, git grabs the content, hashes it with SHA-1 (this is the ID that git hash-object prints), compresses it with zlib and stores it as a loose object inside your .git/objects folder. Packing those objects into a pack file happens later (for example when git gc runs).
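As a rough sketch of that step, the following reproduces how a blob ID is derived and what the loose object looks like, assuming the default SHA-1 object format. The function names git_blob_id and loose_object are made up for this example; it does not touch a real repository.

```python
import hashlib
import zlib
from typing import Tuple


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (path relative to .git, zlib-compressed bytes) for a loose blob."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    object_id = hashlib.sha1(header + content).hexdigest()
    # The first two hex digits become the directory, the rest the file name.
    path = "objects/" + object_id[:2] + "/" + object_id[2:]
    return path, zlib.compress(header + content)


if __name__ == "__main__":
    data = b"the same line of text\n" * 1000
    path, compressed = loose_object(data)
    print(path)
    print(len(data), "bytes in the working directory")
    print(len(compressed), "bytes as a loose object")
```

For repetitive text like this the compressed object is far smaller than the file, which is why the object in .git is not always larger than the file in the working directory. Already-compressed files (images, archives) will not shrink this way.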
Once git packs it later on, the "real" content of your files is stored either in full or as a delta (only the differences) against a similar object, and git knows how to rebuild your original file from these pieces.
What are hunks?
Hunks are the individual blocks of changes that make up a diff or patch. You can see them when you execute git add -p. If a hunk contains several changes separated by unchanged lines, choose the s option and you will see it split into smaller hunks.
These are the options available within git add -p:
y - stage this hunk
n - do not stage this hunk
q - quit, do not stage this hunk nor any of the remaining ones
a - stage this and all the remaining hunks in the file
d - do not stage this hunk nor any of the remaining hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see the previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help
Once you use the s option, git splits the hunk into the smaller chunks that can each be considered a standalone change. If you want to split it even more, you will have to use the e option to edit the hunk manually; the edited hunk is then added to the staging area.
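Note that git does not store its history as "patches"; every commit refers to full snapshots of your files. On top of that git adds a few "layers" to save space. It reuses the same content once it "sees" it again (identical content has the same hash), and inside pack files it uses heuristics to find similar objects, adding only the "new" data as a delta while pointing to the old object.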
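Later on git grabs the content and packs it into a pack file, compressed with zlib (the same deflate algorithm that ZIP uses, but not the ZIP file format).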
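To make the "only new data plus pointers to the old content" idea concrete, here is a toy delta encoder. It is a sketch under loud assumptions: the instruction tuples and the names make_delta and apply_delta are invented for this example, and the matching is done with Python's difflib, which is not the algorithm or the binary format git uses in its pack files.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as copy-from-base and insert-new-bytes instructions."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))      # offset and length in base
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))  # bytes not found in base
        # "delete" needs no instruction: those base bytes are simply not copied
    return delta


def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the target from the base object and the delta instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    old = b"line one\nline two\nline three\n"
    new = b"line one\nline 2\nline three\nline four\n"
    delta = make_delta(old, new)
    print(delta)
    print(apply_delta(old, delta) == new)
```

Only the bytes in the insert instructions have to be stored for the new version; everything else is a reference into the old one.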
