Many older source control systems, such as RCS and CVS, specifically store differences between versions of files. For example, the information for a given source file might be stored in the repository in a form that includes the full text of the latest version, plus "instructions" for generating earlier versions.
Git, at least conceptually, stores the entire content of each version of every file in the repository. It saves some space by storing only one copy of identical files: each object is stored under a name derived from a hash of its contents, so identical contents always map to the same object.
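To make the content-addressing idea concrete, here is a minimal sketch of how a file's object name can be computed. It assumes the traditional SHA-1 object format, where a file ("blob") is hashed as the header `blob <size>` followed by a NUL byte and then the contents. The function name `git_blob_name` and the dictionary `store` are made up for illustration and are not part of Git.

```python
import hashlib


def git_blob_name(content: bytes) -> str:
    """Return the SHA-1 object name Git would assign to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A toy object store keyed by content hash (hypothetical, for illustration only).
store = {}
for data in (b"hello\n", b"hello\n", b"hello, world\n"):
    store[git_blob_name(data)] = data

# Three files were added, but the two identical ones share a single entry.
print(len(store))
for name in store:
    print(name)
```

In a real repository, `git hash-object <file>` should report the same name for the same contents, assuming the repository uses SHA-1 rather than the newer SHA-256 object format.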
Obviously if that were the whole story, Git repositories would become very large very quickly. But Git automatically packs, or compresses, stored objects. I frankly don't know all the details, but it does a good job of both minimizing storage space and permitting arbitrary versions to be recreated quickly.
For example, the Git sources are themselves stored in a Git repository, which contains probably thousands of distinct objects. All the versions of all the files are stored under the directory .git/objects/pack, which currently contains the following (the listing is of a clone on my system):
$ ls -l .git/objects/pack
total 48900
-r--r--r-- 1 kst kst 4196172 Mar 20 15:44 pack-0e69de7b7728ad0fde80423ded259dbff7760016.idx
-r--r--r-- 1 kst kst 36698393 Mar 20 15:44 pack-0e69de7b7728ad0fde80423ded259dbff7760016.pack
-r--r--r-- 1 kst kst 125896 Jun 30 22:17 pack-2848a675d3c196391f06cc7cdd6cebf67fb7119e.idx
-r--r--r-- 1 kst kst 3570770 Jun 30 22:17 pack-2848a675d3c196391f06cc7cdd6cebf67fb7119e.pack
-r--r--r-- 1 kst kst 178452 May 16 08:22 pack-bfd75de39dff6ac03adcc775f7b5715480b54637.idx
-r--r--r-- 1 kst kst 5292998 May 16 08:22 pack-bfd75de39dff6ac03adcc775f7b5715480b54637.pack
What's different about Git compared to earlier systems (at least to the earlier systems I've used) is that, on a high level, all versions of all files in the repository are stored in full, but the compression is provided by a separate layer.
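As a rough illustration of why a separate compression layer can recover most of the space, the sketch below builds two nearly identical "versions" of a file, stores both in full, and compares compressing them separately with compressing them together. This is only an analogy: it uses plain zlib, not Git's actual pack format (which, as I understand it, combines delta encoding with zlib), and the helper `make_version` and the sample data are invented for the example.

```python
import hashlib
import zlib


def make_version(changed_line: int) -> bytes:
    """Build a 200-line file; one line differs if changed_line is in range."""
    lines = []
    for i in range(200):
        tag = "edited" if i == changed_line else "original"
        # Hash-derived text so each line is not trivially compressible.
        digest = hashlib.sha1(f"{i}-{tag}".encode()).hexdigest()
        lines.append(f"line {i}: {digest}\n")
    return "".join(lines).encode()


v1 = make_version(-1)   # no line changed
v2 = make_version(100)  # identical except for line 100

# Each version compressed on its own: the shared text is paid for twice.
separate = len(zlib.compress(v1)) + len(zlib.compress(v2))

# Both versions compressed as one stream: the compressor can refer back to v1
# while encoding v2, so the second copy costs very little.
together = len(zlib.compress(v1 + v2))

print("separate:", separate, "bytes")
print("together:", together, "bytes")
```

The point is that the storage model (every version in full) and the space saving (exploiting similarity between versions) can live in different layers, which is the split described above.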