Git stores each version of a file in a new object. This data model is not very storage efficient when storing multiple versions of a file. How does Git achieve storage efficiency anyway?

Holiko Wu

1 Answer

Git achieves storage efficiency in two ways:

  1. Storing every object under the hash of its content has the effect of perfect deduplication: You have the same file/tree over and over again? Well, no, identical content always hashes to the same ID, so only a single copy is kept.

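  2. Storing objects as deltas inside packfiles. When Git packs a repository (for example during `git gc`), it can store a blob as a delta against a similar object, which it references. The heuristics tend to keep the newer version whole, so an old version's blob typically becomes a delta against a newer one. The blob is still addressed by the hash of the old version's full content, guaranteeing that the data reconstructed by applying the delta is the same as the data that was originally stored.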
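The deduplication in point 1 can be sketched in a few lines of Python. The blob ID computation (SHA-1 over a `blob <size>\0` header followed by the content) matches what Git does for SHA-1 repositories; the `ObjectStore` class is a hypothetical in-memory stand-in, not Git's real on-disk layout.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory store keyed by object ID (illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")        # same content, e.g. an unchanged file in a later commit
third = store.put(b"hello, world\n")  # changed content gets a new ID

print(first == second)      # the two puts share one object
print(len(store.objects))   # number of distinct objects actually kept
```

Three `put` calls leave only two objects in the store, because the unchanged file is never stored a second time.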

All of this is entirely transparent to the higher layers of Git; the only visible effect is increased efficiency.
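To make the transparency concrete, here is a minimal sketch of delta storage, assuming a simplified copy/insert instruction list built with Python's `difflib`. Git's actual pack delta encoding is a compact binary format and its base selection is heuristic; `make_delta` and `apply_delta` are hypothetical helpers for illustration only. The point is that the old version is stored as instructions against the new one, yet reconstructing it yields content with the original ID.

```python
import hashlib
from difflib import SequenceMatcher


def blob_id(content: bytes) -> str:
    """SHA-1 object ID of a blob, computed over the full content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as 'copy from base' and 'insert literal' instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # offset and length in base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))   # bytes not present in base
        # a pure "delete" adds nothing to the target, so it needs no instruction
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying the instructions against base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


new_version = b"line one\nline two\nline three\n"
old_version = b"line one\nline 2\nline three\n"

old_id = blob_id(old_version)                  # ID recorded when the old version was committed
delta = make_delta(new_version, old_version)   # old version expressed against the new one
restored = apply_delta(new_version, delta)

# A caller asking for old_id gets the full content back and can verify it by hash.
assert blob_id(restored) == old_id
```

Because the ID is always derived from the full content, a caller cannot tell whether an object was stored whole or as a delta.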

cmaster - reinstate monica