1

Is there a way to force Git to store versions of a file as complete and separate entities as opposed to downstream commits existing as diffs from the upstream commits?

Some will ask why I want to do this: I was told to do so by my boss. FWIW, the particular file in question is the product of a process where one small change in the inputs can result in significant restructuring of the file.

Huliax

2 Answers

7

Git's object storage already does that, and it is not negotiable.

Git's object database is snapshot-oriented: individual files are stored as blobs and directories as tree objects.

You can verify this easily by looking under .git/objects or by running

git rev-list --objects --all
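
To make the snapshot model concrete, here is a minimal Python sketch of how a loose blob object is built, assuming a repository that uses the default SHA-1 object format. The function name `encode_loose_blob` and the two sample versions are made up for illustration; the header layout (`blob <size>`, a NUL byte, then the content) and zlib compression are how loose objects are laid out on disk.

```python
import hashlib
import zlib


def encode_loose_blob(content: bytes):
    """Return (object_id, path, compressed_bytes) for a blob.

    Hypothetical helper: mirrors the loose-object layout, it is not Git's code.
    """
    # A blob is the header "blob <size>\0" followed by the complete file content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live under .git/objects/<first 2 hex chars>/<remaining 38>.
    path = ".git/objects/" + oid[:2] + "/" + oid[2:]
    return oid, path, zlib.compress(store)


# Two versions of the same file that differ by a single word.
v1 = b"line one\nline two\n"
v2 = b"line one\nline 2\n"

for version in (v1, v2):
    oid, path, packed = encode_loose_blob(version)
    # Decompressing yields the whole file again, not a patch against v1.
    assert zlib.decompress(packed).endswith(version)
    print(oid, path)
```

Each version gets its own object id and its own file under .git/objects, and either one can be read back without consulting the other.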

Now, after a while, for efficiency, the object database will be 'compressed' (known as packing). This improves storage efficiency, but does not involve storing deltas.


Background

Storing deltas was popularized by RCS, CVS, Subversion and others (SourceSafe?), mainly because the model made it easy to transfer changesets: they were already in delta form. Modern VCSes (mostly distributed ones) have moved away from that and put the emphasis on data integrity.

Data Integrity

Because of the design of the object database, git is very robust and will detect any corrupted bit of data anywhere in a snapshot, or the entire repo. See this post for more details on the cryptographic properties of Git repositories: Linus talk - Git vs. data corruption?

In technobabble: commit histories form cryptographically strong Merkle trees. When the SHA-1 sum of the tip commit (HEAD) matches, it mathematically follows that

  • the tree content
  • the branch history (including all sign-offs and committer/author credentials)

are identical. This is a huge security feature of Git (and of other SCMs that share this design).
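
The chaining can be illustrated with a simplified Python sketch. It assumes SHA-1 object ids, flat single-directory trees, and a fixed made-up author line; the helper names (`object_id`, `tree_id`, `commit_id`, `history_tip`) are hypothetical and this is not Git's implementation, only the same idea: every id is a hash over content that includes the ids beneath it.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def object_id(kind: bytes, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0<body>".
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: Dict[str, bytes]) -> str:
    # Simplification: one flat directory of regular files, sorted by name.
    body = b""
    for name in sorted(files):
        blob = object_id(b"blob", files[name])
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return object_id(b"tree", body)


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    lines = ["tree " + tree]
    if parent is not None:
        lines.append("parent " + parent)  # the link that chains history
    # Fixed, made-up identity and timestamp so the sketch is deterministic.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id(b"commit", body.encode())


def history_tip(snapshots: List[Tuple[Dict[str, bytes], str]]) -> Optional[str]:
    # Commit each snapshot on top of the previous one; return the tip id.
    parent = None
    for files, message in snapshots:
        parent = commit_id(tree_id(files), parent, message)
    return parent


good = [({"a.txt": b"one\n"}, "first"), ({"a.txt": b"two\n"}, "second")]
# Same final tree, but one byte differs in the very first snapshot.
tampered = [({"a.txt": b"onE\n"}, "first"), ({"a.txt": b"two\n"}, "second")]

print(history_tip(good))
print(history_tip(tampered))  # differs, although the tip's files are identical
```

Because the parent id is part of each commit's hashed content, a change anywhere in the history propagates all the way up to the tip.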

sehe
  • Added more info on Data Integrity, one of the absolute strengths of Git – sehe Feb 06 '12 at 20:39
  • I thought Git did do deltas when compressing ([source](http://progit.org/book/ch9-4.html)). Maybe not in the order that changes occurred, or even on the same file, but it is storing deltas. – Andy Feb 06 '12 at 22:16
  • Well, I guess my problem is rather neatly solved. Thanks for clarifying it for me. – Huliax Feb 07 '12 at 00:59
2

Git objects are stored as full files (except after you gc your repo, when they get optimised, but that is an implementation detail). If you know the Git SHA of the file, you can get it in its entirety with:

git cat-file -p <sha>

which will pretty-print the object's content based on its type.
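
For a rough idea of what that command has to do for an object that has not been packed yet, here is a small Python sketch. It assumes the loose-object layout (zlib-compressed `<type> <size>`, a NUL byte, then the body); `read_loose_object` is a hypothetical helper, not a Git API, and it does not handle objects that gc has already moved into a packfile.

```python
import zlib


def read_loose_object(raw: bytes):
    """Decode the bytes of one loose object file into (type, content)."""
    data = zlib.decompress(raw)
    # The header and the body are separated by the first NUL byte.
    header, _, body = data.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode("ascii"), body


# Stand-in for the bytes you would read from .git/objects/<xx>/<rest of sha>.
sample = zlib.compress(b"blob 13\0test content\n")

kind, body = read_loose_object(sample)
print(kind, body)
```

The body comes back as the complete file; nothing has to be reconstructed from earlier versions.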

You can see an article about this on 365git - Git Objects: The Blog

Abizern