7

I have been using git for a while now and am gradually understanding how it works. One main point I've understood is that it creates a snapshot every time a new commit is made. Of course, the snapshot will contain only the changed files and pointers to the unchanged files.

According to Pro Git, § 1.3 "Getting Started - Git Basics":

Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.

But let's say I have a really big file, e.g. a 2 GB text file, and I change that file 10 times and hence make 10 commits in a day. Does that mean I now have ten 2 GB files on my computer? That seems really inefficient to me, so I suspect this might not be the case.

Could someone clarify what would happen in this scenario?

RandomQuestion
  • Git tracks changes, not files – Tim May 02 '14 at 06:36
  • It definitely does not store 10 copies of the file. – Ryan May 02 '14 at 06:36
  • @TimCastelijns, According to http://git-scm.com/book/en/Getting-Started-Git-Basics `Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.` So it doesn't mean what I think it means? – RandomQuestion May 02 '14 at 06:38
  • But @TimCastelijns, the question is "how does git track a small change in a big file?" – Andreas Wederbrand May 02 '14 at 06:38
  • Possible duplicate of [How does git store files?](http://stackoverflow.com/questions/8198105/how-does-git-store-files) –  May 02 '14 at 06:42
  • @AndreasWederbrand no it's not. Anyway I wasn't answering the question, just making a comment – Tim May 02 '14 at 06:42
  • See [this answer](http://stackoverflow.com/a/8198276/456814), particularly the last part. –  May 02 '14 at 06:55
  • A correction to @TimCastelijns comment: git tracks *content*, but uses deltas (if it sees fit) for internal storage. Go read the "duplicate" link: [How does git store files?](http://stackoverflow.com/questions/8198105/how-does-git-store-files) – LeGEC May 02 '14 at 08:47
  • I've used git to track daily changes on a database: a daily dump of each table in its own `table.sql` file. (Warning: this is not an intended use of git, and it will work poorly if you have a very active db.) I regularly run the `git gc` command (I think this implies a `repack`), and the repo size is roughly the size of the compressed dump (it's clearly not [nbDays] times the compressed size). – LeGEC May 02 '14 at 08:56
  • @RPM It does, but it compresses them when the objects are packed, which saves space. – Noufal Ibrahim May 02 '14 at 09:28

2 Answers

9

The short answer is "yes, you now have ten 2 GB files". However:

  1. "Files" under a commit are stored as "blob" objects, and all git objects (blobs, trees, commits, and annotated-tags) are kept internally in zlib deflated format. So a 2 GB text file is actually a considerably smaller object.

  2. "Loose" objects (all of them, again) are eventually "packed". You can do this manually with git pack-objects and git repack but generally you just let git do it on its own as part of standard "garbage collection" (git gc). Inside a pack, objects are delta-compressed against similar objects. The end result with most files is pretty impressive.

All that said, git eventually fails badly if you feed it a lot of large incompressible binary files (I had to deal with this at a previous workplace, where we stuffed 2GB of .tgz files into repos). They don't deflate, they generally don't delta-compress, and eventually even the pack format falls over. There are at least two solutions in relatively widespread use: git-annex and git-bup. See Managing large binary files with git.
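For illustration, this is roughly what the git-annex route looks like (a sketch following git-annex's basic walkthrough; the repository and file names are placeholders):

    git init media && cd media
    git annex init "my laptop"       # enable git-annex in this repository
    git annex add big-video.tgz      # content moves under .git/annex/objects;
                                     # the working tree keeps a symlink to it
    git commit -m "add big-video.tgz via git-annex"
    # Content is later transferred explicitly between repositories with
    # commands like 'git annex get' and 'git annex copy', so ordinary
    # commits, packs, and gc never touch the large payload itself.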

torek
3

I just tested it.

First I created a large file (24 MB of text) and committed it. My .git directory is now 216 KB. Git uses compression, and my text file was easy to compress.

I then made a small change to the first line of the file and committed that. My .git directory is now 356 KB, and .git/objects contains two objects, each 132 KB:

132K    ./.git/objects/8d
132K    ./.git/objects/f7

After running git gc, those two objects are compressed into a pack file of only 68 KB.
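For anyone who wants to repeat the experiment, this is roughly the sequence (a sketch assuming bash, GNU coreutils, and GNU sed; your sizes will differ):

    git init big-file-test && cd big-file-test
    yes "an easily compressed line of text" | head -c 24M > large.txt
    git add large.txt && git commit -m "add 24 MB text file"
    du -sh .git                        # a few hundred KB: the blob is zlib-deflated

    sed -i '1s/^/changed /' large.txt  # small change on the first line (GNU sed)
    git add large.txt && git commit -m "change first line"
    du -sh .git/objects/??             # the two large blobs stand out among
                                       # the small tree and commit objects

    git gc                             # pack the loose objects, delta-compressing them
    du -sh .git/objects/pack           # a single, much smaller pack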

So, at least under some circumstances, git will keep entire copies of large files for a while.

Andreas Wederbrand