10

If I read correctly, Git stores all its files in blobs. If you modify a file, does the modified version of the file get its own blob and therefore its own SHA?

nulltoken
Pickels

2 Answers

6

That's correct - if the file's content changes even by a single bit, it will have a new object name (a.k.a. the SHA1sum or hash). You can see the object name that the file would have with git hash-object, if you want to test that:

 $ git hash-object text.txt
 9dbcaae0abd0d45c30bbb1a77410fb31aedda806
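
As a rough illustration of why any change to the content produces a different name, here is a minimal Python sketch of how a blob's object name is derived. It assumes the default SHA-1 object format (the hash of the header `blob <size>\0` followed by the raw content); the function name `git_blob_id` and the sample contents are made up for this example and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object name Git would give a blob with this content.

    Assumes the SHA-1 object format: sha1(b"blob <size>\\0" + content).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents that differ by a single character.
original = b"test content\n"
modified = b"test content!\n"

print(git_blob_id(original))
print(git_blob_id(modified))  # a different name from the line above
```

For a file on disk, `git_blob_id(open("text.txt", "rb").read())` should match what `git hash-object text.txt` prints, provided no content filters (such as line-ending conversion) apply to that file.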

You can find out more about how the hashes for blobs are calculated here:

Community
Mark Longair
  • 3
    It doesn't make sense to say that blob's hash is “commit ID”, it's an ID of the blob. – svick May 08 '11 at 19:04
  • otherwise known as `object name` ([`man git-rev-parse`](http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html)). Find all of those with `git rev-list --objects --all` – sehe May 08 '11 at 20:42
5

I would like to add to Mark's answer.

While Subversion, CVS, and even Mercurial use delta storage, whereby they store only the differences between commits, Git takes a snapshot of the tree with each commit.

When a file's content changes, a new blob for that content is added to the object store. At this point Git only cares about the content, not the filename. The filename and path are tracked through tree objects. When a changed file is added to the index, the blob for its content is created. When you commit (or use low-level commands like git write-tree), a new tree object is written in which the file points to the new content. Note also that although every change to a file creates a new blob, files with the same content never get different blobs.
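
The relationship between content, blobs, and trees described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `ToyStore`, `blob_id`, and the file names are hypothetical, the "trees" are plain dicts mapping a path to a blob name, and real Git trees are themselves hashed objects kept in the same object store.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Assumes the SHA-1 object format: sha1(b"blob <size>\0" + content).
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyStore:
    """Hypothetical stand-in for the object store: blob name -> content."""

    def __init__(self):
        self.blobs = {}

    def add(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(oid, content)
        return oid


store = ToyStore()

# Snapshot 1: two files with the same content share a single blob.
tree_v1 = {"a.txt": store.add(b"hello\n"), "copy.txt": store.add(b"hello\n")}

# Snapshot 2: a.txt changes, so it points to a new blob; copy.txt is untouched.
tree_v2 = {**tree_v1, "a.txt": store.add(b"hello, world\n")}

# Snapshot 3: a.txt is reverted and points to the old blob again.
tree_v3 = {**tree_v2, "a.txt": store.add(b"hello\n")}

print(len(store.blobs))                      # number of distinct blobs stored
print(tree_v3["a.txt"] == tree_v1["a.txt"])  # reverted file reuses the old blob
```

The filename never enters the hash, which is why the two identically filled files end up pointing at one blob.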

So, your question

If you modify a file, does the modified version of the file get its own blob and therefore its own SHA?

The new content gets a new blob, and the file is pointed to the new blob. If the new content is identical to that of an existing blob, the file simply points to that existing blob.

PS: Note that Git "packs" these "loose objects" into pack files (where Git stores deltas from one version of a file to another) when there are too many loose objects around, when git gc is run manually, or when pushing to a remote server, so files can end up stored as deltas. See the Pro Git chapter on this for more info: http://progit.org/book/ch9-4.html

manojlds
  • Somewhat reduced? Git overhead (which includes the whole repository history) is usually smaller than SVN overhead (which includes only the current version). Also, pack files use delta compression internally, but it doesn't have to be against the previous version as with SVN. – svick May 08 '11 at 19:10
  • I have reworded it and added a link to the Pro Git chapter explaining it. One thing is that in a project with a large number of frequently changing binary files, Git tends to have a larger footprint than SVN. That is what I meant by "somewhat" (it should have been "most of the time", but I was writing it as a postscript). – manojlds May 08 '11 at 19:53