I have been curious about how Git stores version information for a file.

I am guessing that they are likely to be deltas, but if there are a lot of versions of a file (e.g. 100) then:

a) When (if ever) does it store a full version of a file (e.g. testing.txt)?

b) When a new version is created (let's say 'Hello World' was added to it), does it just store a delta?

c) If I take the 100 versions of testing.txt and compare version 20 to version 90, how does it build the two versions in order to diff them?

Thank you.

Dai
Swatcat
  • https://git-scm.com/book/en/v1/Git-Internals perhaps? – zerkms Oct 31 '18 at 02:16
  • Git internals is a can of worms and is fully explained in the online Git book here: https://git-scm.com/book/en/v1/Git-Internals - have fun! – Dai Oct 31 '18 at 02:16
  • Possible duplicate of [Git internals: how does Git store small differences between revisions?](https://stackoverflow.com/questions/43359590/git-internals-how-does-git-store-small-differences-between-revisions) – Dai Oct 31 '18 at 02:22

1 Answer

(Disclaimer: I am not a git expert - other users on SO are far more knowledgeable than me and I invite them to edit and improve my answer)

Git's user-facing model is that a commit represents a snapshot of the state of your repo, not a delta or changeset as in SVN and TFS. This is what makes Git so powerful: it's easier to reason about snapshots (and to compute arbitrary differences between snapshots) than it is to reason about a sequence of deltas. For example, try doing a rebase in SVN. This is also why Git doesn't record file renames explicitly: a rename is inferred by comparing snapshots.

Internally, Git uses different approaches: it may store a full, compressed copy of a file's content (as it does for a newly written "loose" object), or it may store a delta against another object (as it can inside a packfile). The point is that its internal representation of your repo is an implementation detail that's abstracted away, and you shouldn't concern yourself with it unless you really need to know (but it's good to be curious!)

In response to your questions:

  1. When (if ever) does it store a full version of a file (e.g. testing.txt)?

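Whenever it suits it, such as when it's faster to store a full file than to compute diffs (e.g. right after you `git add` and `git commit` some new files) or when you make substantial changes to lots of small files. In practice, a newly added version is first written whole as a compressed object; deltas only come into play later, when Git packs objects together (e.g. during `git gc`).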

Git is optimized for speed (citation needed), not space, so if it's faster not to store a diff then it won't store a diff.

  2. When a new version is created (let's say 'Hello World' was added to it), does it just store a delta?

(By "new version" I assume you mean "new commit".)

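Not automatically and not necessarily. I recommend reading this Q&A thread: [Git internals: how does Git store small differences between revisions?](https://stackoverflow.com/questions/43359590/git-internals-how-does-git-store-small-differences-between-revisions)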
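For a feel of what "storing a delta" means when it does happen, here is a toy sketch that describes a new version as "copy these bytes from the old version" and "insert these new bytes" steps. This is only an illustration of the idea: it is not Git's actual pack format or delta algorithm, and `make_delta` and `apply_delta` are names I made up.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as ('copy', offset, length) and ('insert', data) steps."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # 'delete' needs no step: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying the steps against the base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


old = b"line one\nline two\n"
new = old + b"Hello World\n"

delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

The delta is small when the two versions are similar, but reading the new version now requires the old one as well, which is the speed-versus-space trade-off mentioned above.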

  3. If I take the 100 versions of testing.txt and compare version 20 to version 90, how does it build the two versions in order to diff them?

Conceptually, it takes snapshot 20 and snapshot 90 and compares the two directly.

However, internally it may need to build snapshot 20 and snapshot 90 from its object store before it can compare them - and there may be built-in optimizations that enable it to detect and ignore irrelevant commits and deltas.
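The snapshot view can be sketched with a toy in-memory model. Everything here (`object_store`, `commits`, `commit_file`, `diff_versions`) is hypothetical and greatly simplified: real Git resolves paths through tree objects and may have to rebuild a blob from a delta chain inside a packfile first. The point is only that comparing version 20 to version 90 needs those two snapshots and nothing in between.

```python
import difflib
import hashlib

object_store = {}  # object ID -> full file content
commits = []       # commits[n] is a snapshot: {path: object ID}


def store_blob(content: bytes) -> str:
    """Store content under its content-derived ID and return that ID."""
    oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    object_store[oid] = content
    return oid


def commit_file(path: str, content: bytes) -> None:
    """Record a new snapshot that differs from the previous one in one file."""
    snapshot = dict(commits[-1]) if commits else {}
    snapshot[path] = store_blob(content)
    commits.append(snapshot)


def diff_versions(path: str, a: int, b: int) -> list:
    """Look up both versions by ID and diff them; other commits are not read."""
    old = object_store[commits[a][path]].decode().splitlines(keepends=True)
    new = object_store[commits[b][path]].decode().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, f"{path}@{a}", f"{path}@{b}"))


# Build 100 versions of testing.txt, each adding one line.
text = b""
for n in range(100):
    text += b"line %d\n" % n
    commit_file("testing.txt", text)

print("".join(diff_versions("testing.txt", 20, 90)))
```

Because each snapshot points straight at the content it needs, the cost of the comparison does not depend on how many versions sit between the two.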

Dai