Wikipedia, sadly... keeps every single revision of every page in its database as full text (in some form of XML, if I remember right).
Take a look at the Wikipedia (MediaWiki) database schema, specifically the recentchanges and text tables.
Hence they get wonderful O(1) lookups of even the very first copy of the "Biology" page. The unfortunate side effect is that Wikipedia's technology costs ballooned from $8 million USD in 2010-2011 to $12 million USD in 2011-2012, despite HDDs (and everything else) getting cheaper, not more expensive.
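To see the trade-off concretely, here is a minimal toy sketch of the "keep every revision in full" model. It is my own illustration, not MediaWiki's actual schema; FullCopyStore and its fields are made-up names. Any revision, including the very first one, comes back in a single key lookup, but every save duplicates the entire page text.

```python
# Toy "store every revision in full" model (hypothetical, not MediaWiki's schema):
# each save keeps the complete page text, so fetching any revision is one O(1)
# key lookup, but storage grows with the full size of every revision ever saved.
class FullCopyStore:
    def __init__(self):
        self.revisions = {}      # rev_id -> full page text
        self.next_rev_id = 1

    def save(self, page_text):
        rev_id = self.next_rev_id
        self.revisions[rev_id] = page_text   # a complete copy, nothing shared with older revisions
        self.next_rev_id += 1
        return rev_id

    def fetch(self, rev_id):
        return self.revisions[rev_id]        # O(1): no delta chain to replay

store = FullCopyStore()
first = store.save("Biology is the study of life.")
store.save("Biology is the scientific study of life.")
print(store.fetch(first))   # the first copy of the "Biology" page, in one lookup
```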
So much for revision control by keeping a full copy of every file. Git takes a cute approach. See "Is the git storage model wasteful?".
It starts out storing every file in full, much like the method above. Once the repository grows past a certain limit, it does a brute-force repack (there are options to set how hard it tries: --window=[N] and --depth=[N]), which may take hours. The repack uses a combination of delta and lossless compression: delta-compress recursively, then apply lossless compression to whatever bytes are left.
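To make "delta, then lossless" concrete, here is a minimal toy sketch in Python. It is not git's actual packfile format or delta algorithm; the names (make_delta, pack, MAX_DEPTH) and the chain-length rule are my own illustration, loosely in the spirit of --depth. Some revisions are stored in full, the rest as deltas against their predecessor, and the whole pack is then run through zlib as the lossless pass.

```python
import difflib
import pickle
import zlib

MAX_DEPTH = 3   # made-up cap on delta-chain length, loosely like git repack --depth=[N]

def make_delta(old, new):
    """Encode `new` as copy/insert ops against `old` (a crude stand-in for a real binary delta)."""
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse old[i1:i2]
        elif j1 != j2:
            ops.append(("insert", new[j1:j2]))    # literal new text
    return ops

def apply_delta(old, ops):
    """Rebuild the newer text from the old text plus the stored ops."""
    pieces = []
    for op in ops:
        if op[0] == "copy":
            pieces.append(old[op[1]:op[2]])
        else:
            pieces.append(op[1])
    return "".join(pieces)

def pack(revisions):
    """Store a full copy whenever the delta chain gets too deep, else a delta; then zlib the whole pack."""
    entries, depth = [], 0
    for i, text in enumerate(revisions):
        if i == 0 or depth >= MAX_DEPTH:
            entries.append(("full", text))
            depth = 0
        else:
            entries.append(("delta", make_delta(revisions[i - 1], text)))
            depth += 1
    return zlib.compress(pickle.dumps(entries))   # the lossless pass over the delta-compressed entries

def unpack(blob):
    """Undo the lossless pass, then replay each delta against the revision it was made from."""
    out = []
    for kind, payload in pickle.loads(zlib.decompress(blob)):
        out.append(payload if kind == "full" else apply_delta(out[-1], payload))
    return out

revs = ["a b c", "a b c d", "a B c d", "a B c d e", "something else entirely"]
assert unpack(pack(revs)) == revs
```

The payoff is that a long run of similar revisions costs roughly one full copy plus a handful of small deltas, and zlib then squeezes out whatever redundancy remains.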
Others, like SVN, use plain delta compression (from memory, which you should not trust).
Footnote:
Delta compression stores only the incremental changes between versions.
Lossless compression is general-purpose compression, pretty much like zip, rar, etc.