
I want to store image files in an SVN repository. I have read that SVN tries to store delta-based changes in the repository rather than a simple copy of each version. An alternative would be to convert each image to base64 and store it as text. Considering the cost of creating base64 images, would this be more practical, or would it make things worse?
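For a sense of the base64 cost raised here, the following is a minimal sketch in Python. The random payload is only a stand-in for an already-compressed image, and the helper name `base64_overhead` is made up for illustration. Base64 maps every 3 input bytes to 4 output characters, so the text form is about a third larger before any version control system sees it.

```python
import base64
import os


def base64_overhead(payload: bytes) -> float:
    """Return the size ratio of the base64 text to the original bytes."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


# Stand-in for a compressed image: 30,000 random bytes (an assumption,
# not a real image file).
payload = os.urandom(30_000)
encoded = base64.b64encode(payload)

print(len(payload), len(encoded), base64_overhead(payload))
```

The encoding is reversible with `base64.b64decode`, but it adds size and an extra conversion step without making the underlying compressed data any more delta-friendly.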

Nevik Rehnel
Alireza Noori
  • Git does not use deltas to store differences, and with image files this wouldn't be possible anyway. Are the images expected to change frequently or are they going to stay the same once added? – Ozan Jan 03 '13 at 12:14
  • @Ozan Are you sure? I'm almost positive that Git stores only the changes for text files. Otherwise linux's code wouldn't be 5GB. As for your question, the images could change often. They're from a few websites I want to track. – Alireza Noori Jan 03 '13 at 12:19
  • I deleted my answer as I just learned something from Ozan's comment. [More info](http://stackoverflow.com/questions/8198105/how-git-stores-files) on how git stores files. I still think base64 encoding them is not very useful :) – Sunil D. Jan 03 '13 at 12:23
  • OK. So I was wrong about Git. I should switch to SVN. Thanks for the help. – Alireza Noori Jan 03 '13 at 12:26
  • @SunilD. I'll edit my question for SVN. Please tell me what you think about that. Also, it'll help me select your answer as *answer* – Alireza Noori Jan 03 '13 at 12:27
  • git repository sizes are reduced by the use of pack files, which indeed includes the use of deltas: http://git-scm.com/book/en/Git-Internals-Packfiles However it won't help with image files. – Ozan Jan 03 '13 at 12:38
  • @Ozan So, I can make Git use deltas instead of storing static copies? Is this the same as using SVN? I need deltas so which one would you recommend? Git or SVN? (Forget images, I want to store HTMLS) – Alireza Noori Jan 03 '13 at 13:01
  • @AlirezaNoori: Git. It applies additional layer of compression and can (sometimes) take advantage of separate similar files. See the answer below. – Jan Hudec Jan 03 '13 at 13:15

2 Answers


Git does not use deltas to store differences, and with image files this wouldn't be possible anyway. This means that if a tracked image changes, it adds 100% of its own size to the repository, and since the images are already compressed, Git's packing cannot compress them further.
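The point about already-compressed data can be checked with a small Python sketch. The input is synthetic (random letters from a four-symbol alphabet), chosen only because it compresses well once; it is not a real image, and zlib's deflate is used here as a stand-in for the compression inside image formats.

```python
import random
import zlib

# Synthetic, compressible input: 100,000 bytes drawn from 4 symbols.
rng = random.Random(0)
data = bytes(rng.choices(b"abcd", k=100_000))

once = zlib.compress(data, 9)   # first pass: large gain
twice = zlib.compress(once, 9)  # second pass: little or no gain

print(len(data), len(once), len(twice))
```

The first pass shrinks the data substantially, while compressing the result a second time gains essentially nothing, which is the situation a repository is in when it is handed a PNG or JPEG.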

The question is how big the images are and how often they change; from that you can estimate how quickly the repository will grow. Then you can refer to repository size recommendations for your use case.

Ozan
  • OK. I got my answer. Just one thing: If the file doesn't change, SVN won't commit it again (just like source codes), right? In that case, keeping them in repository is better than storing a simple copy. – Alireza Noori Jan 03 '13 at 12:34
  • If the file doesn't change, it is not committed again; this is the case with every version control system. I didn't want to deter you from using git, I just wanted to make you aware that with binary files that change often, the repository size will grow very quickly. This also happens with SVN, it does not use delta compression on binary files. – Ozan Jan 03 '13 at 12:43
  • Thank you very much. I should reconsider my strategies! – Alireza Noori Jan 03 '13 at 12:57
  • -1: Wrong. Git does use deltas and they do work for binary files. Compressed streams don't gain much from deltas, but the smudge and clean filters can be used to rewrite the images without compression for storage (git will apply deflate on its own). I am not sure whether it's really worth the trouble though. – Jan Hudec Jan 03 '13 at 13:00

Git (and Subversion too) uses deltas for storing files in the repository. They are, both in Git and Subversion, binary deltas that cope with binary files just fine. They also find matching runs of bytes and don't rely on any separators like newlines being present.
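To illustrate what "matching runs of bytes" means, here is a toy binary delta in Python. It is a sketch only: it uses `difflib.SequenceMatcher` from the standard library to find the matching runs, which is not the algorithm Git or Subversion actually implement, and the `make_delta` / `apply_delta` names and the copy/insert tuple format are invented for this example.

```python
import random
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copies from base plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # A run of bytes present in both: store only offset and length.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in target are stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Binary data with no newlines at all: 1 KiB of seeded random bytes.
rng = random.Random(1)
base = bytes(rng.getrandbits(8) for _ in range(1024))
target = base[:500] + b"\x00\xff" + base[500:]  # two bytes inserted mid-file

delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(target), literal_bytes)
```

Nothing in the matching depends on line structure, so an uncompressed binary file with a small local change yields a delta that is mostly copy instructions.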

While Subversion computes each delta against the previous revision of the file, Git initially stores the full content and, during the gc operation, picks some likely candidates and deltas against the most similar one. This means it can (sometimes) take advantage of separate similar files, or of older versions when changes are partly or fully reverted. Git then applies deflate compression to both full texts and deltas (Subversion does not).

There are no other general-purpose methods of compressing the storage of multiple versions of a file. When you need to keep the old versions of the files around, Git is optimal or almost so. The only disadvantage compared to dedicated backup systems is that Git can't delete old versions.

Most images are compressed, and that usually means that when there is a difference, all the rest of the file differs too. So they don't gain much from delta compression and, being compressed already, don't get much from the extra compression applied by Git. However, Git has a mechanism to provide "clean" and "smudge" filters. The "clean" filter is applied before storing a file in the repository; the "smudge" filter is applied when checking it out. In the case of PNG files you could use them to rewrite the files without compression. Then, if they actually contain big portions that are the same in different versions, delta compression will take advantage of them, and the compression will be applied by Git afterwards (it uses the same algorithm), so you are not losing anything. In practice I suspect it will only be worth the trouble if you have many images and big parts of them actually are the same. This also applies to other deflated formats like OpenOffice documents.

Jan Hudec
  • Thanks. OK! Now I'm confused. I have implemented my program with Git and since Ozan and [this thread](http://stackoverflow.com/questions/8198105/how-git-stores-files) said that Git doesn't use deltas I was ready to switch to SVN. Do you have any official link which can prove what you mentioned here? I really don't know which one is true. Sorry. – Alireza Noori Jan 03 '13 at 13:30
  • @AlirezaNoori: There's [one already linked in comments to the question](http://git-scm.com/book/en/Git-Internals-Packfiles)! It explains the internal storage in rather great detail. – Jan Hudec Jan 03 '13 at 13:53
  • @AlirezaNoori: I have picked up a rough idea of how the algorithm works over time, but I can't seem to find any reasonable description. You can understand a bit by looking at [git gc](http://git-scm.com/docs/git-gc), [git pack-objects](http://git-scm.com/docs/git-pack-objects) (low-level) and [git config](http://git-scm.com/docs/git-config) (look for `pack.*` options). – Jan Hudec Jan 03 '13 at 14:08
  • Thanks. I'll look more into it. But seems that this is more accurate. – Alireza Noori Jan 03 '13 at 14:26