I have been learning Git for the past few weeks and I really like the way it works compared to SVN. The main reasons I am looking to fully switch to it are that merging is supposedly a lot easier, with fewer conflicts, and that I can commit locally. This encourages the use of many branches (like a branch per ticket/issue/task/etc.) and also encourages frequent commits. In SVN I only use branches when I have to (since merges often produce conflicts), and I only commit when I am 100% sure the issue is fixed (instead of making incremental commits, which would be nicer).

Now, one concern I have about Git from what I have been reading is how it handles non-text files and large projects. For example, I am working on a game project that is currently version-controlled in SVN. A game project is going to have a lot of non-text files like art, sound, and other binary files, and some of those files can get pretty big. How well does Git handle non-text files and large binary files? What considerations do I have to keep in mind if I want to port such a project over to Git?

ryanzec
  • Have you considered using Artifactory or similar to store (versioned) large files? http://www.jfrog.com/artifactory – Nic Oct 19 '17 at 01:30

4 Answers

One of the big differences in how Git stores data compared to other version control systems is that Git stores each file's content completely, as a single object. That means every version of every file exists as a complete file in your repository (though it is heavily compressed). So while other VCSs store the differences/deltas between two versions, and as such handle binary and text files differently (since binary files are not that diff-able), Git handles all of them identically.

As such, working with binary files in Git is no different from working with any other file type. You just need to keep in mind that versioning very large files will increase your repository size considerably (since every single version of that large file is stored as-is, even if the actual binary change was small). Git's compression, however, works wonders, and you usually won't notice this. Especially if you are only talking about a program's assets, you probably won't have any difficulties.
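To see this for yourself, here is a minimal sketch (throwaway repository; file names and sizes are just examples) that commits two versions of a binary file and inspects the object store before and after packing:

```
# Commit two versions of a binary file and look at how Git stores them.
git init demo && cd demo

head -c 1048576 /dev/urandom > asset.bin      # a 1 MB "binary asset"
git add asset.bin && git commit -q -m "v1"

head -c 16 /dev/urandom >> asset.bin          # a tiny change to the file
git add asset.bin && git commit -q -m "v2"

git rev-list --objects --all                  # two distinct blobs, one per version

git count-objects -v                          # loose objects: each version stored whole (zlib-compressed)
git gc
git count-objects -v                          # 'size-pack': size after deltifying and packing
```

The second `git count-objects -v` shows the packed size, which is where Git's delta compression kicks in.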

poke
  • Each object is an individual file only initially. After enough commits, or when the repo is `gc`'d, or when you clone it, they will be deltified and packed, making the repository's size competitive with a Subversion repository. – Josh Lee Apr 24 '11 at 17:48
  • @jleedev: What I meant with “exists as a complete file” is that the file (content) is stored completely, not that there is necessarily a single file storing the blob. As I said, the compression Git performs is very effective, so you usually don't notice that each file version is stored independently in the repository. – poke Apr 24 '11 at 17:56
  • It just depends on your level of abstraction. But it’s true that you usually don’t notice it. – Josh Lee Apr 24 '11 at 17:57
  • Compressing already compressed data doesn't usually yield good results; there's nothing magical about Git that will further compress a JPG. Making a delta is the only way to reduce the size, and once delta-fied, it's stored as a delta (obviously), making it no different from any other SCM. – gbjbaanb Jun 03 '11 at 14:51

Adding to @poke's answer:

I am an avid Git user these days, but having worked on a huge project with lots of binary files to handle (mostly zips), I found SVN to be more efficient than Git. The size of the Git repo got bloated in no time, while the size of a comparable SVN repository did not vary much. Cloning such a huge Git repo, especially across geographically distributed sites, was a nightmare. Git also doesn't have a partial clone feature, something we do in SVN all the time: checking out just a particular folder. There is partial (sparse) checkout in Git, but you still have to clone the entire repository, as the sketch below shows.
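For illustration, here is roughly what that sparse checkout looks like (available since around git 1.7); the repository URL and folder name are placeholders, and note that the clone itself still transfers the full history:

```
# Sparse checkout sketch: the working tree is partial, the clone is not.
git clone https://example.com/big-game.git      # placeholder URL; still a full download
cd big-game
git config core.sparseCheckout true
echo "assets/audio/" > .git/info/sparse-checkout
git read-tree -mu HEAD                          # working tree now shows only assets/audio/
```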

Note that whether or not a file is binary does not affect the amount of repository space used to store changes to that file, nor does it affect the amount of traffic between client and server. For storage and transmission purposes, Subversion uses a diffing method that works equally well on binary and text files; this is completely unrelated to the diffing method used by the 'svn diff' command.

http://subversion.apache.org/faq.html#binary-files

Given SVN's mature sysadmin tools (Git has also improved over the years, but I feel SVN still has the edge in this respect), I think it would be wise to keep an SVN server, with perhaps a git-svn repo for local development.
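A rough sketch of that setup, assuming a standard trunk/branches/tags layout on the SVN side (the URL is a placeholder):

```
# git-svn workflow: SVN remains the server of record,
# developers get local branching and cheap local commits.
git svn clone https://svn.example.com/game -s   # -s assumes trunk/branches/tags layout
cd game
# ...branch and commit locally as often as you like...
git svn rebase      # pull new SVN revisions into your local history
git svn dcommit     # replay your local commits back onto the SVN server
```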

There is something called git-bigfiles, which is a fork of Git. I am not sure how mature it is; you can evaluate it. But the fact that it exists shows that Git is not necessarily good at handling large files.

manojlds
  • First, zips are themselves already compressed, so the gzip compression Git applies of course won't help much. Second, you don't necessarily have to clone a repository. In fact, cloning is something you only do at the beginning to get your local repository set up. You can simply add other repositories as remotes and then do partial fetches of branches etc. – poke Apr 24 '11 at 18:01
  • Regardless of that, unless you have constantly changing binary files (for example when your editor uses a binary format, e.g. the Flash IDE's .fla files), using Git for storing binary *assets* (sounds, images) is not a problem at all, and the small drawbacks especially won't outweigh the benefits you get from using Git in the first place. – poke Apr 24 '11 at 18:03
  • I don't think you understand my concern about cloning. When I have to set up a new local repo, I have to clone, irrespective of a clone being one-time work. And fetching a particular branch is still not partial from the SVN point of view. And in any large project, binary files are going to change very frequently, including the assets; at least a subset of them, which is enough to start causing problems. – manojlds Apr 24 '11 at 18:08
  • `First, zips are themselves already compressed, so the gzip compression Git applies of course won't help much.` – exactly, and that was not the case with SVN; that was my point. – manojlds Apr 24 '11 at 18:10

Git handles binary files perfectly well. You just have to keep in mind that all versions of a binary file are kept locally. If a binary file (let's say an image) changes frequently, you will end up filling your local disk with all the versions of that image.
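To get a feel for the growth, here is a throwaway sketch that commits twenty revisions of a randomly changing file and then checks the repository size; the file name and sizes are arbitrary:

```
# Simulate a frequently changing 2 MB image and watch .git grow.
git init size-demo && cd size-demo
for i in $(seq 1 20); do
  head -c 2097152 /dev/urandom > image.png    # new 2 MB content each revision
  git add image.png && git commit -q -m "revision $i"
done
git gc                                        # random data barely deltifies or compresses
du -sh .git                                   # all 20 revisions live here (roughly 40 MB)
```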

ALoR
  • It is my experience that Git does a good job of compressing binaries. Just remember to `git repack` or `git gc` once in a while – sehe Apr 24 '11 at 16:24
  • Does SVN handle binaries differently or the same as mentioned here (storing a full copy of the file for each version)? – ryanzec Apr 24 '11 at 16:26
  • Since with Git you have the full history in your local repo, it will consume more space than an SVN checkout. With SVN, if you want to go back in time, you need the remote repo to download it from (the server has all the history); with Git you already have every version locally. – ALoR Apr 24 '11 at 16:32
  • @ryanzec - see my answer for how SVN handles binary files and their diffs – manojlds Apr 24 '11 at 19:16

Other answers have addressed the choices here, but there is also the possibility of using SVN for the binary files (if they will change a lot) and Git for everything else. During the build phase, you can use scripts to fetch the binary resources from SVN.
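For example, such a build step could look roughly like this; the URL, revision number, and paths are placeholders:

```
# Build-time fetch of binary assets from SVN.
# 'svn export' grabs the files without the .svn metadata,
# which is usually what you want for build inputs.
svn export --force -r 1234 https://svn.example.com/game/assets build/assets
```

Pinning a revision with `-r` keeps builds reproducible; omit it to always pull the latest assets.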