In our organization, where we are trying to introduce Git, we have run into a problem with how Git handles binary files.
Our projects will contain a good mix of binary and text files, and a typical project size could be 1 GB. Our fear is that after a few years a full clone would become too big and cause performance and disk space issues.
One of the environments that would migrate to Git currently keeps its software in a system called TCM. The total size of its repositories, with 7-10 years of versions, is 2 TB.
Another environment, on ClearCase, has 7-8 years of data totaling around 1 TB.
Because Git conceptually stores full snapshots rather than deltas, which particularly affects binary files, the situation after 5+ years is causing concern among our users.
The shallow clone feature would have been ideal. But the documentation says: "A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches." A cursory check suggests that shallow clones work fine, but there must be known use cases where they do not, hence the warning in the documentation.
Is there a known list of use cases where this won't work?
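
For reference, the kind of cursory check meant here can be sketched as follows. This is only an illustration under assumptions: the throwaway repository, the file name `data.txt`, and the `demo` identity are all made up, and a local `file://` URL stands in for a real server.

```shell
#!/bin/sh
set -eu

# Throwaway workspace; every path below is hypothetical.
work=$(mktemp -d)

# Build a small "origin" repository with three commits.
git init -q "$work/origin"
cd "$work/origin"
for i in 1 2 3; do
  echo "revision $i" > data.txt
  git add data.txt
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "revision $i"
done

# Shallow clone: keep only the most recent commit.
# A file:// URL is used because --depth is ignored for plain local paths.
cd "$work"
git clone -q --depth 1 "file://$work/origin" shallow

# Compare how much history each repository holds.
full_count=$(git -C origin rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full: $full_count, shallow: $shallow_count"
# prints "full: 3, shallow: 1"
```

The shallow clone holds the same working tree as the tip of the origin but only one commit of history, which is where the disk space saving comes from.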

maxmelbin
  • Update: Since Git 1.9, most limitations of shallow clones have been resolved. – sleske Jun 01 '14 at 18:12
  • Git 2.5 (Q2 2015) supports fetching a single commit! I have edited my answer below, now referencing "[Pull a specific commit from a remote git repository](http://stackoverflow.com/a/30701724/6309)". – VonC Jun 08 '15 at 05:34
  • VTC as unclear. The docs give a full summary of what it can't do. A "list of use-cases" is whatever you can imagine that uses those operations. Which is an infinite set, thus impossible to put into an answer. – ivan_pozdeev Dec 25 '17 at 17:47

2 Answers

I would urge you to store binary files in a dedicated repository that is easy to scale and easy to clean up: an artifact repository like Nexus.
Other alternatives are listed in "How to handle a large git repository?".

Trying to keep everything in Git, using it in some unnatural way, will always result in more trouble than it is worth: it is a source control tool. You might as well use it for what it is good at.

That being said, a shallow clone doesn't support push (or, at least, pushing from one is dangerous: see "Why can't I push from a shallow clone?").
For read-only purposes, a simple git archive would be enough, as mentioned in "not understanding git shallow clone".
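
A minimal sketch of the read-only route, assuming a throwaway repository (the paths, `README.txt`, and the `demo` identity are invented for illustration): `git archive` writes the files of one commit as a tar stream, with no history and no `.git` directory.

```shell
#!/bin/sh
set -eu

# Throwaway workspace; every path below is hypothetical.
work=$(mktemp -d)

# A tiny repository with a single committed file.
git init -q "$work/repo"
cd "$work/repo"
echo "hello" > README.txt
git add README.txt
git -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Export the tree of HEAD as a tar stream and unpack it elsewhere.
mkdir "$work/export"
git archive --format=tar HEAD | tar -xf - -C "$work/export"

# The export holds the files only, without a .git directory.
ls -A "$work/export"
# prints "README.txt"
```

The exported directory is not a repository, so it suits builds or inspection but cannot be committed to or pushed from.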

Updates 2015:

Community
VonC

Git Annex solves the "big-binary-files in/near git" problem quite beautifully, as well.

Andreas Klöckner