4

With Git, since all the history is cloned along with the files, I am wondering: suppose I keep one repo for all the code I have ever written, and over 5 or 10 years, with all the revision history, the repo grows to 5GB.

And if a machine doesn't have the repo yet, and I just want to try a code snippet or a small Rails project there, I have to clone the whole 5GB over, which isn't very practical.

Say that out of the 5GB, only 200MB is the current files and the rest is history; with SVN, at least, each machine would only need the 200MB working copy instead of 5GB. Maybe Git is well suited to self-contained small or medium projects, but what about a long-term "my whole life" repo? How would you use Git for that?

nonopolarity
    Why would you put unrelated projects in the same repo? Besides, in 5 or 10 years, 5GB won't seem nearly as big as it does today :-) – timdev Apr 04 '11 at 03:19
  • unrelated, because it is "all my code ever written", so I want to keep it in 1 repo. Mercurial has subrepo... so it is possible to have 1 big repo that have 300 subrepos (and maybe 1 repo having 10 subrepos, and each subrepos have 20 to 30 subrepos... this I am not sure yet) Then it can clone or push / pull subrepo or the "top repo" – nonopolarity Apr 04 '11 at 03:20
  • 5GB! Oh my god, I have had my code repo since 1997; at the very start it was CVS, later I converted it to Subversion, and 2 years ago I converted it to Git, with all the history preserved. I guess I have nearly 500 projects covering 10 languages, but it's no more than 300MB! – Lenik Apr 04 '11 at 03:28
  • 1
    Uh, git has [subrepos](http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html) too. – Karl Bielefeldt Apr 04 '11 at 03:30
  • 1
    As others have said in answers, that's not how git is designed. But you could easily have a plain old directory full of repos. If you wanted to move them all somewhere, you could just copy them using your whatever filesystem tools you prefer. Besides, what will you do when I want to buy "program X, including the full version control history" for a billion spacebucks? What about if you want to let me help you finish a project, but don't want to share every line of code you've ever written with me? – timdev Apr 04 '11 at 03:31
  • You've managed to type 5GB of code in only 10 years? Congratulations, you're much better than I am! That's about 190 WPM, 24/7. – Mark Ransom Apr 04 '11 at 03:31
  • BTW, I always exclude generated files from repo, such as `configure` and `yacc.c`. The clean copy of current checkout (exclude .git/) is around 130M. I like to put all projects in one repo, so I can move files between projects without losing history. – Lenik Apr 04 '11 at 03:36
  • @Mark sometimes there are generated data files which I want to keep... sometimes there are even sqlite3 database files that are 2MB but I want to keep, and 15 commits later that may be about 30MB already – nonopolarity Apr 04 '11 at 03:46
  • @Lenik, you look like you are 20 years old... so 1997... you started using CVS when you were 6 years old? – nonopolarity Apr 04 '11 at 03:48
  • Keeping database *dumps* can generally be done fairly efficiently, if you are able to dump them with a consistent ordering. `mysqldump` takes the option `--order-by-primary` to do this; I'm not sure about sqlite. The advantage with ordered dumps is that when you run `git gc`, it will be able to use delta compression to just encode the new/changed/deleted lines —these being database rows— between commits. – intuited Apr 04 '11 at 04:00
  • @intuited so you mean don't version the .sqlite3 files (git-ignore them), but version the dump... that's good... I think, except for the extra steps of dumping and restoring (and remembering to restore) – nonopolarity Apr 04 '11 at 04:21
  • @動靜能量: Yeah, that's the idea. You can use git hooks (`man githooks`) that invoke a script or even a Makefile target to streamline the process. The basic idea is that git will execute scripts in `.git/hooks` when certain actions are about to take place. For example, `pre-commit` will be run when git is getting ready for a commit. I think this would be the appropriate time to do a dump and add it to the index, but I'm not really sure about it... I've always just done my dumps and restores directly, and don't have much experience using hooks. If it seems unclear, you can always ask another question :) – intuited Apr 04 '11 at 05:51
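The hook approach sketched in the last comment can be tried out in a throwaway repo. This is a hedged sketch: the real dump command (e.g. `sqlite3 app.sqlite3 .dump` or `mysqldump --order-by-primary`) is replaced with a plain `echo` so it runs anywhere, and the filenames are hypothetical.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
# A pre-commit hook that regenerates the text dump and stages it, so the
# dump (not the binary database file) is what gets versioned.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# A real project would run e.g.: sqlite3 app.sqlite3 .dump > db.sql
echo "-- ordered SQL dump would go here" > db.sql
git add db.sql
EOF
chmod +x .git/hooks/pre-commit
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "demo commit"
git ls-files    # the dump was added to the commit by the hook
```

Because `pre-commit` runs before Git writes the tree, anything the hook stages ends up in the commit being created.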

3 Answers

6

Use multiple Git repositories. A single server can handle any number of repositories.

If you want to get your hands dirty within a repository, you can create a new branch, rewrite its history (squashing multiple commits into one), and delete the original branch.

yfeldblum
5

You are correct, sir, and the Git Wiki agrees with you. That being said, if you don't care about pushing/pulling changes from this hypothetical git repository, you can do a "shallow" clone to pull a commit without its history:

git clone --depth X

Where X is how far back into the history you want to go. 1 will get you the most recent commit, 2 will pull the most recent and the one before, and so on and so forth.
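A quick way to see the effect, using a throwaway local repo (the `file://` prefix matters here: Git ignores `--depth` on plain local paths and warns you to use `file://` instead):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
for msg in one two three; do    # three dummy commits of history
  git -C "$tmp/src" -c user.email=demo@example.com -c user.name=demo \
      commit -q --allow-empty -m "$msg"
done
git clone -q --depth 1 "file://$tmp/src" "$tmp/shallow"
git -C "$tmp/shallow" rev-list --count HEAD   # only 1 commit came over
```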

Damien Wilson
1

You wouldn't use Git for that, because that's not what Git is for. ;)

Considering how quick and easy it is to create a (possibly local) repo, there's not much reason not to have one for each project, and a few good reasons to do so (being able to track projects separately, keeping repos small and on-topic, etc.).

For data on multi-gigabyte repos, you may look at the benchmarks here and this question regarding Git's limits.

ssube
  • then if I have 300 small repos... how do I clone all 300 to another machine if that's what I want? (without writing a script to do it) – nonopolarity Apr 04 '11 at 03:24
  • Copy the folders? You might be able to keep all the little repos in a big repo (no idea if it would actually work, and it would be awfully convoluted). Storing everything ever isn't really what Git's for, though, so you can't expect it to work great. – ssube Apr 04 '11 at 03:26
  • You could write a shell script to find all of the git repos under a certain directory. Something like `find -name .git` will get you most of the way there. You can pipe the output to, say, `sed 's/^/gitserver:where\/my\/repos\/live\//'` and then into a bash loop that ensures that the corresponding local parent directory (relative to your repo root dir) exists and then `git clone` s the URL there. – intuited Apr 04 '11 at 04:05
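The find-based loop in the last comment can be sketched locally like this (a real setup would replace `$src` with an ssh/URL prefix to the server; here two dummy repos stand in for the 300):

```shell
set -e
src=$(mktemp -d) && dst=$(mktemp -d)
for name in alpha beta; do    # stand-ins for the real repos
  git init -q "$src/$name"
  git -C "$src/$name" -c user.email=demo@example.com -c user.name=demo \
      commit -q --allow-empty -m "init"
done
# Find every repo under $src and clone it to the same relative path
# under $dst, creating parent directories as needed.
(cd "$src" && find . -name .git -type d) | while read -r g; do
  repo=${g%/.git}
  mkdir -p "$dst/$(dirname "$repo")"
  git clone -q "$src/$repo" "$dst/$repo"
done
ls "$dst"
```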