6

One of our git repositories is large enough that a git-clone takes an annoying amount of time (more than a few minutes). The .git directory is ~800 MB. Cloning always happens on a 100 Mbps LAN over ssh. Even cloning over ssh to localhost takes more than a few minutes.

Yes, we store data and binary blobs in the repository.

Short of moving those out, is there another way of making it faster?

Even if moving the large files out were an option, how could we do it without the major interruption of rewriting everyone's history?

Dale Forester
  • Have you compressed? Is your `.gitignore` sufficiently set up? I went from a 4.3 GB repo to 450 MB with these considerations and it saved my life, haha – Nic Jul 28 '11 at 17:00
  • @melee sadly the majority is already compressed – Dale Forester Jul 28 '11 at 17:10
  • I would love to be in your shoes. Our git repo has recently reached about 2GB. Our main servers are a few countries away, so slower connection, so cloning takes about 30 - 45 minutes. What I'm trying to say is: if you think your situation is bad, there are always a lot of people for whom it's worse ;) . – Radu Murzea May 19 '14 at 08:40

4 Answers


I faced the same situation with a ~1GB repository, needing to be transferred over DSL. I went with the oft-forgotten sneakernet: putting it on a flash drive and driving it across town in my car. That isn't practical in every situation, but you really only have to do it for the initial clone. After that, the transfers are fairly reasonable.
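
One way to do the flash-drive transfer is `git bundle` (also mentioned in the comment below), which packs the repository into a single file you can copy and then clone from. A minimal sketch, assuming a branch named `master`; the USB path and server URL are purely illustrative:

```
# On the machine that already has the repository:
git bundle create /media/usb/repo.bundle --all

# On the destination machine, clone straight from the bundle file:
git clone -b master /media/usb/repo.bundle myrepo
cd myrepo

# Point origin back at the real server (hypothetical URL) for future pulls:
git remote set-url origin user@server:/path/to/repo.git
git fetch origin
```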

Karl Bielefeldt
  • And don't forget the `git-bundle` capability (for sneakernet and other manual transfer methods) so you only need to transfer the **new** bits between the different repos – Philip Oakley Jul 28 '11 at 20:52

I'm fairly sure you're not going to be able to move those binary files out without rewriting history.

Depending on what the binaries are (maybe some pre-built libraries or whatever), you could have a little script for the developer to run post-checkout which downloads them.
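
A minimal sketch of such a script, assuming the blobs are served from a hypothetical internal HTTP location and listed in a `binaries.txt` manifest kept in the repository (both names are made up for illustration):

```
#!/bin/sh
# fetch-binaries.sh -- run after checkout to download the large blobs
# that are no longer tracked by git. URL and manifest name are hypothetical.
BASE_URL="https://builds.example.com/blobs"

while read -r path; do
    [ -f "$path" ] && continue              # already present, skip it
    mkdir -p "$(dirname "$path")"
    curl --fail -o "$path" "$BASE_URL/$path"
done < binaries.txt
```

It could also be dropped into `.git/hooks/post-checkout` so nobody has to remember to run it.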

Sinjo

Gigabit... fiber... Without rewriting history, you are fairly limited.

You can try a `git gc`; it may clean things up a bit, but I'm not sure if that is done with a clone anyway.
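
For what it's worth, the repack itself is cheap to try on the server copy of the repository; a minimal sketch using standard git commands (the savings on already-compressed binaries will likely be modest):

```
# See how much the object database currently occupies.
git count-objects -vH

# Repack aggressively and drop unreachable objects.
git gc --aggressive --prune=now

# Compare the size afterwards.
git count-objects -vH
```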

Andy
  • If he did `git gc --aggressive` and pushed, would it update the remote? Might be the basis for a new question – Nic Jul 28 '11 at 17:19
  • I was wondering this the other day... I think I'll scribble up a question. – Andy Jul 28 '11 at 20:05

Even if moving the large files out were an option, how could we do it without the major interruption of rewriting everyone's history?

Check this answer: Will git-rm --cached delete another user's working tree files when they pull

This measure, together with adding patterns to `.gitignore`, should help you keep those big files out.
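
A minimal sketch of that approach, assuming the blobs live under a hypothetical `data/` directory (the path is illustrative); the files stay on disk and in past history, but stop being tracked from this commit on:

```
# Untrack the files without deleting the working-tree copies.
git rm -r --cached data/

# Keep them from being re-added by accident.
echo 'data/' >> .gitignore

git add .gitignore
git commit -m "Stop tracking large binary blobs"
```

Every clone still downloads the old history, so this caps future growth rather than shrinking the existing ~800 MB.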

Niloct