6

One of our git repositories is large enough that a git-clone takes an annoying amount of time (more than a few minutes). The .git directory is ~800 MB. Cloning always happens on a 100 Mbps LAN over ssh. Even cloning over ssh to localhost takes more than a few minutes.

Yes, we store data and binary blobs in the repository.

Short of moving those out, is there another way of making it faster?

Even if moving the large files out were an option, how could we do it without the major interruption of rewriting everyone's history?

Dale Forester
  • Have you compressed? Is your `.gitignore` sufficiently set up? I went from a 4.3 GB repo to 450 MB with these considerations and it saved my life, haha – Nic Jul 28 '11 at 17:00
  • @melee sadly the majority is already compressed – Dale Forester Jul 28 '11 at 17:10
  • I would love to be in your shoes. Our git repo has recently reached about 2GB. Our main servers are a few countries away, so slower connection, so cloning takes about 30 - 45 minutes. What I'm trying to say is: if you think your situation is bad, there are always a lot of people for whom it's worse ;) . – Radu Murzea May 19 '14 at 08:40

4 Answers


I faced the same situation with a ~1GB repository, needing to be transferred over DSL. I went with the oft-forgotten sneakernet: putting it on a flash drive and driving it across town in my car. That isn't practical in every situation, but you really only have to do it for the initial clone. After that, the transfers are fairly reasonable.
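
One way to do the flash-drive transfer is `git bundle` (also mentioned in the comment below), which packs the repository into a single file you can copy and then clone from. A minimal sketch, assuming a branch named `master`; the USB path and server URL are purely illustrative:

```
# On the machine that already has the repository:
git bundle create /media/usb/repo.bundle --all

# On the destination machine, clone straight from the bundle file:
git clone -b master /media/usb/repo.bundle myrepo
cd myrepo

# Point origin back at the real server (hypothetical URL) for future pulls:
git remote set-url origin user@server:/path/to/repo.git
git fetch origin
```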

Karl Bielefeldt
  • And don't forget the `git-bundle` capability (for sneakernet and other manual transfer methods) so you only need to transfer the **new** bits between the different repos – Philip Oakley Jul 28 '11 at 20:52

I'm fairly sure you're not going to be able to move those binary files out without rewriting history.

Depending on what the binaries are (maybe some pre-built libraries or whatever), you could have a little script for the developer to run post-checkout which downloads them.
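
A minimal sketch of such a script, assuming the blobs are served from a hypothetical internal HTTP location and listed in a `binaries.txt` manifest kept in the repository (both names are made up for illustration):

```
#!/bin/sh
# fetch-binaries.sh -- run after checkout to download the large blobs
# that are no longer tracked by git. URL and manifest name are hypothetical.
BASE_URL="https://builds.example.com/blobs"

while read -r path; do
    [ -f "$path" ] && continue              # already present, skip it
    mkdir -p "$(dirname "$path")"
    curl --fail -o "$path" "$BASE_URL/$path"
done < binaries.txt
```

It could also be dropped into `.git/hooks/post-checkout` so nobody has to remember to run it.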

Sinjo

Gigabit... fiber... Without rewriting history, you are fairly limited.

You can try a `git gc`; it may clean things up a bit, but I'm not sure if that is done with a clone anyway.
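
For what it's worth, the repack itself is cheap to try on the server copy of the repository; a minimal sketch using standard git commands (the savings on already-compressed binaries will likely be modest):

```
# See how much the object database currently occupies.
git count-objects -vH

# Repack aggressively and drop unreachable objects.
git gc --aggressive --prune=now

# Compare the size afterwards.
git count-objects -vH
```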

Andy
  • If he did `git gc --aggressive` and pushed, would it update the remote? Might be the basis for a new question – Nic Jul 28 '11 at 17:19
  • I was wondering this the other day... I think I'll scribble up a question. – Andy Jul 28 '11 at 20:05

Even if moving the large files out were an option, how could we do it without the major interruption of rewriting everyone's history?

Check this answer: Will git-rm --cached delete another user's working tree files when they pull

This measure, together with adding patterns to `.gitignore`, should help you keep those big files out.
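
A minimal sketch of that approach, assuming the blobs live under a hypothetical `data/` directory (the path is illustrative); the files stay on disk and in past history, but stop being tracked from this commit on:

```
# Untrack the files without deleting the working-tree copies.
git rm -r --cached data/

# Keep them from being re-added by accident.
echo 'data/' >> .gitignore

git add .gitignore
git commit -m "Stop tracking large binary blobs"
```

Every clone still downloads the old history, so this caps future growth rather than shrinking the existing ~800 MB.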

Niloct