How do I clone a git repo that has become too large?

Question

I am working with a git repo that is very large (> 10 GB). The repo contains many large binary files (> 100 MB each), with many versions of each. The reasons for this are beyond the scope of this question.

Currently, it is no longer possible to clone the repo: the server runs out of memory (it has 12 GB) and returns a failure code. I would paste the error here, but it takes well over an hour to reach the point of failure.

Is there any way to make a clone succeed, even one that grabs only a partial copy of the repo? Or a way to clone in bite-sized chunks that won't make the server choke?

Git is supposed to be used to store source code. I can't think of a program whose code would take that much space, so maybe you're storing a lot of non-source code which shouldn't be there? You could download each file separately from the server normally, then redesign your codebase - leaving only code in git while moving all the media and other things to some separate repository solution. — Geeky Guy, Sep 17 '13 at 13:16
To be clear, I am working for a quick fix to allow me to clone the repo right now; repo fixing is in the process but is beyond the scope of this question. — Charles Randall, Sep 17 '13 at 13:18

score 9 · Answer 1 · edited May 23 '17 at 12:32

One answer to 'How do I clone a git repo that has become too large?' is 'Reduce its size by removing the Big Blobs'.

(I must concede that the asker clarifies in a comment that repo-fixing is 'beyond the scope of this question'. However, the comment also says 'I am working for a quick fix to allow me to clone the repo right now', so I'm posting this answer because a) it's possible they're not aware of The BFG and so overestimate the difficulty of cleaning a repo, and b) it is indeed very freakin' quick.)

To clean the repo easily and quickly, use The BFG:

$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any old files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive

Once this is done, your repo will be much smaller and should clone without problems.

Full disclosure: I'm the author of the BFG Repo-Cleaner.
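Before picking a threshold for --strip-blobs-bigger-than, it can help to see which blobs in the history are actually the largest. The following is a minimal sketch using plain git plumbing, not part of The BFG itself. The throwaway repo, the file names, and the sizes are assumptions chosen so the demo runs quickly; on a real repo you would run only the pipeline.

```shell
#!/bin/sh
set -eu

# Throwaway repo with two versions of a "large" file and one small file.
work=$(mktemp -d)
git init -q "$work/repo"
head -c 200000 /dev/zero > "$work/repo/big.bin"
echo "hi" > "$work/repo/small.txt"
git -C "$work/repo" add big.bin small.txt
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"
head -c 300000 /dev/zero > "$work/repo/big.bin"
git -C "$work/repo" add big.bin
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "v2"

# List every object reachable from any ref, ask git for its type and
# uncompressed size, keep only blobs, and sort by size (largest first).
largest=$(git -C "$work/repo" rev-list --objects --all |
    git -C "$work/repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k2 -n -r |
    head -n 1)
echo "$largest"
```

The sizes reported are uncompressed object sizes, which is roughly what a size threshold is compared against; replacing `head -n 1` with `head -n 20` gives a short list of candidates.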

Wow -- just watched your [BFG presentation on Parley's](https://twitter.com/Parleys/status/517319848331083776). Awesome contribution to the git community! — AmigoNico, Jul 02 '15 at 05:56

score 8 · Answer 2 · answered Sep 17 '13 at 13:19

You can try passing --depth option to git clone. Or you can copy it using rsync or some such?
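As a rough illustration of the --depth idea: a shallow clone transfers only the most recent history, and newer git releases can extend it afterwards with `git fetch --deepen` or `git fetch --unshallow`. The sketch below uses a throwaway five-commit repo as a stand-in for the real remote (an assumption for the demo). Whether each deepening step stays within the server's memory on a repo like the one in the question is not something this demo can show.

```shell
#!/bin/sh
set -eu

# Throwaway repo standing in for the real (large) remote.
work=$(mktemp -d)
git init -q "$work/origin"
for i in 1 2 3 4 5; do
    echo "version $i" > "$work/origin/file.txt"
    git -C "$work/origin" add file.txt
    git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Shallow clone: only the newest commit is transferred.
# The file:// URL matters here because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$work/origin" "$work/clone"
shallow_count=$(git -C "$work/clone" rev-list --count HEAD)

# Fetch two more commits of history beyond the current shallow boundary.
git -C "$work/clone" fetch -q --deepen=2
deeper_count=$(git -C "$work/clone" rev-list --count HEAD)

# Fetch everything that is still missing and drop the shallow marker.
git -C "$work/clone" fetch -q --unshallow
full_count=$(git -C "$work/clone" rev-list --count HEAD)

echo "shallow=$shallow_count deeper=$deeper_count full=$full_count"
```

Against a real server you would replace the file:// URL with the repository's address and choose a --deepen step small enough for the server to handle.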

Michael Krelin - hacker

Looking at options for --depth, is there a way of using it iteratively that would give me a complete repo? I do need for it to result in a proper working copy. – Charles Randall Sep 17 '13 at 13:24
I'm afraid not, I was thinking of this as a quick fix to get the tree. If you want the whole thing, I'd go for rsync. – Michael Krelin - hacker Sep 17 '13 at 13:27

score 6 · Accepted Answer · answered Sep 17 '13 at 13:28

Use rsync to copy the entire repo by pointing it at the top-level directory that contains .git. Then change the remote URLs in .git/config to point back to the original.

That's the only key off the top of my head that needs to be changed in .git/config, but I would scan through looking for any others that are host specific. Most of them are pretty self-explanatory.
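The copy-then-repoint steps might look like the sketch below. It copies between two local directories so it can run anywhere; in practice the rsync source would be a remote path on the server. The directory names, the `cp -a` fallback for machines without rsync, and the `origin_url` value are all assumptions for the demo.

```shell
#!/bin/sh
set -eu

# Stand-in for the repository as it exists on the server.
work=$(mktemp -d)
mkdir -p "$work/server/project" "$work/local/project"
git init -q "$work/server/project"
echo "hello" > "$work/server/project/README"
git -C "$work/server/project" add README
git -C "$work/server/project" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Copy the working tree and .git as ordinary files. The trailing slash on the
# source copies the directory's contents rather than the directory itself.
if command -v rsync >/dev/null 2>&1; then
    rsync -a "$work/server/project/" "$work/local/project/"
else
    cp -a "$work/server/project/." "$work/local/project/"
fi

# Point origin back at the original repository (hypothetical URL for the demo).
origin_url="file://$work/server/project"
if git -C "$work/local/project" remote get-url origin >/dev/null 2>&1; then
    git -C "$work/local/project" remote set-url origin "$origin_url"
else
    git -C "$work/local/project" remote add origin "$origin_url"
fi

# Check that the copied object store is intact.
git -C "$work/local/project" fsck
copied_commits=$(git -C "$work/local/project" rev-list --count HEAD)
echo "copied $copied_commits commit(s)"
```

Using `git remote set-url` edits the same key in .git/config that the answer mentions, without opening the file by hand.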

masonk

It looks like git stopped supporting the rsync protocol in 2016: https://github.com/git/git/commit/0d0bac67ce3b3f2301702573f6acc100798d7edd. – raph Dec 04 '19 at 19:36
This method doesn't rely on git-over-rsync, which is a git transfer protocol. Here, I'm just using rsync as a file transfer method, agnostic of the fact that the files I'm transferring are a git repo. `scp` would work equally well – masonk Dec 05 '19 at 01:28
Well, I just think it's worth pointing out because the question was around how to clone the repo, which assumes git transfer. – raph Dec 05 '19 at 15:40

score 2 · Answer 4 · answered Sep 17 '13 at 20:19

Try reconfiguring the pack creation parameters on the serving repo, especially pack.windowMemory, which defaults to no limit.

I'd start with

git config pack.windowmemory 1g

because the limit applies per thread, and by default git runs one packing thread per core.
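A slightly fuller sketch of the server-side settings follows. Only the pack.windowMemory value comes from this answer; the pack.threads and pack.deltaCacheSize values are assumptions to tune for your own server, and the throwaway bare repo merely stands in for the serving repo.

```shell
#!/bin/sh
set -eu

# Throwaway bare repository standing in for the serving repo; on the real
# server you would run the `git config` lines inside the existing repository.
repo=$(mktemp -d)/serving.git
git init -q --bare "$repo"

# Cap the delta-search window memory per thread (the value suggested above).
git -C "$repo" config pack.windowMemory 1g
# Assumed value: fewer packing threads means fewer windows held in memory at once.
git -C "$repo" config pack.threads 2
# Assumed value: bound the cache of computed deltas as well.
git -C "$repo" config pack.deltaCacheSize 256m

# Read the settings back to confirm they were stored.
window=$(git -C "$repo" config --get pack.windowMemory)
threads=$(git -C "$repo" config --get pack.threads)
echo "pack.windowMemory=$window pack.threads=$threads"
```

With these values the window memory would be bounded at roughly 2 GB in total (1 GB for each of 2 threads), which leaves headroom on a 12 GB machine.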

jthill

score 1 · Answer 5 · edited May 23 '17 at 12:22

If you have physical access or shell access to the server, you can transfer the repo manually via external hard drive or FTP. If the repo is bare, see "How Do I Convert a Bare Git Repository Into a Normal One in Place".
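If the copied repo is bare, one alternative to the in-place conversion is to clone from the copied directory on your own machine, which gives a normal working copy without involving the server. This is a different technique from the linked one, sketched here under assumptions: the small demo repo stands in for the transferred data, and the ssh:// URL is hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Build a small repo, then make a bare copy of it. The bare copy stands in for
# the directory you carried over on a drive or fetched over FTP.
git init -q "$work/src"
echo "hello" > "$work/src/README"
git -C "$work/src" add README
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
git clone -q --bare "$work/src" "$work/copied.git"

# Cloning from a local path reuses the existing object files and checks out
# a working tree next to them.
git clone -q "$work/copied.git" "$work/project"

# Hypothetical URL: replace with the address of the real server.
git -C "$work/project" remote set-url origin "ssh://git@example.com/project.git"

origin=$(git -C "$work/project" remote get-url origin)
echo "origin=$origin"
```

After this, the bare copy can be deleted once you have confirmed the working copy contains the history you expect.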

answered Sep 17 '13 at 13:29

Max
