
I have a relatively big Git repository on Team Foundation Server - almost 3 GB. The download speed is slow (100-120 kbit/s), and after roughly 1 GB the server resets the connection, so a plain `git clone` never finishes.

I tried to work around this by downloading the repository as a zip, then adding the remote and pulling, but Git treats all the files as new.

How can I clone such a repository?

P.S. I know the repository should not be that big and that these files should not be stored in Git at all, but I don't have the option to move them.

lapots
  • Is there any way you can get the .git folder from someone else's clone, or download it from the server directly? You can always try a shallow clone (--depth=1) and then fetch more history once you have that, but if the bulk of the data is in the shallow clone that might not help. – Rup Jan 14 '20 at 11:42
  • Ideally git would attempt to recover objects from a partially-downloaded pack, so you don't have to start from scratch next time. If you're really stuck you might be able to rig that up too, but I've never found time to investigate. – Rup Jan 14 '20 at 11:44
  • "But it considers all the files as new." - actually you might be able to fix that up using grafted commits: graft the commit you've generated from the download to the real head commit ID, and then any changes you commit on top of that will look correct from the server's point of view. – Rup Jan 14 '20 at 11:49
  • @Rup Oh, I did not know that Git partially clones the repository. But in my case, when it stops downloading, it removes the folder with the already-cloned data for some reason. – lapots Jan 14 '20 at 12:33
  • Yes, that's what I meant. It ought to do something smarter, or at least offer to. – Rup Jan 14 '20 at 12:34
  • @Rup How would I perform shallow cloning multiple times? I literally have a single 2 GB folder containing the biggest files. – lapots Jan 14 '20 at 12:38
  • There's [git clone --filter](https://unix.stackexchange.com/a/468182/2913) if that would help? You can ignore that folder if you don't need it. You may then be able to fetch the files afterwards, but I'm not sure exactly how, sorry. – Rup Jan 14 '20 at 13:02
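
A minimal sketch of the shallow-clone approach discussed in the comments above, assuming the server allows shallow fetches; `<tfs-repo-url>`, the target directory name, and the deepening step size are placeholders:

```
# Clone only the latest commit; this may be small enough to survive the resets.
git clone --depth=1 <tfs-repo-url> myrepo
cd myrepo

# Deepen the history in smaller increments instead of all at once...
git fetch --deepen=50

# ...or pull in the full history once the bulk of the data is already local.
git fetch --unshallow
```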

1 Answer


If you can update to Git 2.25, you can use the new sparse-checkout command to fetch only part of the repository. You may be able to fetch enough data with a single sparse checkout that you can then fetch the remaining data without a failure.

> A sparse checkout is nothing more than a list of file path patterns that Git should attempt to populate in your working copy when checking out the contents of your repository. Effectively, it works like a .gitignore, except it acts on the contents of your working copy, rather than on your index.
>
> [...]
>
> The idea behind the git sparse-checkout command is simple: allow users to play with partial clones and sparse-checkouts as easily as possible. It can do four things: set the list of paths to checkout, print the current list, and enable or disable sparse checkouts entirely.

More info in this blog post from GitHub.
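
For example, a blobless partial clone combined with a sparse checkout might look like the sketch below. The repository URL, branch name, and paths are placeholders, and `--filter=blob:none` only helps if the server supports partial clone:

```
# Clone commits and trees but no file contents, and don't check anything out yet.
git clone --filter=blob:none --no-checkout <tfs-repo-url> myrepo
cd myrepo

# Restrict the working copy to the paths you actually need.
git sparse-checkout init --cone
git sparse-checkout set src/ docs/

# Checking out now downloads only the blobs for those paths.
git checkout master
```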


If a sparse checkout isn't an option, you may be able to remove some of the larger files from the repository history entirely. If that can be done on the remote side, you could potentially trim the download down enough to fetch the whole repository in one clone.
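
As an illustration only, someone who already has a complete clone (or access to the server) could rewrite the history with the third-party git-filter-repo tool; the size threshold and folder name below are hypothetical:

```
# Drop every blob larger than 50 MB from all history (threshold is arbitrary).
git filter-repo --strip-blobs-bigger-than 50M

# Or remove a specific directory from the entire history ("big-folder/" is a placeholder).
git filter-repo --path big-folder/ --invert-paths

# git filter-repo removes the origin remote as a safety measure; re-add it,
# then force-push the rewritten history (this changes commit IDs for everyone).
git remote add origin <tfs-repo-url>
git push --force origin master
```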

coreyward