
I'm cloning the Linux kernel repository. The repository is so huge and my network is so slow that I can't clone it all in one go; that would mean keeping my computer on for a whole week.

If I stop the cloning mid-operation, progress would be lost. How can I partially clone a git repository?

jthill
dspjm
    What does "separately" mean? What, separate from what? – djechlin May 24 '13 at 13:46
  • I think this is a dupe - http://stackoverflow.com/questions/9268378/how-do-i-clone-a-large-git-repository-on-an-unreliable-connection – djechlin May 24 '13 at 13:47
  • git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git is the full tree with all previous versions. git://github.com/torvalds/linux.git seems to be only the latest version; far less data. Maybe a version of the bare repository is available via *bittorrent*, etc. – artless noise May 24 '13 at 14:38
  • @artlessnoise, `git://github.com/torvalds/linux.git` is a complete tree with all the history – Shahbaz May 24 '13 at 14:59
  • @Shahbaz When I cloned *linux-stable.git*, `git branch -a` gave me 42 branches. With *github*, I only got 3; sorry, I guess my comment was ambiguous? It could be that I called `clone` differently, but I thought the repositories were different. They both have *complete* history, depending on your definition. – artless noise May 24 '13 at 15:37
  • @artlessnoise, yes, that's actually because `linux-stable.git` maintained on kernel.org has a little more data than the one on github. While on github you have the full linear progress of the code, on kernel.org you also have the backports. Between each version of the linux kernel there are tens of thousands of commits; backports are usually just ~20-30 commits. So in the end the clone on kernel.org is perhaps just 0.1% larger (or something) than the one on github. – Shahbaz May 24 '13 at 15:58
  • @Shahbaz The network must be faster for me. It seemed to download far faster. – artless noise May 24 '13 at 16:04
  • @artlessnoise, only way to know for sure is to check the directory sizes after download :) – Shahbaz May 25 '13 at 13:29

4 Answers


Cloning cannot be resumed; if it's interrupted you'll need to start over. There are a couple of workarounds, though:

You can use a shallow clone, i.e. `git clone --depth=1`, and then deepen that repository using `git fetch --depth=N` with increasing N. Disclaimer: I have never tried this myself.
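A rough sketch of what that would look like (untested; the URL is the kernel.org mirror from the question, and the depth values are arbitrary):

git clone --depth=1 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
# deepen the history in steps; each fetch transfers a smaller pack
git fetch --depth=100
git fetch --depth=1000
# finally fetch whatever is still missing (needs a reasonably recent git;
# otherwise just keep increasing --depth)
git fetch --unshallow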

Another option could be git-bundle. The bundle itself is a single file, which you can download via HTTP or FTP with resume support, or via BitTorrent, rsync or any download manager. You can have somebody create a bundle for you, download it, and create a clone from it. Then fix up the remote configuration and fetch further updates from the original repo.
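For example, a sketch of that workflow (untested; the clone URL is the one from the question, and the bundle file name is just an example):

# on a machine with a good connection (or ask somebody to do this for you)
git clone --mirror git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable.git
git bundle create linux-stable.bundle --all

# on your machine, after downloading linux-stable.bundle with a resumable tool
git clone linux-stable.bundle linux-stable
cd linux-stable
git remote set-url origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch origin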

intellidiot

I'm not sure what you mean by separately, but `git clone` is going to clone the whole repo; there is no way to clone just some part of a repo.

But you can do a shallow clone with just a depth of one commit and/or only one branch:

git clone --depth=1 --single-branch --branch master git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

That will just grab the last commit of the master branch.

cexbrayat

`git clone --depth 100` will only grab the last 100 commits.

In general it looks like what you actually want is unsupported; the existing discussions of resumable cloning all kind of say "this doesn't exist yet."

But some large repos also host "dumb" HTTP ways to retrieve the repository (not a git-layer clone) that solve this problem; the Linux kernel may be one of them.
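For instance, if the host publishes a bundle or snapshot as a plain file (the URL below is purely made up; check what the site actually offers), any resumable downloader can pick up where it left off:

# -c resumes a partial download; hypothetical URL
wget -c https://git.example.org/pub/linux-stable.bundle
git clone linux-stable.bundle linux-stable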

djechlin
  • I have read that you can't fetch or push from or into a shallow clone. As I understand it, it's useless except to get a snapshot and a few revisions. I wonder if we can deepen the commits continuously until it's not shallow any more? Also, using wget is not a good idea, in my experience; somehow you end up having to download it all over again because you can never determine which files you have already downloaded. – dspjm May 24 '13 at 14:58
  • @dspjm, I had a discussion on this with some other guys some time ago. Apparently, the documentation is out of date and in fact you **can** fetch or push from or into a shallow clone. You can try it, but the result of the discussion in the past was that it works and it's just the documentation that is out of date. – Shahbaz May 24 '13 at 15:01

A clone is actually a series of smaller steps. In a nutshell, it first downloads a list of references, then it retrieves the pack file or loose object file for each of those references. There's currently no way to resume an interrupted clone automatically, because a clone usually sends one big pack file, but with some work and research you should be able to request smaller packs manually, the same as someone who does a series of pulls over time.

Look at the git book chapter on transfer protocols and the `git fetch-pack` command for more information. Also, git's source code is available on github, so you may be able to add a resume option yourself, or at least use it to get an idea of how clones are done internally.
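A very rough, untested sketch of what those lower-level steps look like (same kernel.org URL as in the question; note that `git fetch-pack` downloads objects into the object database but does not update your local branches):

# step 1 of a clone: list the refs the server advertises
git ls-remote git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

# request a limited pack for a single ref into a fresh repository
git init linux-stable
cd linux-stable
git fetch-pack --depth=100 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git refs/heads/master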

Karl Bielefeldt
  • I might try the git source a bit later; I think git is a really dauntingly complex system. However, according to your statement, should we say that we can `git clone --depth` and then use `git fetch-pack --all` to get the rest of the git repo? – dspjm May 24 '13 at 18:37