251

I am trying to git clone the LibreOffice codebase, but at the moment I have an internet connection of about 300 kbps and it's just anything but stable. I can get the connection back at any moment, but by then the git clone process has already stopped working, and there is no way to get it running again. Is there some way to have a more failure-resistant git clone download?

One option I considered myself is to download someone else's .git directory, but that is overly dependent on others and doesn't seem like the best possible solution to me.

erip
LaPingvino
    Do you need to clone all revisions, or just latest? Maybe `depth -1` is a solution? – takeshin Oct 17 '10 at 20:32
  • 1
    The bundle approach is already in place for repos like [`kernel/git/torvalds/linux.git`](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git). And a resumable git clone is being discussed (March 2016). See http://stackoverflow.com/a/29192890/6309. – VonC Mar 03 '16 at 14:37
  • I wonder. Won't doing `git init`, setting a remote and then doing fetch until it succeeds do the trick? I don't think fetch discards successfully downloaded objects if the connection fails. – Андрей Беньковский Nov 21 '16 at 14:23
  • @АндрейБеньковский has anyone tried this? – William Entriken Apr 16 '17 at 00:44
  • 1
    Also see [Does git-clone have resume capability?](https://superuser.com/questions/512190/does-git-clone-have-resume-capability) over on Super User and [Is there any way to continue Git clone from the point where it failed?](https://stackoverflow.com/questions/8587536/is-there-any-way-to-continue-git-clone-from-the-point-where-it-failed) here. – Anon Jun 16 '18 at 07:37
  • Microsoft contributes GVFS now, so that and maybe the buffer size option just added might be helping to actually solve this issue over time. – LaPingvino May 27 '20 at 14:35

17 Answers

169

Two solutions (or rather workarounds) that come to mind are:

  • Use a shallow clone, i.e. git clone --depth=1, then deepen it using git fetch --depth=N with increasing N. You can use git fetch --unshallow (since 1.8.0.3) to download all remaining revisions; see the sketch after this list.

  • Ask somebody to bundle the repository up to some tagged release (see the git-bundle(1) manpage). The bundle itself is an ordinary file which you can download any way you like: via HTTP/FTP with resume support, via BitTorrent, via rsync, etc. You can then create a clone from the bundle, fix the configuration, and do further fetches from the official LibreOffice repository.
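
A minimal sketch of the shallow-clone route, assuming a bash shell (the repository URL and step size are illustrative):

# Start with only the latest revision; this is a much smaller transfer than a full clone.
git clone --depth=1 git://anongit.freedesktop.org/libreoffice/core libreoffice
cd libreoffice
# Deepen the history in small steps, so each fetch is a separate, smaller transfer.
for n in $(seq 100 100 1000); do
    git fetch --depth=$n || break
done
# Finally, grab whatever history is still missing.
git fetch --unshallow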

Benjamin Loison
Jakub Narębski
  • 3
    The shallow clone trick doesn't work well in practice. Cloning a well-packed repo (git://libvirt.org/libvirt.git) changes a 68M transfer into a 61M + 35M transfer. A feature to prioritise the worktree, rather than all branches at depth 1, might fare better; session resumption would be better still. – Tobu Jan 19 '12 at 12:09
  • 2
    @Tobu: The shallow clone trick might work in a repository with a long history. There is ongoing work to make shallow clone get only a single branch by default. That might have helped. Or not. – Jakub Narębski Jan 19 '12 at 15:53
  • 9
    This works **really well** now, with git 1.7.10. The initial depth=1 clone of the Git repository is only 4.72Mb, while the whole repository is 55Mb. Further fetches can be as small as you want (depth=100 gave me a ~20Mb fetch). The total compressed download was 31Mb, over one clone and 3 fetches. – naught101 Mar 26 '13 at 09:01
  • 2
    @naught101 It downloads objects for one revision, and if source code itself is large (not history), then it will be an issue again... – kan Mar 28 '13 at 13:06
  • Deepen with increasing N: https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search – Kaz Dec 16 '17 at 03:27
  • 7
    `for m in $(seq 1 50);do git fetch --depth=$[m*100];done` worked for me, thanks! :) – Trass3r Jan 15 '19 at 11:41
  • 1
    If using windows command line, the above loop can be `FOR /L %%m IN (Lowerlimit, Increment, Upperlimit) Do git fetch --depth=%%m` – Naman Bakhru May 29 '21 at 12:09
  • I encountered a problem after using this: after `--unshallow`, my remote tracking branches still only included the main branch. See: https://stackoverflow.com/a/46282491 – NeatNit May 31 '23 at 21:01
  • A powershell equivalent oneliner: `1..50 | ForEach-Object { git fetch --depth=$($_*100) }` – Elia Grady Jul 05 '23 at 12:04
79

I don't think this is ready yet. There's an old GSoC page which planned to implement your desired feature. My best bet is, like you suggested, to download it as a directory. I'm assuming you are able to resume downloads over other protocols.

Restartable Clone

When cloning a large repository (such as KDE, Open Office, Linux kernel) there is currently no way to restart an interrupted clone. It may take considerable time for a user on the end of a small pipe to download the data, and if the clone is interrupted in the middle the user currently needs to start over from the beginning and try again. For some users this may make it impossible to clone a large repository.

Goal: Allow git-clone to automatically resume a previously failed download over the native git:// protocol.
Language: C
Mentor: Shawn Pearce
Suggested by: Shawn Pearce on gmane


Update

Along with the shallow cloning (git clone --depth=1) suggestion in one of the other answers, it may be helpful if someone can make a bare repository for you, if you can communicate with the provider. You can easily convert the bare repository to a full repository, as sketched below. Also read the comments in that answer, as a shallow clone may not always help.
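
A hedged sketch of that conversion, once you have obtained the bare repository somehow (the path and URL are illustrative):

# A bare repository is essentially a .git directory; a local clone of it gives you a working tree.
git clone /path/to/repo.git work
cd work
# Point origin back at the official repository so future fetches go upstream.
git remote set-url origin git://anongit.freedesktop.org/libreoffice/core
git fetch origin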

Community
Jungle Hunter
  • Thanks for the information, so my problem is known and a solution is worked on... What would you recommend as a work-around? – LaPingvino Oct 17 '10 at 19:31
  • I would say if you can clone it some place else, just copy it from there. Or if you can download it as a directory (the .git and other stuff that's there) then you do that. Almost all download managers will let you resume your regular downloads (the directory method). – Jungle Hunter Oct 17 '10 at 19:33
  • I know that one. The worst thing however is that it's one anonymous download over the git-protocol first, then there's a script to do 19 more git clones – LaPingvino Oct 17 '10 at 19:43
  • Oh! Get someone to clone it for you on a flash drive or something then. :P – Jungle Hunter Oct 17 '10 at 19:44
  • The problem is that all connections are crap here... I think I'll have to put it all on a server and then download it by scp... I just only have Shared Hosting ssh access, so I don't know about git on those machines... :( – LaPingvino Oct 17 '10 at 19:48
  • Maybe off-topic, but this might work as a possible implementation for a more failsafe git clone: * Have an option to make this possible (like --flaky-connection) * While using this option, implement clone as just a clone of the first revision, then update in blocks with git pull. – LaPingvino Oct 18 '10 at 14:38
  • Would work if the first revision is small. Could happen that the initial revision is big enough to be painful. But, hey, it's all open-source. ;) – Jungle Hunter Oct 18 '10 at 17:35
  • I am also stuck while cloning the VLC code; though it's not that big, the connection keeps getting interrupted over HTTP, with no way to resume from the repo blocks already downloaded :( – cbinder Nov 11 '13 at 12:06
  • 16
    Well, just yesterday I lost 600 rupees ($10) because of this problem. Internet bandwidth is quite a precious thing in my part of the world. – Amit Singh Tomar Dec 24 '13 at 14:06
  • 2
    Lots of people asking for updates and nobody sharing their contribution to the solution. – William Entriken Apr 16 '17 at 00:45
  • 2
    Mar '18 - still looking for it... on this earth!! – earthling Mar 23 '18 at 05:11
  • 4
    11 years later, Google's attack on the underlying socioeconomic issue of unreliable bandwidth with Google Fiber and Google Fi had mixed results. Its fiber micro-trenches in the city of Louisville were cut **too shallowly** into the asphalt, and the cables were found popping out from the road surface soon after work. Meanwhile, `--depth 1` and `--unshallow` appears to have withstood the years of usage. – rwong Feb 08 '19 at 22:39
18

This method uses a 3rd-party server.

First, do git clone --bare (on the server), then rsync -v -P -e ssh user@host:repo.git . locally; rsync's -P option lets an interrupted transfer be resumed. You can use msys under Windows. A sketch of the whole sequence follows.
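
A hedged sketch of that workflow (host, paths, and URL are illustrative; note that rsync needs -a or -r to copy the whole directory):

# On a server with a good connection, make a bare clone.
ssh user@host 'git clone --bare https://github.com/LibreOffice/core.git repo.git'
# Copy it down; -P keeps partial files and shows progress, so re-running resumes the transfer.
rsync -a -v -P -e ssh user@host:repo.git .
# Finally, make a normal working clone from the local bare copy.
git clone repo.git libreoffice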

Rafal Rusin
  • I tried the --bare option; it created the expected contents of the .git internals inside repo.git, and I had to do git clone file:///path/to/repo.git/ to get the actual repository – PiyusG May 25 '16 at 09:00
  • 2
    Linus [doesn't own GitHub](https://github.com/torvalds/linux/pull/17#issuecomment-5654674)…by "3rd-party server", did you actually mean “Git server which does not jail its users so heavily as to prohibit their use of `rsync(1)` _by the way GitHub I'm looking at you_”? Or, do you mean to first `git clone` _on_ a 3rd-party server and then rsync it to the local machine? – JamesTheAwesomeDude Jul 17 '18 at 18:11
16

"Never underestimate the bandwidth of a carrier pigeon and a bundle of SD cards" would be the modern form of this answer. Tar it up, plain old cp -a it, whatever, and mail the damn thing. Find someone willing to take two minutes of their time to drop a thumb drive into an SASE. Find a contact, there, they might even do it for you.

jthill
12

I would like to put my 5 cents here. This is actually what helped me solve this issue:

  • Turn off compression
  • Increase http.postBuffer
  • Do a shallow clone
  • Navigate to the cloned directory and fetch the rest of the history
  • Pull the rest

git config --global core.compression 0
git config --global http.postBuffer 524288000
git clone <your_git_http_url_here> --depth 1
cd <cloned_directory>
git fetch --unshallow
git pull --all

This helped me clone a ~3GB repo over an 8Mbps ADSL connection; of course I had to perform the fetch and pull a few times, but still...

RMPR
mati kepa
11

You can "download someone else's .git directory", but with that someone else being the official repository itself. The LibreOffice repositories are available via http, for instance their build.git is at http://anongit.freedesktop.org/git/libreoffice/build.git/ (see http://cgit.freedesktop.org/libreoffice/ for the complete list, the http URL is at the bottom of each repository's page).

What you see at these http URLs is nothing more than a .git directory (actually a "bare" repository, which has only what you would find in the .git directory). It is the same directory the server for the git:// protocol (git daemon) would read. If you make a copy of these directories with a web downloader (for instance wget -m -np), you can clone from your copy and it will work as well as if you had cloned directly from the http repository.

So, what you can do is: for each repository, get a copy of it with your favorite web downloader (which will deal with all the issues of resuming broken downloads), and clone from that copy. When you want to update, use your favorite web downloader again to update your copy, and pull from that copy. Now your clones and updates are as resistant to bad connections as your favorite web downloader is. A sketch of this follows.
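
A hedged sketch using the build.git URL from above (it assumes the server exposes the bare repository as plain, browsable HTTP; as the comments below note, you may need to restrict what wget follows):

# Mirror the bare repository; re-running wget -m updates/resumes the local copy.
wget -m -np http://anongit.freedesktop.org/git/libreoffice/build.git/
# wget saves it under a directory named after the host; clone from that local copy.
git clone anongit.freedesktop.org/git/libreoffice/build.git/ build
cd build
# Point origin at the official repository so later pulls come from upstream.
git remote set-url origin http://anongit.freedesktop.org/git/libreoffice/build.git/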

CesarB
  • They made the conversion to just one repository now; trying your tip, wget decides to download the whole site at once, however... (trying again now, will probably update here later...) – LaPingvino Aug 08 '11 at 13:36
  • Your command seems to get all links on the site, which is not what is meant to happen. I resorted to writing a script that seems to work here: https://gist.github.com/1307703 Anyway, thanks a lot for the initial idea! – LaPingvino Oct 23 '11 at 18:48
  • Interesting idea, I'm trying to get the ruby/ruby repo from github and I'm getting blocked by the robots.txt... any suggestions? – hanetzer Dec 02 '14 at 03:47
10

Let's break git clone down into its component parts, and use git checkout to prevent re-downloading files.

When git clone runs, the first few things it does are equivalent to

git init
git remote add origin <repo_url>
git fetch origin <branch>

If you run the above steps manually, and assuming that they completed correctly, you can now run the following as many times as necessary:

git checkout --force <branch>

Note that it will check out all files each time it's run, but you will not have to re-download them, which may save you a ton of time.
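
A hedged end-to-end sketch with a simple retry loop around the fetch (the URL and branch are illustrative; as the comments note, an interrupted fetch itself still restarts from scratch, but a fetch that has already completed is not lost):

git init libreoffice && cd libreoffice
git remote add origin git://anongit.freedesktop.org/libreoffice/core
# Keep retrying the fetch until it gets through in one piece.
until git fetch origin master; do
    echo "fetch interrupted, retrying in 10s..."
    sleep 10
done
# Safe to repeat; checks out the files without re-downloading anything.
git checkout --force -B master origin/master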

tripleee
cowlinator
  • 2
    it doesn't work the way you describe; it will not allow you to do a git reset after a broken fetch – MaikoID Nov 08 '17 at 13:07
  • As I said, once you assume that a fetch has completed successfully, you can run git reset. If your fetch is broken, then reset won't work. You need to either A) repeatedly try to fetch again until it works, or B) abandon this and try something else. – cowlinator Nov 08 '17 at 23:01
  • I did something else and it miraculously worked: I did a git pull instead of git fetch =) – MaikoID Nov 11 '17 at 01:35
  • 1
    @MaikoID I believe a git pull is just calling git fetch internally and then merges, so the command should not have made the difference – lucidbrot Sep 23 '18 at 10:16
  • Fetch still restarts from the beginning if it fails; it just creates a new tmp file in .git/objects/pack. I saw you said fetch should complete correctly, but it doesn't differ from the clone command in the end, at least for downloading huge projects like Unreal Engine. The only good thing is that I briefly felt hope xD – Aquarius Power Aug 19 '21 at 00:53
10

Increasing the buffer sizes will help you with this problem. Just follow the steps.

  1. Open a terminal or Git Bash and cd to the location where you want to clone the repo.

  2. Set compression to 0

    git config --global core.compression 0
    
  3. Set postBuffer size

    git config --global http.postBuffer 1048576000
    
  4. Set maxRequestBuffer size

    git config --global http.maxRequestBuffer 100M
    
  5. Now start clone

    git clone <repo url>
    
  6. Wait till clone completes.

tripleee
6
git clone --depth <Number> <repository> --branch <branch name> --single-branch

This command helped me (thanks to Nicola Paolucci).

For example:

git clone --depth 1 https://github.com/gokhanmoral/siyahkernel3 --branch ics  --single-branch
Ahed Eid
5

If you have access to a 3rd-party server, you could clone there and then copy.

Amber
4

Use a git proxy, such as ngitcached or git-proxy.

Amr Mostafa
  • 1
    Even better: https://github.com/git-cloner/gitcache – Irfan Latif Aug 07 '20 at 11:03
  • This seems to still require us to have access to a server with an excellent connection to complete the initial cloning, right? Somewhere such apps would be installed to do the initial download? I don't have one, and Unreal Engine is so huge... – Aquarius Power Aug 19 '21 at 01:11
3

This problem bit me too. In my case there is a work-around. It may or may not apply in your case.

I'm using a mobile phone sometimes to initiate git operations on a remote system. If my wi-fi breaks, of course, the session ends and git drops the whole clone operation without recovering. But since the internet connection from my remote system to the git master is solid, there's no need for the clone to stop. All I need is the common sense to detach the clone from the terminal session. This can be done by using screen/tmux or nohup/daemon. So it's a liveware malfunction in my case.
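
A hedged sketch of keeping the clone alive on the remote system (host and URL are illustrative):

ssh user@remote-host       # log in to the remote system with the solid upstream connection
tmux new -s clone          # or: screen -S clone
git clone git://anongit.freedesktop.org/libreoffice/core
# If your own link drops, reconnect and reattach with: tmux attach -t clone
# Without tmux/screen, nohup works too:
#   nohup git clone git://anongit.freedesktop.org/libreoffice/core > clone.log 2>&1 &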

3

Same problem here - I have a really flaky internet connection with often not more than 10-15 kb/sec :-P

For me the wget way worked very well.

Go to the repository site where the green "Clone or download" button is, click it, and copy the link of the ZIP download option.

Then insert the link to the wget command:

wget -c -m -np https://github.com/your/repository/archive/master.zip

Works like a charm...

phuclv
X-File
  • Maybe this worked before, but right now when I try your solution and the connection breaks (or I press Ctrl-C), then after rerunning, downloading is not continued but starts from the beginning, at least on the LLVM repository. – Arty Aug 15 '21 at 09:43
2

Use Ctrl+Z to stop the cloning. Don't close the terminal; put the system/laptop into hibernation and then continue later with the fg command. I was facing this same problem today while trying to clone a repo from GitHub. This came as a time saver for me.

Jicksy John
0

If we assume servers have good bandwidth (and you have a server), another answer is to:

  1. create your own server using server-side Git wrappers
  2. clone the repository on your server
  3. zip it using a server-side zip archiver
  4. download it from there, with server-side resume support

But this only works with very basic web-development experience ;) and you also need git.exe on your server. A rough sketch follows.
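
A hedged sketch of that idea, assuming plain shell access on your own server (names and URLs are illustrative):

# On the server, where the bandwidth is good:
git clone --bare https://github.com/LibreOffice/core.git core.git
zip -r core.zip core.git
# Move core.zip into your web root, then over the flaky connection download it with resume support:
wget -c https://your-server.example/core.zip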

Community
Top-Master
0

The best workaround that worked for me:

I faced the same issue with a bad internet connection. So I came up with the following solution:

I created a small php file on my server to download the package as a zip file:

<?php
$url = "https://codeload.github.com/CocoaPods/Specs/zip/master";
file_put_contents("coco.zip", fopen($url, 'r'));
?>  

<a href="coco.zip">coco.zip</a>

Then download the zip file using any download manager that supports resume.

tripleee
Zorox
  • 3
    You don't need a server or PHP for this. `curl -ococo.zip https://codeload.github.com/CocoaPods/Specs/zip/master` – tripleee Dec 09 '20 at 06:16
-1

You can try to use mercurial with the hg-git extension.

If that doesn't work you can use git fetch <commit-id> to fetch only parts of a remote git repository (you can fetch into an empty git repository; there is no need to create it with clone). But you might need to correct the branch configuration (i.e. create local and remote-tracking branches) when you use this approach. A sketch of that correction follows.
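
A hedged sketch of fixing up the branch configuration by hand after fetching into an empty repository (the URL and branch are illustrative):

git init libreoffice && cd libreoffice
git remote add origin git://anongit.freedesktop.org/libreoffice/core
git fetch origin                                 # repeat until it succeeds
# Create the local branch and hook it up to its remote-tracking branch.
git checkout -b master --track origin/master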

Rudi