I need to VPN in to our Git server to pull changes, and the VPN connection is quite slow (~200 kbps). I'm trying to pull a few months' worth of changes, but it's 3 GB of files and the VPN connection keeps disconnecting before the fetch finishes.

I'm wondering if there's a way to only pull half the changes at a time so that I could split it into 2 batches?

mkrieger1
Adam L.

1 Answer

The key to splitting up a big fetch is that git fetch transfers commits, and a single fetch operation is all-or-nothing: it either succeeds completely or fails entirely if the network connection flakes out in the middle. So if your git fetch wants to bring in, say, 16384 commits carrying 3 GB of data, and that won't make it across in one go, you can break it up:

  • First, bring in 8192 commits that bring in 1.5 GB of data;
  • then bring in the remaining 8192 commits that bring in the other 1.5 GB of data.

If that's not small enough, continue breaking up the commits into smaller and smaller sets of commits.

There's one major flaw with this plan, though. Suppose 16383 of the commits together bring in only, say, 500 MiB of files, while the remaining one (the 16384th) brings in 2.5 GiB of files. You can't break that one up.

Also, you might not be able to pick commits this way anyway, as many servers won't let you run git fetch by raw hash ID. Two! There are two major flaws with this plan... insert Monty Python Spanish Inquisition sketch here.

Seriously, if you have the right kind of access, you can have someone place branch names or tag names against various commits, and break up the large batch of commits this way. That gets you down to the one possible major flaw.
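
The tag-based split can be sketched against a throwaway local repository. This is only an illustration under stated assumptions: the temporary directory layout, the demo commits, the branch name main, and the tag name batch-1 are all hypothetical, and on a real setup the "server" steps would be run by someone with access to the remote repository, using a hash ID you noted from your own remote-tracking name.

```shell
#!/bin/sh
# Demo only: split one large fetch into two by tagging a midpoint commit.
set -eu
work=$(mktemp -d)
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Stand-in for the remote repository (hypothetical; normally on the server).
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/main
for i in 1 2; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

# The client is fully up to date at this point.
git clone -q "file://$work/server" "$work/client"

# The server then gains 8 commits that the client lacks.
for i in 3 4 5 6 7 8 9 10; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

# Server side: tag the commit halfway through the missing range.
# $old is the hash the client's origin/main currently points at.
old=$(git -C "$work/client" rev-parse origin/main)
total=$(git rev-list --count "$old..main")
mid=$(git rev-list --reverse "$old..main" | sed -n "$((total / 2))p")
git tag batch-1 "$mid"

# Client side: fetch the first half via the tag, then everything else.
cd "$work/client"
git fetch -q origin tag batch-1
first_half=$(git rev-list --count origin/main..batch-1)
git fetch -q origin
second_half=$(git rev-list --count batch-1..origin/main)
echo "first batch: $first_half commits, second batch: $second_half commits"
```

Splitting by commit count only approximates splitting by size, so the one-huge-commit flaw described above still applies.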

Edit: As jthill notes in a comment, you can also work this from the opposite direction: run git fetch with a --depth option (--depth=1 tries to get just the last commit at each branch name, --depth=2 tries to get the last two, etc). Then you can run additional fetch operations with --deepen, and once you have enough, git fetch --unshallow to get everything else. This is probably the easiest to work from your end alone.
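
A minimal sketch of that sequence, again using a throwaway local repository so it can be run safely. The directory layout, the demo commits, and the branch name main are assumptions for illustration; against a real remote only the three git fetch lines matter, and the deepen amount would be whatever your connection can survive.

```shell
#!/bin/sh
# Demo only: fetch new history in slices with --depth, --deepen, --unshallow.
set -eu
work=$(mktemp -d)
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Stand-in for the remote repository (hypothetical).
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

# The client has the first 3 commits; the server then adds 6 more.
git clone -q "file://$work/server" "$work/client"
for i in 4 5 6 7 8 9; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

cd "$work/client"

# Step 1: fetch only the tip commit of each branch.
git fetch -q --depth=1 origin
n_tip=$(git rev-list --count origin/main)

# Step 2: extend the shallow boundary by two more commits.
git fetch -q --deepen=2 origin
n_deeper=$(git rev-list --count origin/main)

# Step 3: fetch whatever history is still missing.
git fetch -q --unshallow origin
n_full=$(git rev-list --count origin/main)

echo "commits reachable after each step: $n_tip $n_deeper $n_full"
```

Each step is its own fetch, so if the connection drops you only have to repeat the step that failed.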

Alternatively, have someone run git bundle and make a bundle file. Then, use some restartable transfer protocol to send the file over. Once you have the whole file, run git fetch against the bundle file. A bundle file simply splits git fetch into its various separate parts:

  • aggregating the objects that are required for transfer (git bundle does this part);
  • transferring the bundle file (you do this part yourself); and
  • extracting the bundle file into commits (git fetch knows how to do this).
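
The three parts above might look like the following sketch, run against a throwaway local repository. The paths, the file name update.bundle, and the branch name main are hypothetical, and a plain cp stands in for the restartable transfer (rsync --partial would be one real option).

```shell
#!/bin/sh
# Demo only: move new commits via a bundle file instead of a live fetch.
set -eu
work=$(mktemp -d)
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Stand-in for the remote repository (hypothetical).
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/main
for i in 1 2; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

# The client has 2 commits; the server then adds 4 more.
git clone -q "file://$work/server" "$work/client"
for i in 3 4 5 6; do echo "$i" > f.txt; git add f.txt; commit "c$i"; done

# Part 1 (server): bundle only what the client lacks.
# $old is the hash the client's origin/main currently points at.
old=$(git -C "$work/client" rev-parse origin/main)
git bundle create "$work/update.bundle" "$old..main"

# Part 2 (you): transfer the file; cp is a placeholder for a resumable copy.
cp "$work/update.bundle" "$work/received.bundle"

# Part 3 (client): check the bundle, then fetch from it like any remote.
cd "$work/client"
git bundle verify "$work/received.bundle"
git fetch -q "$work/received.bundle" main:refs/remotes/origin/main
n_after=$(git rev-list --count origin/main)
echo "origin/main now has $n_after commits"
```

Writing straight into refs/remotes/origin/main is one choice among several; fetching into a separate ref and inspecting it first would work just as well.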

There are a bunch of questions and answers on Stack Overflow about git bundle; see, e.g., How to use git-bundle for keeping development in sync?

torek
  • Oh that git bundle sounds perfect for the future (this whole thing is a pain every time we add a new team member and they have to pull the file from scratch). But for that first solution, I'd be willing to take my chances since I do get very close (like 80-90%) before timing out... so how would I split it like you mentioned? – Adam L. Dec 18 '20 at 19:19
  • @AdamL.: well, the tricky part is figuring out what commits `git fetch` is fetching, and then placing a branch or tag name in the *other* Git such that you can `git fetch origin ` to get the "first half" or "first 10%" or whatever. There's no perfect way to do that; your best bet is to note the hash IDs of the remote-tracking names in *your* Git, and log on to the other machine (or otherwise gain access to it) and run `git log` on the Git repository there and see which commits they have, that you don't. – torek Dec 18 '20 at 20:32
  • @torek can I just do a fetch by the hash ID? – Adam L. Dec 18 '20 at 20:35
  • You can try `--depth=1` fetches and repeated `--deepen=` fetches... – jthill Dec 18 '20 at 23:17
  • @AdamL : try it and you'll see ! :) It depends on the server. – LeGEC Dec 18 '20 at 23:19
  • @jthill: oh, right, I totally forgot about working it from the other end! I should edit that into my answer. (and @ AdamL: you can *try*, as LeGEC said, it's up to the server whether it allows this) – torek Dec 19 '20 at 00:22