
I have an approximately 8gb file which I'm attempting to download located here: www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz

However, the server closes my connection every few seconds, which only lets me download 50-90 MB of the file at my connection speed. I've swapped IP addresses too, but I get the same behavior. Does this happen for anyone else?

Here is the output I get from wget:

[screenshot of wget terminal output]

I'm wondering if I can reset my connection the way wget did automatically the first few times. Right now it just freezes up after a little while.

Alternatively, is there a way I can collect different parts of the file with wget, with Python's requests package, or with some other language?


UPDATE:

I tried this on my phone and it seems to work, albeit very slowly. Any ideas why this might be happening and how to solve it?

UPDATE:

The phone connection also resets eventually, and since the file is so large I haven't been able to get close to completion.

Joe B
  • Please consider using text instead of the screenshot next time to make the question more accessible to screen readers. You can easily copy the terminal output into the question. – snwflk May 20 '19 at 19:30
  • Furthermore, your post is somewhat fuzzy in that it has a number of questions (some added later). If somehow possible, limit your questions to a central aspect. You can of course post several questions. The idea behind this is to create concise artifacts that will be useful for others. – snwflk May 20 '19 at 19:53
  • @snwflk Thanks, I'll keep these in mind for the future. – Joe B May 20 '19 at 19:56

1 Answer


Preliminaries

For any of this to work, the server needs to support range requests, which it acknowledges by responding with 206 Partial Content. Judging from your terminal output, the server in question seems to support them.
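
If you want to verify this yourself, here is a minimal sketch using Python's requests (the URL is the one from your question; probing with a one-byte range is just a cheap way to see how the server answers):

import requests

# URL from the question; asking for a single byte is a cheap probe
URL = "http://www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz"

r = requests.get(URL, headers={"Range": "bytes=0-0"}, timeout=30)

print(r.status_code)                   # 206 if range requests are honoured, 200 otherwise
print(r.headers.get("Accept-Ranges"))  # typically "bytes" when supported
print(r.headers.get("Content-Range"))  # e.g. "bytes 0-0/<total size>"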

Your questions

However, the server closes my connection every few seconds allowing me to only download 50-90MB of the file at my connection speeds. I've swapped ip addresses too, but get the same behavior. Does this also happen for everyone else?

No, the download works without major problems for me. I tested with

curl www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz > /dev/null

I'm wondering if I can reset my connection like wget did automatically the first few times?

wget seems to have retried the download automatically. From the terminal output you've included, it appears that wget would eventually "get there". You can make wget resume an incomplete download with wget --continue [URL].

Alternatively, is there a way I can collect different parts of the file with wget or with python's requests package or some other language?

Starting with wget 1.16, you can use wget --start-pos=500 [URL] to start the download from a given byte offset.

You could also use curl -r 500-1000 [URL] to download bytes in the given range.

For Python's requests module, as per this SO answer:

import requests

# Request only bytes 0-100 of the file via the Range header
headers = {"Range": "bytes=0-100"}
r = requests.get("https://example.com/link", headers=headers)
print(r.status_code)  # 206 Partial Content if the server honoured the range
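
Building on that, here is a rough sketch of my own (treat it as an illustration, not a drop-in script) of a download loop that keeps resuming from wherever the previous attempt broke off. It assumes the server reports Content-Length and honours the Range header, and FILENAME is just a placeholder for the local file name:

import os
import requests

URL = "http://www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz"  # URL from the question
FILENAME = "030314corpus.splittoklc.tgz"  # placeholder local file name

# Total size reported by the server, so we know when the file is complete
total = int(requests.head(URL, timeout=30).headers["Content-Length"])

while True:
    downloaded = os.path.getsize(FILENAME) if os.path.exists(FILENAME) else 0
    if downloaded >= total:
        break
    try:
        # Ask only for the bytes we do not have yet and append them to the file
        headers = {"Range": "bytes=%d-" % downloaded}
        with requests.get(URL, headers=headers, stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(FILENAME, "ab") as f:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f.write(chunk)
    except requests.exceptions.RequestException:
        # Connection dropped; loop around and resume from the current file size
        continue

This is essentially what wget --continue does for you, so if that works reliably it is the simpler option.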

Keywords for further information

Useful keywords for a further search are "range request", "partial download", and "206".

snwflk