It's a speed issue. The server at corpus-db.org will DISCONNECT YOU if you take longer than 35 seconds to download something, regardless of how much you've already downloaded.
To make matters worse, the server does not support HTTP range requests (a Range request header, answered with 206 Partial Content and a Content-Range response header), so you can't download the file in chunks and simply resume the download where you left off.
To make matters even worse, not only are range requests not supported, the Range header is SILENTLY IGNORED, which means resuming seems to work until you actually inspect what you've downloaded.
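If you want to check for yourself whether a server honors range requests instead of silently ignoring them, something like the following should do. It's only a sketch: the function names `range_was_honored` and `probe` are made up for this example, and it assumes the resource is larger than the byte range being probed.

```python
import urllib.request


def range_was_honored(status, headers, body_len, first, last):
    """Return True only for a genuine partial response covering bytes first..last."""
    if status != 206:
        # Anything other than 206 Partial Content means the Range header
        # was not applied (a plain 200 carries the resource from byte 0).
        return False
    # A partial response describes its slice, e.g. "bytes 100-199/5000".
    content_range = headers.get("Content-Range", "")
    return (content_range.startswith(f"bytes {first}-{last}/")
            and body_len == last - first + 1)


def probe(url, first=100, last=199):
    """Request a small slice from the middle of the resource and inspect the reply."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})
    with urllib.request.urlopen(request) as response:
        # Read at most one byte more than requested, so an ignored Range
        # header does not make us pull the whole file.
        body = response.read(last - first + 2)
        return range_was_honored(response.status, response.headers,
                                 len(body), first, last)
```

Calling `probe(...)` with the URL you are trying to download returns False whenever the server answers without a proper partial response, which is exactly the "seems to work" case described above.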
If you need to download that page over a slower connection, I recommend renting a cheap VPS, setting it up as a mirror of whatever you need to download, and downloading from your mirror instead. Your mirror does not need to have the 35-second limit.
For example, this VPS [1] costs $1.25/month, has a 1 Gbps connection, and would be able to download that page. Rent one of those, install nginx on it, wget the file into nginx's www folder, and download it from your mirror. Unlike corpus-db.org, nginx does not cut a download off after a fixed total time by default: its send_timeout (60 seconds by default) only applies between two successive writes to the client, so a slow but steady transfer keeps going. If you do run into a timeout, you can change it to whatever you want.
Or you could even get fancy and set up a caching proxy compatible with curl's --proxy parameter, so your command could become
curl --proxy http://yourserver http://corpus-db.org/api/author/Dickens,%20Charles/fulltext
If someone is interested in an example implementation of this, let me know.
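For anyone curious what such a proxy could look like, here is a minimal sketch using only Python's standard library. Everything in it is an assumption made for illustration (the names `CACHE_DIR`, `fetch_to_cache`, `CachingProxy` and `run_proxy`, and the port you pick); it handles plain-http GET requests only, never expires its cache, and does no locking between simultaneous requests for the same URL.

```python
import hashlib
import http.client
import shutil
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

CACHE_DIR = Path("proxy_cache")  # hypothetical cache directory

# Fetch upstream directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def cache_path(url):
    """Map a URL to a stable file name inside CACHE_DIR."""
    return CACHE_DIR / hashlib.sha256(url.encode("utf-8")).hexdigest()


def fetch_to_cache(url):
    """Download url once over the server's fast link; reuse the file afterwards."""
    target = cache_path(url)
    if target.exists():
        return target
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    partial = target.with_suffix(".part")
    with _opener.open(url) as upstream, open(partial, "wb") as out:
        expected = upstream.headers.get("Content-Length")
        shutil.copyfileobj(upstream, out)
    # Never cache a download that the upstream server cut short.
    if expected is not None and partial.stat().st_size != int(expected):
        partial.unlink()
        raise OSError("upstream closed the connection early")
    partial.replace(target)  # publish only complete downloads
    return target


class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client using an HTTP proxy sends the full URL in the request line.
        url = self.path
        if not url.startswith("http://"):
            self.send_error(400, "expected an absolute http:// URL")
            return
        try:
            cached = fetch_to_cache(url)
        except (OSError, http.client.HTTPException) as exc:
            self.send_error(502, f"upstream fetch failed: {exc}")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(cached.stat().st_size))
        self.end_headers()
        with open(cached, "rb") as body:
            shutil.copyfileobj(body, self.wfile)


def run_proxy(port):
    """Serve until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), CachingProxy).serve_forever()
```

With `run_proxy(8080)` running on the VPS, you would add the port to the curl command, i.e. `--proxy http://yourserver:8080`. The slow client then downloads from the cached copy at its own pace. Note that this is an open proxy with no authentication, so firewall it to your own IP address.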
You can't download that page over a 4 Mbit/s connection because the server will kick you before the download is complete (after 35 seconds), but over a 1000 Mbit/s connection you'll be able to download the entire file before the timeout kicks in.
(My home internet connection is 4 Mbit/s, and I can't download it from home, but I tried downloading it from a server with a 1000 Mbit/s connection, and that works fine.)
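To put rough numbers on that, here is a back-of-the-envelope calculation of how much data can get through before the 35-second cutoff. It's a best-case sketch: the helper name `max_download_bytes` is made up, and it assumes the link runs at its full rated speed with no protocol overhead.

```python
def max_download_bytes(link_mbit_per_s, time_limit_s):
    """Best-case number of bytes transferable before the server disconnects."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second * time_limit_s


home = max_download_bytes(4, 35)       # 17,500,000 bytes, about 17.5 MB
server = max_download_bytes(1000, 35)  # 4,375,000,000 bytes, about 4.4 GB
```

So on a 4 Mbit/s line, anything bigger than roughly 17.5 MB cannot finish before the server hangs up, no matter how often you retry.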
[1] PS: I'm not associated with ramnode in any way, except that I'm a happy (former) customer of theirs, and I recommend them to anyone looking for cheap, reliable VPSs.