I'm trying to download some fairly large files in Python 2.7 (between 300 and 700 MB each), and the connection keeps getting reset partway through retrieving them. Specifically, I was using urllib.urlretrieve(url, file_name), and every so often I get socket.error: [Errno 104] Connection reset by peer.
Now, I'm very unfamiliar with how sockets and web protocols work, so I tried the following, not really knowing if it would help:
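import urllib

response = urllib.urlopen(url)
CHUNK_SIZE = 16 * 1024  # read 16 KiB at a time
with open(file_name, 'wb') as f:
    # iter() with a sentinel keeps calling read() until it returns '' (EOF)
    for chunk in iter(lambda: response.read(CHUNK_SIZE), ''):
        f.write(chunk)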
Edit: Guess I should credit the author of this code: https://stackoverflow.com/a/1517728/3002473
My reasoning was that reading only a little bit at a time should make the download "less susceptible" to this Errno 104, but again, I know basically nothing about how all of this works, so I don't know if it actually makes a difference.
After testing a bit it seems like it works slightly better? But that might just be coincidence. Generally, I'm able to download one, maybe two files before this error gets thrown.
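For what it's worth, here is a rough sketch of the kind of workaround I could imagine: catch the reset and reopen the stream from the last byte written. It does not explain why the reset happens. The names download_with_retries, open_stream, max_attempts and delay are made up for illustration, and open_stream(offset) is a hypothetical callable that must return a file-like object positioned at that byte of the remote file. Resuming this way assumes the server honors range requests, which I have not verified.

```python
import errno
import socket
import time

CHUNK_SIZE = 16 * 1024  # read 16 KiB at a time


def download_with_retries(open_stream, file_name, max_attempts=5, delay=1.0):
    """Copy a stream to file_name, retrying when the peer resets the connection.

    open_stream is a hypothetical callable supplied by the caller:
    open_stream(offset) must return a file-like object that starts at
    byte `offset` of the remote file. Returns the number of bytes written.
    """
    offset = 0    # bytes written to disk so far
    attempts = 0  # number of resets seen so far
    with open(file_name, 'wb') as f:
        while True:
            try:
                response = open_stream(offset)
                for chunk in iter(lambda: response.read(CHUNK_SIZE), b''):
                    f.write(chunk)
                    offset += len(chunk)
                return offset
            except socket.error as e:
                # Errno 104 on Linux is ECONNRESET; re-raise anything else.
                if e.errno != errno.ECONNRESET:
                    raise
                attempts += 1
                if attempts >= max_attempts:
                    raise
                time.sleep(delay)  # brief pause before reconnecting
```

In Python 2.7, open_stream could presumably be built on urllib2.urlopen with a urllib2.Request carrying a "Range: bytes=<offset>-" header. If the server ignores that header and sends the whole file again, the output would end up corrupted, so the response status would need checking in real use.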
Why am I getting Errno 104, and how can I go about preventing this? Out of curiosity, should I be using urllib2 instead of urllib?