I am downloading a ~300 MB file from an FTP server every 6 hours or so. Most downloads go well, but sometimes the process hangs and I have to kill it and restart it manually. So I want a more robust download system, preferably with the following criteria:
1. Avoids timeouts and hangs as much as possible, and can deal with them if they do happen.
2. If the download is killed, try resuming it a few times until it completes (or send an error message if none of the attempts worked).
For (1), I read in this question that it would be good to use Python threading with keep-alive calls until all blocks have been downloaded:
from ftplib import FTP
import threading

def downloadFile(…):
    ftp = FTP(…)
    sock = ftp.transfercmd('RETR ' + filename)

    def background():
        # write each received block to the local file until the server closes the data socket
        f = open(…)
        while True:
            block = sock.recv(1024 * 1024)
            if not block:
                break
            f.write(block)
        sock.close()

    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        # ping the control connection every minute so it is not dropped during the transfer
        t.join(60)
        ftp.voidcmd('NOOP')
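Still on point (1), I assume I could also give both sockets an explicit timeout, so that a dead connection raises an exception instead of hanging forever. Something along these lines, where the host, credentials and filename are just placeholders:

from ftplib import FTP

# placeholders for my real server, credentials and file
ftp = FTP('ftp.example.com', timeout=60)        # timeout (in seconds) on the control connection
ftp.login('user', 'password')
ftp.voidcmd('TYPE I')                           # binary mode before RETR
sock = ftp.transfercmd('RETR ' + 'bigfile.dat')
sock.settimeout(60)                             # also time out recv() on the data connection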
For (2), there could be a loop that checks whether the file has been completely downloaded and, if not, restarts the transfer from the point where it left off, based on this question:
import os

for i in range(3):
    if "file has not been completely downloaded":  # placeholder for the actual check
        # resume from the size of the partial local file, if one exists
        if os.path.exists(filename):
            restarg = {'rest': str(os.path.getsize(filename))}
        else:
            restarg = {}
        ftp.transfercmd("RETR " + filename, **restarg)
But how can I combine (1) and (2)? Can a threaded download like this be resumed, given that it arrives in many blocks and I don't even know in which order they were downloaded?
If these two methods cannot be combined, do you have any other ideas?
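To make the question more concrete, below is a rough sketch of how I imagined nesting the two snippets: resume via REST from the current size of the partial file, write in append mode, keep the NOOP thread, and retry a few times. The function name, host, credentials and the bare retry-on-exception are placeholders, and I am not sure the resume logic is actually valid:

from ftplib import FTP
import os
import threading

def download_with_resume(host, user, password, filename, attempts=3):
    # placeholder sketch: host/user/password/filename stand in for my real values
    for attempt in range(attempts):
        try:
            ftp = FTP(host, timeout=60)
            ftp.login(user, password)
            ftp.voidcmd('TYPE I')  # binary mode, so the REST offset is a byte offset
            # resume from however much of the file is already on disk
            offset = os.path.getsize(filename) if os.path.exists(filename) else 0
            sock = ftp.transfercmd('RETR ' + filename, rest=offset or None)

            def background():
                # append, so the resumed bytes land after the ones already written
                with open(filename, 'ab') as f:
                    while True:
                        block = sock.recv(1024 * 1024)
                        if not block:
                            break
                        f.write(block)
                sock.close()

            t = threading.Thread(target=background)
            t.start()
            while t.is_alive():
                t.join(60)
                ftp.voidcmd('NOOP')  # keep the control connection alive during the transfer
            ftp.voidresp()           # consume the final transfer response
            ftp.quit()
            return True
        except Exception:
            continue  # retry; the next attempt resumes from the new partial size
    return False  # give up after all attempts; this is where the error message would be sent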
Also, I am not sure how to tell whether the FTP download completed. Should I check the file size for this? The file size can change from one download to the next.
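For what it's worth, the only idea I had for the completeness check is comparing the remote and local sizes, assuming the server supports the SIZE command (I believe ftplib exposes it as FTP.size and that it needs binary mode first); again the server and filename are placeholders:

from ftplib import FTP
import os

# placeholders for my real server, credentials and file
ftp = FTP('ftp.example.com', timeout=60)
ftp.login('user', 'password')
ftp.voidcmd('TYPE I')                  # SIZE is only meaningful in binary mode
remote_size = ftp.size('bigfile.dat')  # may be None or raise if the server rejects SIZE
local_size = os.path.getsize('bigfile.dat') if os.path.exists('bigfile.dat') else 0
complete = remote_size is not None and local_size == remote_size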