I'm currently using ftplib in Python to fetch some files from an FTP server and write them to S3.
The approach I'm using is with open, as shown below:
with open('file-name', 'wb') as fp:
    ftp.retrbinary('RETR file-name', fp.write)
to download files from the FTP server and save them in a temporary folder, then upload them to S3.
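For context, the full loop I have in mind looks roughly like this. It's only a minimal sketch of my current approach; the host, credentials, bucket name, and the use of boto3's upload_file are placeholders and assumptions rather than fixed parts of my setup:

import os
import tempfile
from ftplib import FTP

import boto3

ftp = FTP('ftp.example.com')           # placeholder host
ftp.login('user', 'password')          # placeholder credentials
s3 = boto3.client('s3')
bucket = 'my-bucket'                   # placeholder bucket name

tmp_dir = tempfile.mkdtemp()
for name in ftp.nlst():                # list the files in the current FTP directory
    local_path = os.path.join(tmp_dir, name)
    with open(local_path, 'wb') as fp:
        ftp.retrbinary('RETR ' + name, fp.write)
    s3.upload_file(local_path, bucket, name)   # key the object by the FTP file name
    os.remove(local_path)              # free the temp space right away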
I wonder whether this is the best practice, because the shortcoming of this approach is:
if there are many large files, I can download them, upload them to S3, and then delete them from the temp folder, but if I run this script once a day I end up downloading everything again. How can I check whether a file has already been downloaded and already exists in S3, so that the script only processes files newly added to the FTP server?
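What I'm imagining for the "only new files" part is something along these lines (again just a sketch; it assumes boto3's list_objects_v2 is a reasonable way to get the existing keys, and that the S3 keys match the FTP file names):

from ftplib import FTP

import boto3

ftp = FTP('ftp.example.com')           # same placeholder host as above
ftp.login('user', 'password')
s3 = boto3.client('s3')
bucket = 'my-bucket'                   # same placeholder bucket as above

# Collect the keys that are already in the bucket (paginated, in case there are many)
existing = set()
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        existing.add(obj['Key'])

# Only FTP files whose names are not already S3 keys still need processing
new_files = [name for name in ftp.nlst() if name not in existing]

A per-file head_object call would presumably also work, but listing the bucket once seems cheaper when there are many files; I'm not sure which is considered better practice.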
Hope this makes sense; it would be great if anyone has an example or pointers. Many thanks.