I have some Python code that downloads files from an FTP server and writes them to an AWS S3 bucket. I want to make sure no files are missed, so my current approach is:
-> list all the available files on the FTP server and add the filenames to a list, list_1
-> list all the files in the S3 bucket and add the filenames to a list, list_2
-> compare list_1 and list_2 to identify the missing files that haven't been downloaded to S3
-> download those missing files (a minimal sketch of the whole approach is below).
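For reference, here is a minimal sketch of what my current logic looks like, assuming ftplib and boto3; the host, directory, bucket, and prefix names are placeholders, not my real values:

```python
import io
import os
import ftplib
import boto3

# Placeholder connection details, not values from my real setup.
FTP_HOST = "ftp.example.com"
FTP_DIR = "/outgoing"
S3_BUCKET = "my-bucket"
S3_PREFIX = "ftp-mirror/"

def list_ftp_files(ftp):
    """Step 1: list every filename in the FTP directory (the slow step)."""
    # Some servers return full paths from NLST, so keep only the basename.
    return {os.path.basename(name) for name in ftp.nlst(FTP_DIR)}

def list_s3_files(s3):
    """Step 2: list every key already uploaded under the prefix."""
    keys = set()
    for page in s3.get_paginator("list_objects_v2").paginate(
        Bucket=S3_BUCKET, Prefix=S3_PREFIX
    ):
        for obj in page.get("Contents", []):
            # Strip the prefix so keys compare against bare FTP filenames.
            keys.add(obj["Key"][len(S3_PREFIX):])
    return keys

def sync_missing():
    s3 = boto3.client("s3")
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login()  # pass user/password if the server requires them
        # Steps 3-4: set difference finds files not yet in S3.
        missing = list_ftp_files(ftp) - list_s3_files(s3)
        for name in missing:
            # Step 5: buffer each missing file in memory, then upload it.
            buf = io.BytesIO()
            ftp.retrbinary(f"RETR {FTP_DIR}/{name}", buf.write)
            buf.seek(0)
            s3.upload_fileobj(buf, S3_BUCKET, S3_PREFIX + name)

if __name__ == "__main__":
    sync_missing()
```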
The issue is that this code needs to run every hour, and there are quite a lot of files on the FTP server, so the first step (listing the filenames on FTP) takes a long time to run (I have a separate question about that: link). Does anyone have a better idea for this logic that might execute faster?