There are two distinct phases that your solution would require:
- Obtain a list of files to download
- Download the files
I would recommend separating these two tasks because an error in the logic for listing the files could stop the download process mid-way, making it difficult to resume once the problem is corrected.
Listing the files is probably best done on your local computer, where it is easy to debug and track progress. The result would be a text file containing lots of links. (This is similar in concept to what many scraper utilities produce.)
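As a minimal sketch of the listing phase, assume (hypothetically) that the files are linked from an HTML index page. The code below uses only Python's standard library to collect the links and write them one per line. The names `extract_links` and `write_link_list` and the suffix filter are illustrative assumptions, not part of any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, suffixes=(".zip", ".csv")):
    """Return absolute links ending in one of the (hypothetical) suffixes."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    seen = set()
    result = []
    for link in parser.links:
        # Keep the original order but drop duplicate links.
        if link.lower().endswith(suffixes) and link not in seen:
            seen.add(link)
            result.append(link)
    return result


def write_link_list(links, path):
    """Write one link per line: the text file the download phase reads."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")
```

You might fetch the page with `urllib.request.urlopen(index_url).read().decode("utf-8")` and then call `write_link_list(extract_links(html, index_url), "links.txt")`. Because this runs locally, you can inspect `links.txt` before any downloading starts.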
The second portion (downloading the files) could be done either on Amazon EC2 or with AWS Lambda functions.
Using Amazon EC2
This would be a straightforward app that reads your text file, loops through the links and downloads each file. If this is a one-off requirement, I wouldn't invest too much time in making the app multi-threaded. However, a single-threaded app won't take full advantage of the instance's network bandwidth, and Amazon EC2 charges for the time an instance is running.
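A minimal single-threaded sketch of such an app, assuming one URL per line and that the last path segment of each URL is a usable local file name (both assumptions about your data):

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def local_name(url, dest_dir):
    """Derive a local file name from the URL's last path segment (an assumption)."""
    name = os.path.basename(urlparse(url).path) or "index"
    return os.path.join(dest_dir, name)


def download_all(list_path, dest_dir):
    """Read one link per line and download each file, one after another."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(list_path, encoding="utf-8") as f:
        links = [line.strip() for line in f if line.strip()]
    downloaded = []
    for url in links:
        target = local_name(url, dest_dir)
        # Stream the response to disk rather than holding it in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        downloaded.append(target)
        print("done", url, flush=True)  # simple progress log
    return downloaded
```

There is deliberately no retry or threading logic here: if a download fails, the exception stops the loop and the printed log shows how far it got.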
Therefore, I would recommend using fairly small instance types (each with limited network bandwidth that a simple download loop can saturate), but running multiple instances in parallel, each processing a portion of your list of links. This way you can divide and conquer.
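To divide the work, the link list can be split into one file per instance. This sketch deals links out round-robin; the `split_links` name and the `<prefix>-000.txt` naming scheme are hypothetical.

```python
def split_links(list_path, num_parts, out_prefix):
    """Split a link list into num_parts files, one per instance (round-robin)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    with open(list_path, encoding="utf-8") as f:
        links = [line.strip() for line in f if line.strip()]
    paths = []
    for i in range(num_parts):
        part = links[i::num_parts]  # every num_parts-th link, starting at index i
        path = f"{out_prefix}-{i:03d}.txt"
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(link + "\n" for link in part)
        paths.append(path)
    return paths
```

Round-robin keeps the number of links per part roughly equal. If file sizes vary widely, equal counts won't mean equal bytes, so some instances may finish well before others.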
If something goes wrong mid-way, you can always tweak the code, manually edit the text file to remove the entries already completed, and then continue. This is fairly quick-and-dirty, but fine for a one-off requirement.
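If you'd rather not edit the list by hand, a small variation is to have the downloader append each URL to a "done" log after it succeeds, then compute what remains on restart. The log file and both function names are hypothetical additions, not part of the approach described above.

```python
import os


def mark_done(done_log_path, url):
    """Append a URL to the log immediately after its download succeeds."""
    with open(done_log_path, "a", encoding="utf-8") as f:
        f.write(url + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the entry survives an abrupt stop


def remaining_links(list_path, done_log_path):
    """Return links from list_path that are not yet recorded in the done log."""
    with open(list_path, encoding="utf-8") as f:
        links = [line.strip() for line in f if line.strip()]
    done = set()
    if os.path.exists(done_log_path):
        with open(done_log_path, encoding="utf-8") as f:
            done = {line.strip() for line in f if line.strip()}
    return [link for link in links if link not in done]
```

A file that was half-written when the process stopped is never logged, so it is simply downloaded again on the next run.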
Additionally, I would recommend using Amazon EC2 Spot Instances, which can save up to 90% compared with On-Demand pricing. There is a risk of an instance being interrupted (for example, if the Spot price rises above your maximum price), which would cause you some extra work to determine where to resume. To reduce that risk, set your maximum price equal to the normal On-Demand price; an interruption is then unlikely, although not impossible.
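If you launch the instances from a script, boto3's EC2 client `run_instances()` accepts an `InstanceMarketOptions` parameter that requests Spot capacity. The sketch below only builds the keyword arguments, so it runs without AWS credentials. The AMI ID, instance type, `/opt/downloader.py` and the location of the part file are hypothetical placeholders that would have to exist on your image.

```python
def spot_launch_params(image_id, instance_type, part_file, max_price=None):
    """Build keyword arguments for boto3's EC2 run_instances() that launch one
    Spot Instance to process one portion of the link list.

    The user-data script and /opt/downloader.py are hypothetical.
    """
    spot_options = {"SpotInstanceType": "one-time"}
    if max_price is not None:
        # If MaxPrice is omitted, AWS defaults the maximum to the On-Demand price.
        spot_options["MaxPrice"] = str(max_price)
    # Startup script run on boot; it processes only this instance's portion.
    user_data = "#!/bin/bash\n" f"python3 /opt/downloader.py /opt/{part_file}\n"
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": spot_options,
        },
    }
```

Usage would look like `boto3.client("ec2").run_instances(**spot_launch_params("ami-...", "t3.small", "links-000.txt"))`, called once per part. One call per instance is used because all instances launched by a single `run_instances` call receive the same user data, whereas each instance here needs a different portion of the list.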
Using AWS Lambda functions
Each AWS Lambda invocation can run for a maximum of 15 minutes, and its local storage (/tmp) is limited to 512MB by default (configurable up to 10GB). Fortunately, functions can be run in parallel.
Therefore, to use AWS Lambda, you would need to write a controlling app that calls an AWS Lambda function for each file in your list. Any file that is too large for the configured local storage, or too slow to download within the time limit, would need special handling.
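A minimal sketch of such a controlling app, assuming a hypothetical Lambda function named `download-one-file` that accepts a JSON payload like `{"url": ...}`. The `invoke` argument is any callable with the keyword arguments of boto3's Lambda client `invoke()`; in practice you would pass `boto3.client("lambda").invoke`, and for testing you can pass a stub.

```python
import json


def dispatch_downloads(links, invoke, function_name="download-one-file"):
    """Invoke one (hypothetical) Lambda function per link, asynchronously."""
    results = []
    for url in links:
        response = invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: returns before the download runs
            Payload=json.dumps({"url": url}).encode("utf-8"),
        )
        # For "Event" invocations this only confirms the request was queued.
        results.append((url, response.get("StatusCode")))
    return results
```

Note that with asynchronous invocation, a successful status only means the request was accepted; whether each download actually succeeded has to be tracked separately. That is part of why this approach is harder to debug and recover.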
Writing, debugging and monitoring a parallel, distributed application like this probably isn't worth the effort for a one-off task. It would be much harder to debug any problems and recover from errors. (It would, however, be an ideal way to do continuous downloads if you have a continuing business need for this process.)
Bottom line: I would recommend writing and debugging the downloader app on your local computer (with a small list of test files), then using multiple Amazon EC2 Spot Instances running in parallel to download the files and upload them to Amazon S3. Start with one instance and a small list to test the setup, then go parallel with bigger lists. Have fun!