I have a large amount of images stored in an AWS S3 bucket.
Every week, I run a classification task on all these images. The way I'm doing it currently is by downloading all the images to my local PC, processing them, then making database changes once the process is complete.
I would like to reduce the amount of time spent downloading images to increase the overall speed of the classification task.
EDIT2:
I actually am required to process 20,000 images at a time to increase performance of the classification engine. This means I can't use Lambdas since the maximum option for RAM available is 3GB and I need 16GB to process all 20,000 images
The classification task uses about 16GB of RAM. What AWS service can I use to automate this task? Is there a service that can be put on the same VLAN as the S3 Bucket so that images transfer very quickly?
The entire process takes about 6 hours to do. If I spin up an EC2 with 16GB of RAM it would be very cost ineffective as it would finish after 6 hours then spend the remainder of the week sitting there doing nothing.
Is there a service that can automate this task in a more efficient manner?
EDIT:
Each image is around 20-40KB. The classification is a neural network, so I need to download each image so I can feed it through the network.
Multiple images are processed at the same time (batches of 20,000), but the processing part doesn't actually take that long. The longest part of the whole process is the downloading part. For example, downloading takes about 5.7 hours, processing takes about 0.3 hours in total. Hence why I'm trying to reduce the amount of downloading time.