Popular
A common method is to go through s3api, which consolidates the listing into a single LIST request per 1,000 objects, and then to use --query to define your filtering operation, such as:
aws s3api list-objects-v2 --bucket your-bucket-name --query "Contents[?contains(LastModified, \`$DATE\`)]"
Keep in mind, though, that this isn't a good solution, for two reasons:
- It does not scale well, especially with large buckets, and it does little to minimize outbound data.
- It does not reduce the number of S3 API calls, because the --query parameter is not evaluated server-side; it just happens to be a client-side feature of the AWS CLI. To illustrate, here is how it looks in boto3, where, as you can see, we still have to filter on the client:
import boto3

client = boto3.client('s3', region_name='us-east-1')
response = client.list_objects_v2(Bucket='your-bucket-name')
# The sorting/filtering happens here, on the client, after the full
# (up to 1,000-key) response has already been downloaded:
results = sorted(response['Contents'], key=lambda item: item['LastModified'])[-1]
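Note also that list_objects_v2 returns at most 1,000 keys per response, so a real version of the above would have to paginate as well. A minimal sketch using boto3's paginator (the bucket name and target date are placeholders), with the filtering still happening on the client:

import boto3
from datetime import date

client = boto3.client('s3', region_name='us-east-1')
paginator = client.get_paginator('list_objects_v2')

target = date(2023, 1, 15)  # placeholder date to filter on
matches = []
for page in paginator.paginate(Bucket='your-bucket-name'):
    for obj in page.get('Contents', []):
        # LastModified is a timezone-aware datetime; the comparison still
        # runs here on the client, after the listing has been downloaded.
        if obj['LastModified'].date() == target:
            matches.append(obj['Key'])
print(matches)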
Probably
One thing you could *probably* do, depending on your specific use case, is use S3 Event Notifications to automatically publish an event to SQS, which lets you poll for all the S3 object events along with their metadata, which is more lightweight. This will still cost some money, it won't work if you already have a big existing bucket to begin with, and you'll have to poll for the messages actively, since they won't persist for long.
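For what it's worth, here is a rough sketch of what that polling could look like, assuming you have already wired an s3:ObjectCreated:* notification to an SQS queue (the queue URL is a placeholder):

import json
import boto3

sqs = boto3.client('sqs', region_name='us-east-1')
# Placeholder URL for a queue already subscribed to the bucket's events.
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/s3-object-events'

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long polling
)
for msg in resp.get('Messages', []):
    body = json.loads(msg['Body'])
    # 'Records' is absent for the s3:TestEvent that S3 sends on setup.
    for record in body.get('Records', []):
        print(record['s3']['object']['key'], record['eventTime'])
    # Delete the message so it is not redelivered after the visibility timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])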
Perfect (sorta)
This sounds to me like a good use case for S3 Inventory. It will deliver a daily (or weekly) file comprising the list of your objects and their metadata, based on your specifications. See https://docs.aws.amazon.com/AmazonS3/latest/user-guide/configure-inventory.html
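Once a report is delivered, consuming it is simple. Below is a minimal sketch that assumes a CSV-formatted inventory configured with just the Bucket, Key, and LastModifiedDate fields (the destination bucket and report key are placeholders, since the real data-file keys come from the manifest.json delivered with each report):

import csv
import gzip
import io
import boto3

client = boto3.client('s3', region_name='us-east-1')

# Placeholder key: in practice you would read the manifest.json that
# Inventory writes with each delivery to find the actual data files.
obj = client.get_object(
    Bucket='your-inventory-destination-bucket',
    Key='your-bucket-name/daily-inventory/data/example.csv.gz',
)
buf = io.BytesIO(obj['Body'].read())
with gzip.open(buf, mode='rt', encoding='utf-8', newline='') as f:
    # Inventory CSVs have no header row; the column order here assumes a
    # report configured with only Bucket, Key, and LastModifiedDate.
    for bucket, key, last_modified in csv.reader(f):
        if last_modified.startswith('2023-01-15'):  # placeholder date
            print(key)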