Filter by suffix
If the S3 object's key is a filename, the suffix of your objects is the filename extension (like `.csv`). So filter the objects by keys ending with `.csv`.
Use the `filter(predicate, iterable)` built-in with a predicate `lambda` that tests `str.endswith(suffix)`:
```python
import boto3

s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my-bucket', Prefix='prefix')['Contents']
# filter() returns an iterator in Python 3, so materialize it as a list
csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs))  # .csv keys only
csvs.sort(key=lambda obj: obj['LastModified'], reverse=True)  # newest first
latest = csvs[0]
```
Note: To get the last modified object only
This solution reverses the sort direction with `reverse=True` (descending) so that the first element is the most recently modified one. You can also sort in the default ascending order and pick the last element with `[-1]`, as answered by Kache in your preceding question.
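The ascending variant is a one-line change. A minimal sketch, with hypothetical entries standing in for real `list_objects_v2` results:

```python
from datetime import datetime, timezone

# Hypothetical filtered results, standing in for real Contents entries
csvs = [
    {'Key': 'old.csv', 'LastModified': datetime(2023, 5, 1, tzinfo=timezone.utc)},
    {'Key': 'new.csv', 'LastModified': datetime(2024, 5, 1, tzinfo=timezone.utc)},
]

csvs.sort(key=lambda obj: obj['LastModified'])  # ascending (the default)
latest = csvs[-1]  # the last element is the most recently modified
print(latest['Key'])  # → new.csv
```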
Simplification
From the boto3 `list_objects_v2` docs about the response structure:

> Contents (list)
> ...
> LastModified (datetime) -- Creation date of the object.

Boto3 returns a `datetime` object for `LastModified`. See also Getting S3 objects' last modified datetimes with boto. So why do we need the extra steps of formatting it as a string and then converting that to an int, as in `int(obj['LastModified'].strftime('%s'))`? Python can sort the `datetime` objects directly.
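Because `datetime` objects support comparison, `max()` with a key function picks the most recent object in a single pass, with no string round-trip. A sketch over hypothetical entries:

```python
from datetime import datetime, timezone

# Hypothetical Contents entries as returned by list_objects_v2
objs = [
    {'Key': 'a.csv', 'LastModified': datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'b.csv', 'LastModified': datetime(2024, 6, 1, tzinfo=timezone.utc)},
]

# max() compares the datetime values directly -- no strftime/int needed
latest = max(objs, key=lambda obj: obj['LastModified'])
print(latest['Key'])  # → b.csv
```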
Limitation warning
The S3 API operation and its corresponding Boto3 method `list_objects_v2` limit the result set to one thousand objects:

> Returns some or all (up to 1,000) of the objects in a bucket with each request.

So, for buckets with many objects under the same prefix, even after applying the prefix filter, your result can be silently truncated.
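When more than 1,000 keys can match, Boto3's built-in paginator for `list_objects_v2` follows the continuation tokens for you. A sketch (the function name and the injectable `s3` parameter are my own additions, so the function can be exercised without AWS credentials):

```python
def latest_csv(bucket, prefix, s3=None):
    """Return the most recently modified .csv object under `prefix`."""
    if s3 is None:
        import boto3  # deferred so the function is testable without AWS
        s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    csvs = []
    # Each page holds at most 1,000 keys; paginate() keeps requesting
    # pages until the whole listing is exhausted.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        csvs.extend(obj for obj in page.get('Contents', [])
                    if obj['Key'].endswith('.csv'))
    return max(csvs, key=lambda obj: obj['LastModified'])
```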