Listing objects in S3 with suffix using boto3

Question

import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']
    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
    return last_added

The code above gets me the latest file; however, I only want files ending with '.csv'.

hc_dev · Answer 1 · 2023-06-19T18:44:07.330

Filter by suffix

If the S3 object's key is a filename, the suffix for your objects is a filename-extension (like .csv).

So filter the objects by key ending with .csv.

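Use the built-in filter(predicate, iterable) with a lambda predicate that tests str.endswith(suffix):

import boto3

def get_latest_file_movement(**kwargs):
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']

    # filter() returns a lazy iterator in Python 3, so build a list before calling .sort()
    csvs = list(filter(lambda obj: obj['Key'].endswith('.csv'), objs))  # csv only
    csvs.sort(key=lambda obj: obj['LastModified'], reverse=True)  # last first, sort by modified-timestamp descending

    return csvs[0]  # the whole object dict; use csvs[0]['Key'] for the key only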

Note: To get the last-modified only

This solution reverses the sort direction with reverse=True (descending), so the first element is the most recently modified one. You can also sort in the default (ascending) order and pick the last element with [-1], as answered by Kache in your preceding question.
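To make the two sort directions concrete, here is a minimal sketch that runs without AWS access. The objs list is a hypothetical stand-in for the 'Contents' list; only the 'Key' and 'LastModified' fields are modeled.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the 'Contents' list returned by list_objects_v2
objs = [
    {'Key': 'prefix/a.csv', 'LastModified': datetime(2022, 2, 1, tzinfo=timezone.utc)},
    {'Key': 'prefix/b.txt', 'LastModified': datetime(2022, 2, 3, tzinfo=timezone.utc)},
    {'Key': 'prefix/c.csv', 'LastModified': datetime(2022, 2, 2, tzinfo=timezone.utc)},
]

# Keep only the keys ending with .csv
csvs = [obj for obj in objs if obj['Key'].endswith('.csv')]

# Descending sort: the newest object comes first
newest_desc = sorted(csvs, key=lambda obj: obj['LastModified'], reverse=True)[0]

# Ascending (default) sort: the newest object comes last
newest_asc = sorted(csvs, key=lambda obj: obj['LastModified'])[-1]

print(newest_desc['Key'], newest_asc['Key'])
```

On this sample both variants pick prefix/c.csv, because b.txt is newer but was filtered out before sorting.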

Simplification

From the boto3 list_objects_v2 docs about the response structure:

Contents (list) ... LastModified (datetime) -- Creation date of the object.

Boto3 returns a datetime object for LastModified. See also Getting S3 objects' last modified datetimes with boto.

So why do we need additional steps to format it as string and then convert to int: int(obj['LastModified'].strftime('%s')) ?

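Python can sort datetime objects directly, so key=lambda obj: obj['LastModified'] is enough. Note also that '%s' is not one of the format codes documented for Python's strftime; it is a platform-specific extension and can fail on some systems (Windows, for example).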
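As a small illustration of that point, the sketch below compares two timezone-aware datetime values directly. The two timestamps are made-up sample values, not real S3 data. If an integer is really needed, datetime.timestamp() is a portable alternative to strftime('%s').

```python
from datetime import datetime, timezone

# Made-up sample timestamps, timezone-aware like the values in an S3 listing
older = datetime(2022, 2, 15, 23, 50, tzinfo=timezone.utc)
newer = datetime(2022, 2, 16, 0, 22, tzinfo=timezone.utc)

# datetime objects support comparison, so they work as sort keys as-is
is_ordered = older < newer
latest = max([older, newer])

# Portable way to get epoch seconds, should an integer be required
epoch_seconds = int(newer.timestamp())
```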

Limitation warning

S3's API operation and its corresponding Boto3 method list_objects_v2 limit the result set to one thousand objects:

Returns some or all (up to 1,000) of the objects in a bucket with each request.

So, when more than 1,000 objects match the prefix, a single call returns only the first page, and the most recently modified .csv file may be missing from it without any error being raised.
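One way around the limit is boto3's paginator for list_objects_v2, which keeps requesting pages until the listing is complete. The sketch below is one possible arrangement, not the only one: the helper names latest_key_with_suffix and s3_pages are hypothetical, and the fake_pages data only imitates the shape of real responses so the scanning logic can be tried without AWS credentials.

```python
from datetime import datetime, timezone


def latest_key_with_suffix(pages, suffix='.csv'):
    """Return the newest key ending with suffix across all pages, or None."""
    latest = None
    for page in pages:
        # A page may lack the 'Contents' entry when nothing matched
        for obj in page.get('Contents', []):
            if not obj['Key'].endswith(suffix):
                continue
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj
    return latest['Key'] if latest else None


def s3_pages(bucket, prefix):
    """Return an iterable of result pages; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above can be tried offline
    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    return paginator.paginate(Bucket=bucket, Prefix=prefix)


# Hypothetical pages standing in for real responses
fake_pages = [
    {'Contents': [
        {'Key': 'prefix/old.csv', 'LastModified': datetime(2022, 2, 1, tzinfo=timezone.utc)},
    ]},
    {},  # a page without any objects
    {'Contents': [
        {'Key': 'prefix/new.csv', 'LastModified': datetime(2022, 2, 3, tzinfo=timezone.utc)},
        {'Key': 'prefix/newest.txt', 'LastModified': datetime(2022, 2, 4, tzinfo=timezone.utc)},
    ]},
]
result = latest_key_with_suffix(fake_pages)
```

Against a real bucket the call would look like latest_key_with_suffix(s3_pages('my-bucket', 'prefix')). Tracking a running maximum avoids holding every object in memory just to sort it.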

I like this answer, but you have obj and the lambda function swapped in the filter function. Filter function requires the first parameter to be the function that returns True/False and the second parameter to be the collection. — Danny, Apr 29 '22 at 15:56
@Danny, thanks for spotting this. You always have to pay attention when using built-ins `filter` and `sorted` (the order of parameters is different). That's why I prefer [`list.sort()`](https://docs.python.org/3/howto/sorting.html) among others (modify in place, readability, etc.). — hc_dev, May 02 '22 at 11:25
@IllyaMoskvin, thanks for the scalability hint. I added this as "Limitation warning". — hc_dev, Jun 19 '23 at 18:46

Marcin · Accepted Answer · 2022-02-16T00:22:25.337


You can check if they end with .csv:

import boto3

def get_latest_file_movement(**kwargs):
    get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket='my-bucket',Prefix='prefix')['Contents']

    last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True) if obj['Key'].endswith('.csv')][0]

    return last_added
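Indexing the filtered list with [0] raises an IndexError when no key under the prefix ends with .csv. A minimal sketch of a variant that returns None in that case is shown here, assuming the object list has already been fetched. The function name newest_key and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def newest_key(objs, suffix='.csv'):
    """Return the most recently modified key ending with suffix, or None."""
    matching = (obj for obj in objs if obj['Key'].endswith(suffix))
    # default=None makes max() tolerate an empty sequence of matches
    newest = max(matching, key=lambda obj: obj['LastModified'], default=None)
    return newest['Key'] if newest is not None else None


# Hypothetical stand-in for the 'Contents' list
sample = [
    {'Key': 'prefix/report.csv', 'LastModified': datetime(2022, 2, 15, tzinfo=timezone.utc)},
    {'Key': 'prefix/notes.txt', 'LastModified': datetime(2022, 2, 16, tzinfo=timezone.utc)},
]

with_match = newest_key(sample)
without_match = newest_key(sample, suffix='.json')
```

Using max() also avoids sorting the whole list when only the newest entry is needed.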

edited Feb 16 '22 at 00:22

answered Feb 15 '22 at 23:50

Marcin


Yeah, I wanted the latest file ending with '.csv', sorry if that wasn't clear in the question. – facepalmdev7 Feb 16 '22 at 00:22

Note that this will only consider the first 1000 objects in a bucket, which may or may not matter for the given use case. – Anon Coward Feb 16 '22 at 00:26
