28

The other questions I could find were referring to an older version of Boto. I would like to download the latest file from an S3 bucket. In the documentation I found that there is a method list_object_versions() that returns a boolean IsLatest. Unfortunately I have only managed to set up a connection and to download a file. Could you please show me how I can extend my code to get the latest file in the bucket? Thank you

import boto3
from botocore.client import Config

conn = boto3.client('s3',
                    region_name="eu-west-1",
                    endpoint_url="customendpoint",
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))

From here I don't know how to get the latest added file from a bucket called mytestbucket. There are various CSV files in the bucket, each of course with a different name.

Update:

import boto3
from botocore.client import Config

s3 = boto3.resource('s3',
                    region_name="eu-west-1",
                    endpoint_url="custom endpoint",
                    aws_access_key_id='1234',
                    aws_secret_access_key='1234',
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
my_bucket = s3.Bucket('mytestbucket22')
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]

This gives me the following error:

NameError: name 'get_last_modified' is not defined
jz22

7 Answers

32

A variation of the answer I provided for Boto3 S3, sort bucket by last modified. You can modify the code to suit your needs.

get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))

s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][0]

If you want to reverse the sort:

[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
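
If `strftime('%s')` raises `ValueError: Invalid format string` on your platform (the `%s` format code is not supported everywhere, e.g. on Windows), a minimal sketch of a portable alternative is to sort on the `LastModified` datetime directly; this assumes the same `list_objects_v2` response shape as above:

import boto3

# Sort key that compares the LastModified datetimes directly,
# avoiding the platform-dependent '%s' format code.
get_last_modified = lambda obj: obj['LastModified']

s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
last_added = max(objs, key=get_last_modified)['Key']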
helloV
  • Thank you. I added my configuration to the client and edited my bucket's name, but I get `ValueError: Invalid format string` on the line `get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))` – jz22 Jul 28 '17 at 16:10
  • Are you using `Python 2.7` or `Python 3`? – helloV Jul 28 '17 at 16:16
  • I'm using 3.6.1. – jz22 Jul 28 '17 at 16:21
  • I think you should modify `[0]` with `[-1]` if you want the latest added file. – rpanai Mar 01 '18 at 20:05
  • Do you need to consider pagination if a directory has over 1,000 objects? – mbunch May 15 '18 at 15:10
  • @MattBunch yes, if the bucket has more than 1000 objects, you need to paginate, fetch all objects and then sort. – helloV May 15 '18 at 15:30
  • One question here. I am not clear about the overall steps to run this piece of code. Do you put this lambda and boto3 function in the AWS Lambda service's function code area, or put it in a Python script that runs on EC2? If you put it in AWS Lambda, do you need to assign a specific AWS role so it can access S3? Thanks. – user1457659 Oct 23 '18 at 21:20
  • @user1457659 this has nothing to do with the AWS Lambda service. I am using a Python lambda function. It should work on any machine that has Python installed and AWS credentials set correctly. – helloV Oct 23 '18 at 21:27
  • @jz22 and others. If you are running into the `Invalid format string` error, change the `%s` to `%S` (upper case 'S'). Check out [this](https://github.com/addisonlynch/pyTD/issues/1). – Binx Jul 05 '22 at 20:39
22

You can do

import boto3

s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='prefix')
objects = response['Contents']
latest = max(objects, key=lambda x: x['LastModified'])
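
If the goal is to actually download that latest file (as in the original question), a minimal sketch, assuming the same client as above and a hypothetical local filename:

# Download the most recently modified object to a local file
# ('latest.csv' is just a placeholder filename).
s3_client.download_file('bucket_name', latest['Key'], 'latest.csv')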
smaraf
  • Hence, if you are looking out for the latest updated folder, you could go ahead with `latest['Key'].split('/')[1]` – Arnab Das Jan 21 '21 at 19:21
  • It should be noted this will only show the latest object from the first 1000 objects in the bucket. You'll need to use a paginator if your bucket contains more objects. – Anon Coward Jun 16 '22 at 16:16
21

This handles the case where there are more than 1000 objects in the S3 bucket. It is basically @SaadK's answer, using the newer list_objects_v2 and tracking the latest object across pages.

EDIT: Fixes the issue @Timothée-Jeannin identified; the latest object across all pages is now returned.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Paginator.ListObjectsV2

import boto3

def get_most_recent_s3_object(bucket_name, prefix):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    latest = None
    for page in page_iterator:
        if "Contents" in page:
            page_latest = max(page['Contents'], key=lambda x: x['LastModified'])
            if latest is None or page_latest['LastModified'] > latest['LastModified']:
                latest = page_latest
    return latest

latest = get_most_recent_s3_object(bucket_name, prefix)

latest['Key']  # -->   'prefix/objectname'
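
Note that latest stays None when nothing matches the prefix, so a small guard before using it may be worthwhile; a sketch:

# Guard against an empty result before touching latest['Key']
if latest is None:
    raise ValueError(f"No objects found under '{prefix}' in bucket '{bucket_name}'")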
marginal_dev
10

If you have a lot of files then you'll need to use pagination as mentioned by helloV. This is how I did it.

get_last_modified = lambda obj: int(obj['LastModified'].strftime('%s'))
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects")
page_iterator = paginator.paginate(Bucket="BucketName", Prefix="Prefix")
for page in page_iterator:
    if "Contents" in page:
        last_added = [obj['Key'] for obj in sorted(page["Contents"], key=get_last_modified)][-1]
SaadK
  • How can I download the latest file? Here it only populates the filename. Could you please tell me how to download it? – user 98 Mar 04 '22 at 10:24
3

This is basically the same answer as helloV's, for the case where you use a Session, as I do.

from boto3.session import Session
import settings

session = Session(aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
s3 = session.resource("s3")

get_last_modified = lambda obj: int(obj.last_modified.strftime('%s'))


bckt = s3.Bucket("my_bucket")
objs = sorted(bckt.objects.all(), key=get_last_modified)
last_added = objs[-1].key

Having objs sorted allows you to quickly delete all files but the latest with

for obj in objs[:-1]:
    s3.Object("my_bucket", obj.key).delete()
rpanai
0

You should be able to download the latest version of a given file using the default download_file command:

import boto3
import botocore

BUCKET_NAME = 'mytestbucket'
KEY = 'fileinbucket.txt'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

Reference link

To get the last modified (i.e. most recently uploaded) file, you can use the following:

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]

As the answer in this reference link states, it's not optimal, but it works.
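
Note that the second snippet assumes a get_last_modified sort key that is not defined here (which is where the NameError in the question comes from). A minimal sketch of one that works with the resource API's ObjectSummary objects, which expose last_modified as an attribute:

# Sort key for resource-API objects: the last_modified datetimes compare directly
get_last_modified = lambda obj: obj.last_modified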

Ashan
  • Thanks. Maybe my question was not clear enough. I just edited it and provided more information. I would like to download the latest file from a bucket that contains a couple of CSV files, and I want to always download the latest one, no matter what name it has. – jz22 Jul 28 '17 at 14:40
  • Latest in the sense, not the latest version of a particular file? or the last added file? – Ashan Jul 28 '17 at 15:03
  • The latest added file. – jz22 Jul 28 '17 at 15:04
  • Thank you. Unfortunately this shows me another error. I've put it into my question. Do I have to import something else? – jz22 Jul 28 '17 at 15:32
0

I also wanted to download the latest file from an S3 bucket, but located in a specific folder. Use the following function to get the latest filename using the bucket name and prefix (which is the folder name).

import boto3

def get_latest_file_name(bucket_name,prefix):
    """
    Return the latest file name in an S3 bucket folder.

    :param bucket_name: Name of the S3 bucket.
    :param prefix: Only fetch keys that start with this prefix (folder name).
    """
    s3_client = boto3.client('s3')
    objs = s3_client.list_objects_v2(Bucket=bucket_name)['Contents']
    shortlisted_files = {}
    for obj in objs:
        key = obj['Key']
        timestamp = obj['LastModified']
        # if key starts with folder name retrieve that key
        if key.startswith(prefix):
            shortlisted_files[key] = timestamp
    latest_filename = max(shortlisted_files, key=shortlisted_files.get)
    return latest_filename

latest_filename = get_latest_file_name(bucket_name='use_your_bucket_name', prefix='folder_name/')
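
As a small design note, list_objects_v2 also accepts a Prefix argument, so the filtering can happen server-side instead of via startswith; a sketch of that variant, using the same hypothetical bucket and folder names:

import boto3

s3_client = boto3.client('s3')
# Let S3 filter by prefix server-side; still limited to the first 1000 keys per call
response = s3_client.list_objects_v2(Bucket='use_your_bucket_name', Prefix='folder_name/')
latest = max(response['Contents'], key=lambda obj: obj['LastModified'])
latest_filename = latest['Key']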
Sayali Sonawane