
I would like to check whether a file exists in a separate directory of the bucket, given that another file exists. I currently do it like this:

import boto3
import botocore

s3 = boto3.resource('s3')

def file_exists(fileN):
    try:
        # first check the known path
        s3.Object('my-bucket', 'folder1/folder2/' + fileN).load()
    except botocore.exceptions.ClientError:
        return False
    else:
        fileN = fileN.split(".")[0]
        try:
            # then check the condition file under the random-ID folder
            s3.Object('my-bucket', 'folder1/<randomid folderxxxx>/' + fileN + '_condition.jpg').load()
        except botocore.exceptions.ClientError:
            return False
        else:
            return True

file_exists("test.jpg")

This works, but only as long as I can send the random folder ID as an argument. Is there a better, more elegant way to do it?

Basically I have to check: if my-bucket/folder1/folder2/test.jpg exists, then check my-bucket/folder1/<randomID>/test_condition.jpg; if that also exists, return True.

Pavan K

2 Answers


I ended up using this, which gave slightly cleaner code:

import boto3
s3client = boto3.client('s3')

def all_file_exist(bucket, prefix, fileN):
    fileFound = False
    fileConditionFound = False
    theObjs = s3client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # 'Contents' is absent from the response when no keys match the prefix
    for obj in theObjs.get('Contents', []):
        if obj['Key'].endswith(fileN + '_condition.jpg'):
            fileConditionFound = True
        if obj['Key'].endswith(fileN + '.jpg'):
            fileFound = True
    return fileFound and fileConditionFound

all_file_exist("bucket","folder1", "test")
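
One caveat: list_objects_v2 returns at most 1,000 keys per call, so the loop above can miss files under a large prefix. Here is a sketch of the same check using a paginator, which issues the follow-up requests automatically (same hypothetical bucket and prefix names):

import boto3

s3client = boto3.client('s3')

def all_file_exist_paginated(bucket, prefix, fileN):
    fileFound = False
    fileConditionFound = False
    # The paginator transparently fetches further pages when the
    # listing spans more than 1,000 keys.
    paginator = s3client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['Key'].endswith(fileN + '_condition.jpg'):
                fileConditionFound = True
            if obj['Key'].endswith(fileN + '.jpg'):
                fileFound = True
    return fileFound and fileConditionFound

all_file_exist_paginated("bucket", "folder1", "test")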

Pavan K

It is not possible to specify an object key via a wildcard.

Instead, you would need to do a bucket listing (which can be against the whole bucket, or within a path) and then perform your own logic for identifying the file of interest.
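
For example, here is a sketch of that listing-plus-filtering approach, using Python's fnmatch for the wildcard logic (the bucket name and pattern are illustrative):

import fnmatch
import boto3

s3client = boto3.client('s3')

def find_matching_keys(bucket, prefix, pattern):
    # S3 has no server-side wildcard lookup, so list under the prefix
    # and filter the keys client-side. Note fnmatch's '*' also crosses '/'.
    paginator = s3client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if fnmatch.fnmatch(obj['Key'], pattern):
                yield obj['Key']

# e.g. find test_condition.jpg under any random-ID folder
matches = list(find_matching_keys('my-bucket', 'folder1/', 'folder1/*/test_condition.jpg'))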

If the number of objects is small (e.g. a few thousand), the list can easily be retrieved and kept in memory in a Python list for fast comparison.
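
A minimal sketch of that in-memory approach, reusing the bucket and prefix names from the question; a set is used rather than a list so that each existence check is O(1):

import boto3

s3client = boto3.client('s3')

# Build the in-memory key collection once, then test membership cheaply.
paginator = s3client.get_paginator('list_objects_v2')
all_keys = set()
for page in paginator.paginate(Bucket='my-bucket', Prefix='folder1/'):
    for obj in page.get('Contents', []):
        all_keys.add(obj['Key'])

print('folder1/folder2/test.jpg' in all_keys)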

If there are millions of objects, you might consider using Amazon S3 Inventory, which can provide a daily CSV file that lists all objects in the bucket. Using such a file would be faster than scanning the bucket itself.
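
Here is a sketch of checking against such an inventory file, assuming a CSV-format inventory (gzipped, with the bucket name and object key as the first two columns) that has already been downloaded locally; the filename is hypothetical:

import csv
import gzip

def key_in_inventory(inventory_path, wanted_key):
    # Each inventory row describes one object; column 1 is the key
    # (column 0 is the bucket name), assuming the CSV output format.
    with gzip.open(inventory_path, 'rt', newline='') as f:
        for row in csv.reader(f):
            if row[1] == wanted_key:
                return True
    return False

key_in_inventory('inventory-2020-01-01.csv.gz', 'folder1/folder2/test.jpg')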

John Rotenstein