
My S3 key is 'folder/filename.xml'. I want to get the files ending with 'name.xml'.

import boto3

s3 = boto3.resource('s3')
try:
    fileobj = s3.Object('lcu-matillion', 'folder/.*name.xml').get()['Body']
    data = fileobj.read()
except Exception:
    print('not found')

Can anyone help with the correct code? Thanks.

John Rotenstein
    Use the `glob` module in the standard library (but you probably need `*`, not `.*`). https://docs.python.org/3/library/glob.html – cdarke Aug 29 '18 at 08:24
    By the way, `except Exception` is dangerous, do you think every exception in opening a file is "not found"? – cdarke Aug 29 '18 at 08:27
  • Side-note: You could use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/) `aws s3 cp` command with `--include`. – John Rotenstein Aug 29 '18 at 08:36
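Putting the comments together, here is a minimal sketch of both suggestions: shell-style `*` wildcards via the standard-library `fnmatch` module (rather than the regex-style `.*` in the question), and filtering keys as plain strings. The key list below is hypothetical stand-in data; in practice it would come from a bucket listing as shown in the answers.

```python
import fnmatch

# Hypothetical key listing; in practice these come from an S3 listing call.
keys = ['folder/filename.xml', 'folder/other.xml', 'folder/data.csv']

# fnmatch uses shell-style wildcards: '*' here, not the regex '.*'.
matches = [k for k in keys if fnmatch.fnmatch(k, 'folder/*name.xml')]
print(matches)  # only 'folder/filename.xml' ends with 'name.xml'
```

As cdarke's second comment warns, a bare `except Exception` would also swallow credential and permission errors, so it is better to catch `botocore.exceptions.ClientError` and inspect the error code before concluding "not found".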

2 Answers


Don't forget that there could be multiple files that match that wildcard.

You would use something like:

import boto3

s3 = boto3.resource('s3', region_name='ap-southeast-2')

bucket = s3.Bucket('my-bucket')

objects = bucket.objects.filter(Prefix='folder-name/')

for obj in objects:  # 'obj' avoids shadowing the built-in 'object'
  if obj.key.endswith('.txt'):
    bucket.download_file(obj.key, '/tmp/' + obj.key)
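One detail worth noting: the key returned by the listing still contains the `folder-name/` prefix, so `'/tmp/' + obj.key` points into a subdirectory that may not exist locally. A small sketch of flattening the key down to its base name first (pure string handling, no AWS call; the function name is illustrative):

```python
import posixpath


def local_path(key, dest='/tmp'):
    """Map an S3 key to a flat local path by keeping only the base name.

    S3 keys always use '/' as the separator, regardless of the local OS,
    so posixpath is the right module here.
    """
    return posixpath.join(dest, posixpath.basename(key))


print(local_path('folder-name/sub/filename.xml'))  # /tmp/filename.xml
```

Alternatively, recreate the key's directory structure locally with `os.makedirs(..., exist_ok=True)` before downloading.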
John Rotenstein

This is a pretty old question, and I am at a loss that the accepted answer is a poor and potentially dangerous one.

It essentially lists ALL objects and moves the searching to the client side. On a bucket with thousands of objects (which I would guess is most buckets), that is terrible.

What you need to do is use `.filter()` instead of `.all()`:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('twtalyser')
for obj in bucket.objects.filter(Prefix='my/desired/prefix'):
    print(obj)
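To make the cost difference concrete, here is a toy simulation in plain Python (no AWS calls, with made-up key names): listing everything transfers every key and filters locally, while `Prefix` lets S3 discard non-matching keys server-side, so only the matches cross the wire.

```python
# Toy model of a bucket listing; stands in for real S3 keys.
bucket_keys = [f'other/file{i}.txt' for i in range(1000)] + [
    'my/desired/prefix/a.xml',
    'my/desired/prefix/b.xml',
]

# .all() style: every key is transferred, then filtered client-side.
transferred_all = bucket_keys
matched = [k for k in transferred_all if k.startswith('my/desired/prefix')]

# .filter(Prefix=...) style: S3 applies the prefix before responding,
# so only matching keys are transferred at all.
transferred_filtered = [k for k in bucket_keys
                        if k.startswith('my/desired/prefix')]

print(len(transferred_all), len(transferred_filtered))  # 1002 vs 2
```

Both styles find the same two keys, but the prefix-filtered listing moves 500x less data in this toy case, and on a real bucket it also means fewer paginated API calls.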

UPDATE

The main answer was updated to reflect the point I was making.

Aliostad