
My S3 key is 'folder/filename.xml'. I want to get the files ending with 'name.xml'.

import boto3

s3 = boto3.resource('s3')
try:
    fileobj = s3.Object('lcu-matillion', 'folder/.*name.xml').get()['Body']
    data = fileobj.read()
except Exception:
    print('not found')

Can anyone help with the correct code? Thanks.

John Rotenstein
    Use the `glob` module in the standard library (but you probably need `*`, not `.*`). https://docs.python.org/3/library/glob.html – cdarke Aug 29 '18 at 08:24
    By the way, `except Exception` is dangerous, do you think every exception in opening a file is "not found"? – cdarke Aug 29 '18 at 08:27
  • Side-note: You could use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/) `aws s3 cp` command with `--include`. – John Rotenstein Aug 29 '18 at 08:36
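Putting the comments together, here is a minimal sketch of both suggestions: shell-style `*` wildcards via the standard-library `fnmatch` module (rather than the regex-style `.*` in the question), and filtering keys as plain strings. The key list below is hypothetical stand-in data; in practice it would come from a bucket listing as shown in the answers.

```python
import fnmatch

# Hypothetical key listing; in practice these come from an S3 listing call.
keys = ['folder/filename.xml', 'folder/other.xml', 'folder/data.csv']

# fnmatch uses shell-style wildcards: '*' here, not the regex '.*'.
matches = [k for k in keys if fnmatch.fnmatch(k, 'folder/*name.xml')]
print(matches)  # only 'folder/filename.xml' ends with 'name.xml'
```

As cdarke's second comment warns, a bare `except Exception` would also swallow credential and permission errors, so it is better to catch `botocore.exceptions.ClientError` and inspect the error code before concluding "not found".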

2 Answers


Don't forget that there could be multiple files that match that wildcard.

You would use something like:

import boto3

s3 = boto3.resource('s3', region_name='ap-southeast-2')

bucket = s3.Bucket('my-bucket')

objects = bucket.objects.filter(Prefix='folder-name/')

for obj in objects:  # 'obj' avoids shadowing the built-in 'object'
  if obj.key.endswith('.txt'):
    bucket.download_file(obj.key, '/tmp/' + obj.key)
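One detail worth noting: the key returned by the listing still contains the `folder-name/` prefix, so `'/tmp/' + obj.key` points into a subdirectory that may not exist locally. A small sketch of flattening the key down to its base name first (pure string handling, no AWS call; the function name is illustrative):

```python
import posixpath


def local_path(key, dest='/tmp'):
    """Map an S3 key to a flat local path by keeping only the base name.

    S3 keys always use '/' as the separator, regardless of the local OS,
    so posixpath is the right module here.
    """
    return posixpath.join(dest, posixpath.basename(key))


print(local_path('folder-name/sub/filename.xml'))  # /tmp/filename.xml
```

Alternatively, recreate the key's directory structure locally with `os.makedirs(..., exist_ok=True)` before downloading.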
John Rotenstein

This is a pretty old question, and I am at a loss that the accepted answer is a poor and potentially dangerous one.

It essentially lists ALL objects and moves the searching to the client side. On a bucket with thousands of objects (which I would guess is most buckets), that is terrible.

What you need to do is use `.filter()` instead of `.all()`:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('twtalyser')
for obj in bucket.objects.filter(Prefix='my/desired/prefix'):
    print(obj)
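To make the cost difference concrete, here is a toy simulation in plain Python (no AWS calls, with made-up key names): listing everything transfers every key and filters locally, while `Prefix` lets S3 discard non-matching keys server-side, so only the matches cross the wire.

```python
# Toy model of a bucket listing; stands in for real S3 keys.
bucket_keys = [f'other/file{i}.txt' for i in range(1000)] + [
    'my/desired/prefix/a.xml',
    'my/desired/prefix/b.xml',
]

# .all() style: every key is transferred, then filtered client-side.
transferred_all = bucket_keys
matched = [k for k in transferred_all if k.startswith('my/desired/prefix')]

# .filter(Prefix=...) style: S3 applies the prefix before responding,
# so only matching keys are transferred at all.
transferred_filtered = [k for k in bucket_keys
                        if k.startswith('my/desired/prefix')]

print(len(transferred_all), len(transferred_filtered))  # 1002 vs 2
```

Both styles find the same two keys, but the prefix-filtered listing moves 500x less data in this toy case, and on a real bucket it also means fewer paginated API calls.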

UPDATE

The main answer was updated to reflect the point I was making.

Aliostad