
I am writing a Python 3.4 + boto3 script to download all files in an S3 bucket/folder. I'm using s3.resource rather than client because this EMR cluster already has the key credentials.

This works to download a single file:

import boto3

s3 = boto3.resource('s3')
bucket = "my-bucket"
file = "some_file.zip"
filepath = "some_folder/some_file.zip"


def DL(bucket, key, local_name):
    s3.Bucket(bucket).download_file(key, local_name)

DL(bucket, filepath, file)

But I need to download all files in a folder within the bucket, which have a format like so:

some_file_1.zip
some_file_2.zip
some_file_3.zip, etc.

It should be simple but I guess we can't use a wildcard or pattern match like "some_file*". So I have to loop through and find each file name?

And call download_file for each file name?

Chuck
    Get a list of all keys in the bucket. See https://stackoverflow.com/questions/30249069/listing-contents-of-a-bucket-with-boto3. Then get the keys of the files you want by applying wildcards to the list – Karl Jul 17 '19 at 18:10
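The filtering step the comment describes can be done with the standard library's `fnmatch`, which supports shell-style wildcards like `some_file*`. A minimal sketch (the `matching_keys` helper name and the sample key list are illustrative, not from the question):

```python
from fnmatch import fnmatch

def matching_keys(keys, pattern):
    """Return only the keys whose final path component matches the wildcard pattern."""
    return [k for k in keys if fnmatch(k.rsplit('/', 1)[-1], pattern)]

keys = ["some_folder/some_file_1.zip",
        "some_folder/some_file_2.zip",
        "some_folder/readme.txt"]
print(matching_keys(keys, "some_file*"))
# → ['some_folder/some_file_1.zip', 'some_folder/some_file_2.zip']
```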

1 Answer


You can use `list_objects_v2` and pass a `Prefix` to get only the keys inside your S3 "folder". Then loop over those keys and call `download_file` for each one. Add conditions if you need to filter them further.
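Since the question already uses the resource API, the equivalent of `list_objects_v2` there is `bucket.objects.filter(Prefix=...)`, which calls ListObjectsV2 under the hood and pages automatically. A sketch, assuming the question's placeholder bucket and folder names (`download_folder` and `local_path_for` are hypothetical helper names):

```python
import os

def local_path_for(key, dest_dir):
    # Map an S3 key like "some_folder/some_file_1.zip" to a local file path.
    return os.path.join(dest_dir, os.path.basename(key))

def download_folder(bucket_name, prefix, dest_dir="."):
    """Download every object under `prefix` in `bucket_name` into `dest_dir`."""
    import boto3  # deferred so the path helper above works without boto3 installed
    bucket = boto3.resource('s3').Bucket(bucket_name)
    os.makedirs(dest_dir, exist_ok=True)
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith('/'):  # skip zero-byte "folder" placeholder objects
            continue
        bucket.download_file(obj.key, local_path_for(obj.key, dest_dir))

# Usage on the question's example layout:
# download_folder("my-bucket", "some_folder/")
```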

Ninad Gaikwad