3

I am working on an automation piece where I need to download all files from a folder inside an S3 bucket, irrespective of the file name. I understand that using boto3 in Python I can download a single file like:

s3BucketObj = boto3.client('s3', region_name=awsRegion, aws_access_key_id=s3AccessKey, aws_secret_access_key=s3SecretKey)
s3BucketObj.download_file(bucketName, "abc.json", "/tmp/abc.json")

but I then tried to download all files, irrespective of the file name, like this:

s3BucketObj.download_file(bucketName, "test/*.json", "/test/")

I know the syntax above could be totally wrong, but is there a simple way to do that?

I did find a thread that helps here, but it seems a bit complex: Boto3 to download all files from a S3 Bucket

John Rotenstein
Suyash Gupta

1 Answer

3

There is no API call to Amazon S3 that can download multiple files.

The easiest way is to use the AWS Command-Line Interface (CLI), which has the aws s3 cp --recursive and aws s3 sync commands. Either one will handle the listing, looping, and directory creation for you.
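As a sketch of the CLI approach (the bucket name `my-bucket` and the paths are placeholders):

```shell
# Copy everything under the test/ prefix, preserving the directory structure
aws s3 cp s3://my-bucket/test/ /tmp/test/ --recursive

# Or keep a local directory in sync with the prefix (only transfers changes)
aws s3 sync s3://my-bucket/test/ /tmp/test/
```

`sync` is handy if you run the job repeatedly, since it skips files that are already up to date locally.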

If you choose to program it yourself, then Boto3 to download all files from a S3 Bucket is a good way to do it. This is because you need to do several things:

  • Loop through every object (there is no S3 API call to download multiple files at once)
  • Create a local directory if it doesn't exist
  • Download the object to the appropriate local directory
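The three steps above can be sketched roughly as follows; the bucket name, prefix, and local directory are placeholders, and credentials are assumed to come from the environment:

```python
import os
import boto3

BUCKET = "my-bucket"        # placeholder bucket name
PREFIX = "test/"            # "folder" to download
LOCAL_ROOT = "/tmp/download"

s3 = boto3.client("s3")

# Paginate, since list_objects_v2 returns at most 1000 keys per call
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        local_path = os.path.join(LOCAL_ROOT, key)
        # Create the local directory if it doesn't exist
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Download the object to the appropriate local directory
        s3.download_file(BUCKET, key, local_path)
```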

The task can be made simpler if you do not wish to reproduce the directory structure (e.g. if all objects are in the same path). In that case, you can simply loop through the objects and download each of them to the same directory.
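That flat variant might look like this (again, the bucket name and paths are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Note: list_objects_v2 returns at most 1000 keys per call;
# use a paginator if the prefix can hold more objects than that.
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="test/")
for obj in resp.get("Contents", []):
    key = obj["Key"]
    if key.endswith("/"):
        continue  # skip "folder" placeholder objects
    filename = key.rsplit("/", 1)[-1]  # keep only the file name
    s3.download_file("my-bucket", key, "/tmp/" + filename)
```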

John Rotenstein