Instead of using boto3, I opt for aws-cli and sh. See the aws s3 cp docs for the full list of arguments, which you can pass as kwargs to the following function (reworked from my own code), which copies to / from / between S3 buckets and / or local targets:
import sh  # also assumes aws-cli has been installed

def s3_cp(source, target, **kwargs):
    """
    Copy data from source to target. Include flags as kwargs
    such as recursive=True and include=xyz
    """
    args = []
    for flag_name, flag_value in kwargs.items():
        if flag_value is not False:  # i.e. --quiet=False means omit --quiet
            args.append(f"--{flag_name}")
            if flag_value is not True:  # i.e. --quiet=True means a bare --quiet
                args.append(str(flag_value))  # non-boolean values become --flag value
    args += [source, target]
    sh.aws("s3", "cp", *args)
bucket to bucket (as per the OP's question):
s3_cp("s3://B1/x/", "s3://B1/y/", quiet=True, recursive=True)
or bucket to local:
s3_cp("s3://B1/x/", "my-local-dir/", quiet=True, recursive=True)
Personally, I found that this method improved the transfer time of a few GB spread over 20k small files from a couple of hours to a few minutes, compared to boto3. Perhaps under the hood it's doing some threading or simply opening a few connections, but that's just speculation.
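For what it's worth, the AWS CLI does parallelize S3 transfers, and the concurrency is tunable via its s3 config (max_concurrent_requests defaults to 10). A quick sketch of raising it through the same sh wrapper; the value 20 is just an illustration:

import sh

# Equivalent to running: aws configure set default.s3.max_concurrent_requests 20
sh.aws("configure", "set", "default.s3.max_concurrent_requests", "20")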
Warning: it won't work on Windows, since the sh library doesn't support it.
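If Windows support matters, here's a minimal sketch of the same idea using the standard library's subprocess instead of sh (s3_cp_portable is just a name I made up; the flag handling mirrors s3_cp above):

import subprocess

def s3_cp_portable(source, target, **kwargs):
    """Like s3_cp above, but built on subprocess so it also runs on Windows."""
    args = []
    for flag_name, flag_value in kwargs.items():
        if flag_value is not False:
            args.append(f"--{flag_name}")
            if flag_value is not True:
                args.append(str(flag_value))
    # check=True raises CalledProcessError if the copy fails
    subprocess.run(["aws", "s3", "cp", *args, source, target], check=True)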
Related: https://stackoverflow.com/a/46680575/1571593