56

I have to move files from one bucket to another with the Python Boto API. (I need it to "cut" the file from the first bucket and "paste" it into the second one.) What is the best way to do that?

Note: Does it matter if I have two different ACCESS KEYS and SECRET KEYS?

Asclepius
Gal

13 Answers

53

If you are using boto3 (the newer version of boto), this is quite simple:

import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')

(Docs)
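Since the question asks to "cut" rather than just copy, one way to complete the move is to delete the source object once the copy has succeeded. A minimal sketch building on the snippet above (same bucket and key names; the delete call is an addition, not part of the original answer):

import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}

# copy first, then delete the original so the net effect is a move
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
s3.Object('mybucket', 'mykey').delete()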

David Arenburg
  • You don't need to extract the client from the meta of the resource object. I found that going straight through client.copy_object (or client.copy) takes the same parameters as the ones in the suggested answer and seems to be more consistent (I was getting a lot of 404s with the meta). https://github.com/boto/boto3/issues/1715 – Ernesto Mar 13 '20 at 15:15
  • How does this work when your source and destination are two buckets with two different access keys? – David Maddox Apr 21 '20 at 18:18
  • 34
    Moving and copying are not the same thing. – Chris Ivan Mar 11 '21 at 04:37
  • @ChrisIvan You can delete it from the old location afterwards. If you have an issue with the semantics, please provide an alternative. – David Arenburg Dec 17 '21 at 10:56
  • 4
    @DavidArenburg It's important to note the distinction, because the reader of your answer may not be aware, and the question asks for how to move. – Chris Ivan Jan 07 '22 at 08:46
  • 1
    Yes, important distinction. I'm sure everyone also agrees that the answer is 90% there and 100% helpful. Thanks @DavidArenburg – jpmorris Sep 20 '22 at 23:02
41

I think the boto S3 documentation answers your question.

https://github.com/boto/boto/blob/develop/docs/source/s3_tut.rst

Moving files from one bucket to another via boto is effectively a copy of the key from the source to the destination, followed by removing the key from the source.

You can get access to the buckets:

import boto

c = boto.connect_s3()
src = c.get_bucket('my_source_bucket')
dst = c.get_bucket('my_destination_bucket')

and iterate the keys:

for k in src.list():
    # copy stuff to your destination here
    dst.copy_key(k.key, src.name, k.key)
    # then delete the source key
    k.delete()

See also: Is it possible to copy all files from one S3 bucket to another with s3cmd?

Freek Wiekmeijer
  • 1
    My question is how to copy the files...? – Gal May 11 '15 at 07:59
  • 8
    This is likely the best way to do it. Keep in mind that if you have versioning turned on there will be shadows left over in the original bucket. Also, you may want to wrap your copy in a try/except so you don't delete before you have a copy. You can also copy and keep track of the copies, then go through the dst bucket and do a key.lookup() to make sure the copy is there, and if so then and only then do an orig.delete(). – cgseller May 11 '15 at 18:15
  • Gal: keys are objects, and the objects contain contents. By moving the key you are effectively moving the 'file'. Think of it like moving the file pointer in the filesystem when you copy a file on your computer, under the hood it is the same methodology. – cgseller May 11 '15 at 18:18
  • 3
    syntax seems to be incorrect, should be `dst.copy_key(k.key, src.name, k.key)` as you need to specify bucket and key names (not their objects) - had me stumped for a while :) – Marty Dec 21 '15 at 12:03
  • @marty: thanks, I think you are right and updated the answer accordingly. – Freek Wiekmeijer Dec 21 '15 at 12:11
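As the comments suggest, it is safer to wrap the copy in a try/except and only delete the source key once the copy has succeeded. A small boto2 sketch of that pattern, reusing the src and dst buckets from the answer above (the error handling is an addition, not part of the original answer):

for k in src.list():
    try:
        # copy_key takes string names: (new_key_name, src_bucket_name, src_key_name)
        dst.copy_key(k.key, src.name, k.key)
    except Exception as e:
        print('copy failed for %s: %s' % (k.key, e))
        continue
    # delete the source key only after a successful copy
    k.delete()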
12

If you have two different buckets with different access credentials, store the credentials accordingly in the credentials and config files under the ~/.aws folder.

You can use the following to copy the object from one bucket with one set of credentials and then save it in the other bucket with the other set of credentials:

import boto3


session_src = boto3.session.Session(profile_name=<source_profile_name>)
source_s3_r = session_src.resource('s3')

session_dest = boto3.session.Session(profile_name=<dest_profile_name>)
dest_s3_r = session_dest.resource('s3')

# create a reference to source image
old_obj = source_s3_r.Object(<source_s3_bucket_name>, <prefix_path> + <key_name>)

# create a reference for destination image
new_obj = dest_s3_r.Object(<dest_s3_bucket_name>, old_obj.key)

# upload the image to destination S3 object
new_obj.put(Body=old_obj.get()['Body'].read())

The two buckets do not need to grant each other access in their ACLs or bucket policies.

agrawalramakant
  • 3
    old_obj.get()['Body'].read() creates a local copy before uploading to the destination bucket. Is there an efficient way to directly copy from src to dest bucket? – 333 Sep 22 '20 at 07:33
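Regarding the comment above: when the two buckets really do require different credentials, the get/put approach buffers the whole object in memory. One way to avoid that (a sketch, not from the original answer, using the same <placeholder> convention) is to stream the source body into the destination client's upload_fileobj, which uploads in chunks:

import boto3

session_src = boto3.session.Session(profile_name=<source_profile_name>)
session_dest = boto3.session.Session(profile_name=<dest_profile_name>)

src_client = session_src.client('s3')
dest_client = session_dest.client('s3')

# stream the object body instead of reading it fully into memory
body = src_client.get_object(Bucket=<source_s3_bucket_name>, Key=<key_name>)['Body']
dest_client.upload_fileobj(body, <dest_s3_bucket_name>, <key_name>)

If a single set of credentials can reach both buckets, a server-side copy (copy/copy_object) avoids downloading the data at all.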
11

awscli does the job 30 times faster for me than boto copying and deleting each key, probably due to multithreading in awscli. If you still want to run it from your Python script without calling shell commands from it, you may try something like this:

Install awscli python package:

sudo pip install awscli

And then it is as simple as this:

import os
if os.environ.get('LC_CTYPE', '') == 'UTF-8':
    os.environ['LC_CTYPE'] = 'en_US.UTF-8'

from awscli.clidriver import create_clidriver
driver = create_clidriver()
driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())
Artem Fedosov
  • 1
    How can I give configurations here without setting them in my environment variables? – Nitish Agarwal May 12 '16 at 09:07
  • Do not know an easy way. I would set the env variables from python before running the driver. – Artem Fedosov May 12 '16 at 13:47
  • Isn't awscli based on boto though? – gtd Aug 09 '16 at 20:32
  • Both boto and awscli are based on botocore. However, boto itself does not expose an API analogous to `aws s3 mv`. I still did not conduct a proper experiment to prove that `mv` is not equivalent to `cp` + `rm`, but I honestly hope so :) I think the main performance boost for me was due to multithreading in awscli, and I probably could reach quite a similar speed by implementing it myself. – Artem Fedosov Aug 11 '16 at 06:09
  • Although I think the following answer by freek is better, but I would imagine that if you have .aws/config and .aws/credentials set up for the user you could use`driver.main('--profile=myprofile s3 mv source_bucket target_bucket --recursive'.split())` – rabinnh Feb 02 '18 at 23:10
  • Why `LC_CTYPE` has to be set to `'en_US.UTF-8'`? – datapug Sep 22 '21 at 12:11
  • Good question, this does not look essential to what the snippet does, but this bit was causing warnings/errors 7 years ago. I kept it so that everyone doesn't have to resolve the same issue. Let me know if removing this snippet does not cause problems anymore. – Artem Fedosov Sep 24 '21 at 04:26
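Regarding the configuration question in the comments: one option (a sketch, not verified against every awscli version; the key values are placeholders) is to set the standard AWS environment variables from Python before creating the driver, or to pass --profile as suggested above:

import os
from awscli.clidriver import create_clidriver

# standard AWS credential environment variables picked up by awscli
os.environ['AWS_ACCESS_KEY_ID'] = 'your-access-key'      # placeholder
os.environ['AWS_SECRET_ACCESS_KEY'] = 'your-secret-key'  # placeholder
os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'           # placeholder

driver = create_clidriver()
driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())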
6

If you want to

Create a copy of an object that is already stored in Amazon S3.

then copy_object is the way to go in boto3.

How I do it:

import boto3

aws_access_key_id = ""
aws_secret_access_key = ""
bucket_from = ""
bucket_to = ""
s3 = boto3.resource(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)
src = s3.Bucket(bucket_from)

def move_files():
    for archive in src.objects.all():
        # filters on archive.key might be applied here

        s3.meta.client.copy_object(
            ACL='public-read',
            Bucket=bucket_to,
            CopySource={'Bucket': bucket_from, 'Key': archive.key},
            Key=archive.key
        )

move_files()
Tom Wojcik
4

Copying between different buckets, or within the same bucket, can be done easily in boto3:

import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
bucket = s3.Bucket('otherbucket')
bucket.copy(copy_source, 'otherkey')

# This is a managed transfer that will perform a multipart copy in
# multiple threads if necessary.
  • I like the use of bucket.copy. simple. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Bucket.copy – MattC Dec 06 '22 at 19:04
3

The bucket name must be a string, not a bucket object. The change below worked for me:

for k in src.list():
    dst.copy_key(k.key, src.name, k.key)
0

Hope this answer helps. Thanks @agrawalramakant.

import boto3


# object_key = 'posts/0173c352-f9f8-4bf1-a818-c99b4c9b0c18.jpg'
def move_from_s3_to_s3(object_key):
    session_src = boto3.session.Session(aws_access_key_id="",
                                        region_name="ap-south-1",
                                        aws_secret_access_key="")

    source_s3_r = session_src.resource('s3')

    session_dest = boto3.session.Session(aws_access_key_id="",
                                         region_name="ap-south-1",
                                         aws_secret_access_key="")

    dest_s3_r = session_dest.resource('s3')
    # create a reference to source image
    old_obj = source_s3_r.Object('source_bucket_name', object_key)

    # create a reference for destination image
    new_obj = dest_s3_r.Object('dest_bucket_name', object_key)

    # upload the image to destination S3 object
    new_obj.put(Body=old_obj.get()['Body'].read())
Mohamed Jaleel Nazir
0

I did this to move files between 2 S3 locations.

It handles the following scenarios:

  • If you want to move files with specific prefixes in their names
  • If you want to move them between 2 subfolders within the same bucket
  • If you want to move them between 2 buckets
import boto3
s3 = boto3.resource('s3')

vBucketName = 'xyz-data-store'
#Source and Target Bucket Instantiation
vTargetBkt = s3.Bucket('xyz-data-store')
vSourceBkt = s3.Bucket('xyz-data-store')

#List of File name prefixes you want to move
vSourcePath = ['abc/1/test1_', 'abc/1/test2_'
               ,'abc/1/test3_','abc/1/test4_']
#List of Folder names you want the files to be moved to
vTargetPath = ['abc/1/test1_', 'abc/1/test2_'
               ,'abc/1/test3_','abc/1/test4_']

for (sP, tP) in zip(vSourcePath,vTargetPath) :
    for se_files in vSourceBkt.objects.filter(Prefix = sP, Delimiter = '/'):
        SourceFileName = (se_files.key).split('/')[-1]
        copy_source = {
            'Bucket': vSourceBkt.name,
            'Key': se_files.key
        }
        #print('SourceFileName ' + SourceFileName)
        #print('se_files ' + se_files.key)
        TargetFileName = str("{}{}".format(tP,SourceFileName))
        print('TargetFileName ' + TargetFileName)
        s3.meta.client.copy(copy_source, vBucketName, TargetFileName)
  
        #Delete files in the Source when the code is working
Leonshi96
0
  1. On the source AWS account, add this policy to the source S3 bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::SOURCE_BUCKET_NAME",
                "arn:aws:s3:::SOURCE_BUCKET_NAME/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::DESTINATION_BUCKET_NAME",
                "arn:aws:s3:::DESTINATION_BUCKET_NAME/*"
            ]
        }
    ]
}
  2. Using the destination account's credentials:
boto3_session = boto3.Session(aws_access_key_id=<your access key>,
                              aws_secret_access_key=<your secret_access_key>)
s3_resource = boto3_session.resource('s3')
bucket = s3_resource.Bucket("<source bucket name>")

for obj in bucket.objects.all():
    obj_path = str(obj.key)

    copy_source = {
        'Bucket': "<source bucket name>",
        'Key': obj_path
    }
    s3_resource.meta.client.copy(copy_source, "<destination bucket name>", obj_path)
SV125
0

To move an object from one directory to another:

import boto3

def move_s3_object(bucket: str, old_key: str, new_key: str) -> None:
    boto3.resource('s3').Object(bucket,  new_key).copy_from(CopySource=f'{bucket}/{old_key}')
    boto3.client('s3').delete_object(Bucket=bucket, Key=old_key)


# example:
move_s3_object('my_bucket', old_key='tmp/test.txt', new_key='tmp/tmp2/test.txt')

This might even work with two different buckets, but I haven't tested that.
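For the two-bucket case, copy_from also accepts a CopySource pointing at another bucket, as long as the credentials can read from it. A small untested sketch (bucket names are placeholders, following the example above):

import boto3

def move_s3_object_between_buckets(src_bucket: str, dst_bucket: str, key: str) -> None:
    s3 = boto3.resource('s3')
    # copy into the destination bucket, then remove the original
    s3.Object(dst_bucket, key).copy_from(CopySource={'Bucket': src_bucket, 'Key': key})
    s3.Object(src_bucket, key).delete()


# example:
move_s3_object_between_buckets('my_bucket', 'my_other_bucket', 'tmp/test.txt')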

-2

This is the code I used to move files between sub-directories of an S3 bucket.

# =============================================================================
# CODE TO MOVE FILES within subfolders in S3 BUCKET
# =============================================================================

from boto3.session import Session

ACCESS_KEY = 'a_key'
SECRET_KEY = 's_key'
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')       # S3 as a resource, used for copy/delete
s3client = session.client('s3')   # S3 as a client, used for listing

resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter='/')

# all file keys under the prefix (list_objects returns at most 1000 keys at a time)
forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]
reload_no = 0
while len(forms2_dw) != 0:
    total_files = len(forms2_dw)
    for i in range(total_files):
        # put your own logic here for the destination folder name
        foldername = resp_dw['Contents'][1:][i]['LastModified'].strftime('%Y%m%d')
        my_bcket = 'main_bucket'

        my_file_old = resp_dw['Contents'][1:][i]['Key']  # source key
        zip_filename = my_file_old.split('/')[-1]
        my_file_new = 'new_sub_folder/' + foldername + '/' + zip_filename  # destination key

        print(str(reload_no) + ': copying from ' + my_file_old + ' to ' + my_file_new)

        if zip_filename[-4:] == '.zip':
            # copy within the same bucket, then delete the original
            s3.Object(my_bcket, my_file_new).copy_from(CopySource=my_bcket + '/' + my_file_old)
            s3.Object(my_bcket, my_file_old).delete()

            print(str(i) + ' files moved of ' + str(total_files))

    # re-list to pick up the next batch of keys
    resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter='/')
    forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]
    reload_no += 1
Ganesh Kharad
-2

It can be done easily with the s3fs library.

import s3fs

src = 'source_bucket'
dst = 'destination_bucket'

s3 = s3fs.S3FileSystem(anon=False,key='aws_s3_key',secret='aws_s3_secret_key')

for i in s3.ls(src,refresh=True): # loading the file names
    if 'file_name' in i:          # checking the file name
        s3.mv(i,dst)              # moving file to destination

Here's the documentation: https://s3fs.readthedocs.io/en/latest/