81

I'm pretty happy with s3cmd, but there is one issue: How to copy all files from one S3 bucket to another? Is it even possible?

EDIT: I've found a way to copy files between buckets using Python with boto:

import time

from boto.s3.connection import S3Connection

def copyBucket(srcBucketName, dstBucketName, maxKeys = 100):
  # awsAccessKey and awsSecretKey are assumed to be defined elsewhere (e.g. module-level constants)
  conn = S3Connection(awsAccessKey, awsSecretKey)

  srcBucket = conn.get_bucket(srcBucketName)
  dstBucket = conn.get_bucket(dstBucketName)

  resultMarker = ''
  while True:
    # Fetch the next page of at most maxKeys keys, starting after the last key we processed
    keys = srcBucket.get_all_keys(max_keys = maxKeys, marker = resultMarker)

    for k in keys:
      print 'Copying ' + k.key + ' from ' + srcBucketName + ' to ' + dstBucketName

      t0 = time.clock()
      # Server-side copy: the object data never leaves S3
      dstBucket.copy_key(k.key, srcBucketName, k.key)
      print time.clock() - t0, ' seconds'

    if len(keys) < maxKeys:
      print 'Done'
      break

    resultMarker = keys[maxKeys - 1].key

Syncing is almost as straightforward as copying: keys expose ETag, size, and last-modified fields that can be compared before deciding whether to copy (see the sketch below).
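
For completeness, here is a rough, untested sketch of that sync idea using the same boto setup as above (the syncBucket name and the size/ETag comparison are just illustrative):

from boto.s3.connection import S3Connection

def syncBucket(srcBucketName, dstBucketName, awsAccessKey, awsSecretKey):
  conn = S3Connection(awsAccessKey, awsSecretKey)
  srcBucket = conn.get_bucket(srcBucketName)
  dstBucket = conn.get_bucket(dstBucketName)

  # bucket.list() transparently pages through all keys
  for srcKey in srcBucket.list():
    dstKey = dstBucket.get_key(srcKey.name)
    # Copy only when the key is missing from the destination or its size/ETag differ
    if dstKey is None or dstKey.size != srcKey.size or dstKey.etag != srcKey.etag:
      print 'Syncing ' + srcKey.name
      dstBucket.copy_key(srcKey.name, srcBucketName, srcKey.name)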

Maybe this helps others as well.

Jan Deinhard
  • 19,645
  • 24
  • 81
  • 137

11 Answers

101

s3cmd sync s3://from/this/bucket/ s3://to/this/bucket/

For available options, please run: s3cmd --help

Axel Advento
  • 2,995
  • 3
  • 24
  • 32
amit_saxena
  • 7,450
  • 5
  • 49
  • 64
  • 1
    Awesome suggestion. Love that s3cmd. Trailing slashes may be important so `s3cmd sync s3://sample_bucket/ s3://staging_bucket/` worked well for me. – Charles Forcey Apr 18 '13 at 18:12
  • 13
    you can also use aws cli to do this. aws s3 sync s3://from/ s3://to/ – Bobo Aug 19 '14 at 19:31
  • 2
    What if each bucket has a different set of access-key-id and secret (different AWS accounts)? – brainstorm Feb 04 '15 at 09:44
  • @brainstorm you may want to create a new AWS user which has access on both the buckets, for using s3cmd for the specific use case. – amit_saxena Feb 04 '15 at 10:16
  • Do it using `--check-md5` which will check MD5 sums when comparing files for sync. – imVJ Oct 18 '16 at 09:45
  • A quick question, does this sync files directly between buckets without copying them locally first? Assuming so, but thought to double check.... – Brett Apr 05 '17 at 09:16
  • Yes, I think it is a direct copy and no local storage is involved. Amazon does have a copy API for S3, which probably is used by s3cmd: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html – amit_saxena Apr 07 '17 at 09:18
50

AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.

aws s3 sync s3://mybucket s3://backup-mybucket

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

pythonjsgeo
  • 5,122
  • 2
  • 34
  • 47
30

The answer with the most upvotes as I write this is this one:

s3cmd sync s3://from/this/bucket s3://to/this/bucket

It's a useful answer. But sometimes sync is not what you need (it deletes files, etc.). It took me a long time to figure out this non-scripting alternative to simply copy multiple files between buckets. (OK, in the case shown below it's not between buckets. It's between not-really-folders, but it works between buckets equally well.)

# Slightly verbose, slightly unintuitive, very useful:
s3cmd cp --recursive --exclude='*' --include='file_prefix*' s3://semarchy-inc/source1/ s3://semarchy-inc/target/

Explanation of the above command:

  • --recursive
    In my mind, my requirement is not recursive. I simply want multiple files. But recursive in this context just tells s3cmd cp to handle multiple files. Great.
  • --exclude
    It’s an odd way to think of the problem. Begin by recursively selecting all files. Next, exclude all files. Wait, what?
  • --include
    Now we’re talking. Indicate the file prefix (or suffix or whatever pattern) that you want to include.
  • s3://sourceBucket/ s3://targetBucket/
    This part is intuitive enough. Though technically it seems to violate the documented example from s3cmd help, which indicates that a source object must be specified:
    s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
mdahlman
  • 9,204
  • 4
  • 44
  • 72
  • To make this good answer great, please copy the 'Enlightenment' section of your [in-depth blog post](http://mdahlman.wordpress.com/2013/12/05/copy-files-between-s3-buckets/) into your answer here. Great work! – Iain Samuel McLean Elder Dec 09 '13 at 11:38
  • Couldn't you achieve the same with: `s3cmd sync --max-delete=0 s3://from s3://to` ? – schmijos Jan 06 '15 at 14:33
  • Hmm... I never found that option. So I can't confirm that it works. But I don't see why it wouldn't. In fact, now I see `--no-delete-removed` which seems even more to the point. – mdahlman Jan 07 '15 at 05:01
10

You can also use the web interface to do so:

  1. Go to the source bucket in the web interface.
  2. Mark the files you want to copy (use shift and mouse clicks to mark several).
  3. Press Actions->Copy.
  4. Go to the destination bucket.
  5. Press Actions->Paste.

That's it.

Moti Hamo
  • 178
  • 1
  • 5
8

I needed to copy a very large bucket, so I adapted the code in the question into a multi-threaded version and put it up on GitHub.

https://github.com/paultuckey/s3-bucket-to-bucket-copy-py
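
If you just want a sense of the approach, here is a rough, untested sketch of the multi-threaded idea using boto with a plain worker/queue pattern (the function names are mine; the script on GitHub is the more complete version):

import threading
import Queue

from boto.s3.connection import S3Connection

def copy_worker(queue, src_bucket_name, dst_bucket_name, aws_key, aws_secret):
    # Give each worker its own connection rather than sharing one across threads
    conn = S3Connection(aws_key, aws_secret)
    dst_bucket = conn.get_bucket(dst_bucket_name)
    while True:
        key_name = queue.get()
        if key_name is None:  # sentinel: no more work
            break
        # Server-side copy, so nothing is downloaded locally
        dst_bucket.copy_key(key_name, src_bucket_name, key_name)

def threaded_copy(src_bucket_name, dst_bucket_name, aws_key, aws_secret, num_threads=10):
    queue = Queue.Queue(maxsize=1000)
    workers = []
    for _ in range(num_threads):
        t = threading.Thread(target=copy_worker,
                             args=(queue, src_bucket_name, dst_bucket_name, aws_key, aws_secret))
        t.daemon = True
        t.start()
        workers.append(t)

    # Producer: list the source bucket and queue every key name
    conn = S3Connection(aws_key, aws_secret)
    for key in conn.get_bucket(src_bucket_name).list():
        queue.put(key.name)

    for _ in workers:
        queue.put(None)  # one sentinel per worker
    for t in workers:
        t.join()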

Paul
  • 548
  • 5
  • 9
3

It's actually possible. This worked for me:

import boto.s3.connection


AWS_ACCESS_KEY = 'Your access key'
AWS_SECRET_KEY = 'Your secret key'
SRC_BUCKET_NAME = 'Your source bucket name'
DEST_BUCKET_NAME = 'Your destination bucket name'

conn = boto.s3.connection.S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = conn.get_bucket(SRC_BUCKET_NAME)

for item in bucket:
    # Note: you can also put a path inside DEST_BUCKET_NAME if you want the
    # item to be stored inside a folder, like this:
    # item.copy(DEST_BUCKET_NAME, '%s/%s' % (folder_name, item.key))
    item.copy(DEST_BUCKET_NAME, item.key)
drekyn
  • 438
  • 2
  • 7
  • The copy method is for the `boto.s3.key` object, [see here](http://boto.readthedocs.org/en/latest/ref/s3.html#module-boto.s3.key). But this is a good way to directly copy/move a file without worrying about details with *'subfolders'*. – GeoSharp Nov 13 '15 at 07:40
2

Thanks - I use a slightly modified version, where I only copy files that don't exist or have a different size, and delete keys from the destination when they no longer exist in the source. I found this a bit quicker for readying the test environment:

from boto.s3.connection import S3Connection

def botoSyncPath(path):
    """
    Sync keys in the specified path from the source bucket to the target bucket,
    then delete target keys that no longer exist in the source.
    """
    # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SRC_BUCKET and AWS_DEST_BUCKET
    # are expected to be defined elsewhere (e.g. as module-level settings).
    try:
        conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
        srcBucket = conn.get_bucket(AWS_SRC_BUCKET)
        destBucket = conn.get_bucket(AWS_DEST_BUCKET)

        # Copy keys that are missing from the destination or differ in size
        for key in srcBucket.list(path):
            destKey = destBucket.get_key(key.name)
            if not destKey or destKey.size != key.size:
                key.copy(AWS_DEST_BUCKET, key.name)

        # Delete keys that exist in the destination but not in the source
        for key in destBucket.list(path):
            srcKey = srcBucket.get_key(key.name)
            if not srcKey:
                key.delete()
    except Exception:
        return False
    return True
Adam Morris
  • 8,265
  • 12
  • 45
  • 68
2

I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task

#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets() # returns a list of bucket objects
    num_buckets = len(buckets)

    backup_bucket_names = []
    for bucket in buckets:
        if (re.search('backup-' + r'\d{4}-\d{2}-\d{2}' , bucket.name)):
            backup_bucket_names.append(bucket.name)

    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The buckets are sorted latest to earliest, so we want to keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled so the bucket names sort and parse correctly
    new_backup_bucket_name = 'backup-' + '%04d-%02d-%02d' % (now.year, now.month, now.day)
    print "Creating new bucket " + new_backup_bucket_name
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)


def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys = 100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys = maximum_keys, marker = result_marker)

        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name

            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'

        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break

        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()

I use this in a rake task (for a Rails app):

desc "Back up a file onto S3"
task :backup do
     S3ID = "*****"
     S3KEY = "*****"
     SRCBUCKET = "primary-mzgd"
     NUM_BACKUP_BUCKETS = 2

     Dir.chdir("#{Rails.root}/lib/tasks")
     system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
Rose Perrone
  • 61,572
  • 58
  • 208
  • 243
2

mdahlman's command didn't work for me, but this one copies all the files in bucket1 to a new folder (the command also creates this new folder) in bucket2.

s3cmd cp --recursive --include='file_prefix*' s3://bucket1/ s3://bucket2/new_folder_name/
ansonw
  • 1,559
  • 1
  • 16
  • 22
1

You can also use s3funnel which uses multi-threading:

https://github.com/neelakanta/s3funnel

Example (the access key and secret key parameters are omitted):

s3funnel source-bucket-name list | s3funnel dest-bucket-name copy --source-bucket source-bucket-name --threads=10

Mahesh Neelakanta
  • 1,472
  • 2
  • 11
  • 4
1

s3cmd won't cp with only prefixes or wildcards, but you can script the behavior with 's3cmd ls sourceBucket' and awk to extract the object names, then use 's3cmd cp sourceBucket/name destBucket' to copy each object in the list.

I use these batch files in a DOS box on Windows:

s3list.bat

s3cmd ls %1 | gawk "/s3/{ print \"\\"\"\"substr($0,index($0,\"s3://\"))\"\\"\"\"; }"

s3copy.bat

@for /F "delims=" %%s in ('s3list %1') do @s3cmd cp %%s %2
John Lemberger
  • 2,689
  • 26
  • 25
  • Note that this method is VERY slow (like other solutions that do one object at a time) -- but it does work if you don't have too many items to copy. – Joshua Richardson Oct 25 '13 at 17:10
  • This answer fooled me for a long time... but in fact s3cmd CAN cp with wildcards if you use the correct (somewhat unintuitive) set of options. I posted an answer with details. – mdahlman Dec 05 '13 at 21:59