143

I have been looking for a tool to help me copy the contents of an AWS S3 bucket into a second AWS S3 bucket without downloading the contents to the local file system first.

I have tried the AWS S3 console's copy option, but that resulted in some nested files being missing.

I have also tried the Transmit app (by Panic). Its duplicate command downloads the files to the local system first and then uploads them back to the second bucket, which is quite inefficient.

cnikolaou
  • Consider increasing your concurrent request count: `aws configure set default.s3.max_concurrent_requests 200`. See this post for more details and options: http://stackoverflow.com/questions/4663016/faster-s3-bucket-duplication – Balmipour Apr 06 '17 at 08:51

21 Answers

215

Copy between S3 Buckets

AWS (just recently) released a command line interface for copying between buckets.

http://aws.amazon.com/cli/

$ aws s3 sync s3://mybucket-src s3://mybucket-target --exclude "*.tmp"
..

This will copy from the source bucket to the target bucket.

See the documentation here: S3 CLI Documentation
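
If you need the same thing from code rather than the CLI, here is a minimal boto3 sketch of the same server-side copy (not from the original answer; the bucket names are the placeholders used above):

import boto3

s3 = boto3.client('s3')
SRC, DST = 'mybucket-src', 'mybucket-target'  # placeholder bucket names

# Page through the source bucket and issue server-side copies.
# client.copy() also handles multipart copies for objects over 5 GB.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SRC):
    for obj in page.get('Contents', []):
        s3.copy({'Bucket': SRC, 'Key': obj['Key']}, DST, obj['Key'])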

Layke
  • Ran it from EC2 and got 80MB copied across in about 5s. – Stew-au Nov 28 '13 at 12:58
  • 1
    Exactly what I needed, since aws-sdk gem has no feature for copying or syncing a whole bucket at once. Thanks! – odigity Apr 03 '14 at 16:54
  • It throws the following error `A client error (PermanentRedirect) occurred when calling the ListObjects operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.` – Giovanni Bitliner May 07 '14 at 16:41
  • @GiovanniBitliner The bucket name you are using is incorrect. You are either using the wrong prefix, or using the old way of referring to the bucket. Check your bucket name exactly in your admin console. – Layke May 08 '14 at 09:23
  • The buckets could be in different S3 regions. I will add an answer showing how to copy between S3 buckets in different regions. – Adam Gawne-Cain Nov 02 '15 at 15:55
  • 12
    Note if this is your first time using the cli tool you need to run 'aws configure' and enter your creds – S.. Mar 14 '16 at 08:57
  • Note AWS offers two CLI type tools. The 'AWS CLI' and 'AWS Tools for Powershell' . This answer uses 'AWS CLI' . Don't be like me and install the wrong one. – fishjd Sep 26 '17 at 19:52
  • The biggest problem relates to this ticket https://github.com/aws/aws-cli/issues/901 : `aws s3 sync` will not copy each object's ACL configuration. You have to find a way to make the -C option of https://github.com/cobbzilla/s3s3mirror/ work, and/or set a bucket policy that mirrors the ACLs of the source bucket's objects. You can also try specifying tags on each object together with a bucket policy that has an explicit condition matching the tag to grant read access, for example. (A per-object ACL copy is sketched just after these comments.) – Djonatan Mar 20 '19 at 14:51
  • Amazon provides the AWS CLI, a command line tool for interacting with AWS. With the AWS CLI, that entire process took less than three seconds: $ aws s3 sync s3:/// For example: aws s3 sync s3://s3.aws-cli.demo/photos/office ~/Pictures/work – Tapan Banker Nov 03 '19 at 21:35
  • Is there a way to copy the files to the destination root, not the subfolders? – Asif Mushtaq Apr 01 '23 at 20:45
  • If you try to add a replication rule from the console, it asks you to turn versioning on. This just works and is way simpler. – dustbuster Jun 09 '23 at 20:52
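
Following up on the ACL comment above: `aws s3 sync` does not carry per-object ACLs across, so if you depend on them you have to copy them yourself. A minimal boto3 sketch, assuming both buckets are in the same account (bucket names and key are placeholders):

import boto3

s3 = boto3.client('s3')
SRC, DST, KEY = 'source-bucket', 'dest-bucket', 'path/to/object'  # placeholders

# Server-side copy, then replay the source object's ACL onto the copy.
# Within a single account the grants translate directly; across accounts
# the destination owner differs, so a bucket policy is usually a better fit.
s3.copy_object(Bucket=DST, Key=KEY, CopySource={'Bucket': SRC, 'Key': KEY})
acl = s3.get_object_acl(Bucket=SRC, Key=KEY)
s3.put_object_acl(Bucket=DST, Key=KEY,
                  AccessControlPolicy={'Owner': acl['Owner'], 'Grants': acl['Grants']})
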
49

You can now do it from the S3 admin interface. Just go into the source bucket, select all your folders, and choose Actions -> Copy. Then move into your new bucket and choose Actions -> Paste.

KDEx
  • 4
    Awesome! He is referring to the web interface. Unlike most of the others, I could do this from an iPad. – Jacob Foshee Dec 08 '14 at 03:09
  • 2
    This randomly leaves out nested objects in subfolders - 3 years later and AWS still cannot fix such a basic bug! – RunLoop Aug 01 '16 at 11:15
  • Is it for the same region only, or all regions? – hakki Apr 06 '17 at 00:49
  • Another downside is it also limits the number of objects you can copy to 100. If you try to use pagination and copy more, it removes the original set of objects from its "clipboard". – paul Apr 13 '17 at 12:06
  • I'm also getting silently missing objects on a large nested paste. No errors or warnings or in process operations shown in the S3 dashboard. – Taylor D. Edmiston Aug 01 '17 at 02:55
  • I have created a new bucket, and in its Actions the "Paste" option remains disabled even though I have selected "Copy" from the Actions of the previous bucket. Could you please help me here? – Vishal Dec 11 '17 at 10:51
  • I can confirm this is not reliable for big copies. Tried copying a folder with ~ 1k subfolders, only 5 subfolders were actually copied, and the operation didn't show any error or warning. – MetalElf0 May 30 '18 at 12:07
  • 1
    Are these issues documented anywhere by Amazon? @RunLoop – davetapley Jun 12 '18 at 23:06
  • 1
    @dukedave I don't know and have not tested again in quite a while as I resorted to doing the copying via the command line as that worked perfectly. – RunLoop Jun 13 '18 at 04:51
  • This worked perfectly for me with 2400 objects in many folders and subfolders. The command to copy is located under the "Actions" button. – lflier Aug 28 '23 at 16:19
45

A simplified example using the aws-sdk gem:

require 'aws-sdk'

AWS.config(:access_key_id => '...', :secret_access_key => '...')
s3 = AWS::S3.new
s3.buckets['bucket-name'].objects['source-key'].copy_to('target-key')

If you want to perform the copy between different buckets, then specify the target bucket name:

s3.buckets['bucket-name'].objects['source-key'].copy_to('target-key', :bucket_name => 'target-bucket')
Trevor Rowe
12

Copy between buckets in different regions

$ aws s3 cp s3://src_bucket/file  s3://dst_bucket/file --source-region eu-west-1 --region ap-northeast-1

The above command copies a file from a bucket in Europe (eu-west-1) to Japan (ap-northeast-1). You can get the code name for your bucket's region with this command:

$ aws s3api get-bucket-location --bucket my_bucket

By the way, using Copy and Paste in the S3 web console is easy, but it seems to download from the source bucket into the browser, and then upload to the destination bucket. Using "aws s3" was much faster for me.
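
The same cross-region copy can be scripted with boto3; the transfer still happens server-side inside S3, you only point the client at the destination bucket's region. A small sketch using the placeholder names from above:

import boto3

# Create the client in the destination bucket's region; S3 pulls the
# object from the source bucket without it passing through your machine.
s3 = boto3.client('s3', region_name='ap-northeast-1')
s3.copy_object(Bucket='dst_bucket', Key='file',
               CopySource={'Bucket': 'src_bucket', 'Key': 'file'})

# Rough equivalent of `aws s3api get-bucket-location`:
print(boto3.client('s3').get_bucket_location(Bucket='my_bucket')['LocationConstraint'])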

Adam Gawne-Cain
9

It's possible with a recent aws-sdk gem; see the code sample:

require 'aws-sdk'

AWS.config(
  :access_key_id     => '***',
  :secret_access_key => '***',
  :max_retries       => 10
)

file     = 'test_file.rb'
bucket_0 = {:name => 'bucket_from', :endpoint => 's3-eu-west-1.amazonaws.com'}
bucket_1 = {:name => 'bucket_to',   :endpoint => 's3.amazonaws.com'}

# Upload the local test file to the source bucket first
s3_interface_from = AWS::S3.new(:s3_endpoint => bucket_0[:endpoint])
bucket_from       = s3_interface_from.buckets[bucket_0[:name]]
bucket_from.objects[file].write(open(file))

# Then copy it server-side into the destination bucket
s3_interface_to   = AWS::S3.new(:s3_endpoint => bucket_1[:endpoint])
bucket_to         = s3_interface_to.buckets[bucket_1[:name]]
bucket_to.objects[file].copy_from(file, {:bucket => bucket_from})

more details: How to copy file across buckets using aws-s3 gem

Anatoly
6

I have created a Docker executable of the s3s3mirror tool, a utility to copy and mirror from one AWS S3 bucket to another.

It is threaded, allowing parallel COPY operations, and very memory efficient; it succeeds where s3cmd completely fails.

Usage:

docker run -e AWS_ACCESS_KEY_ID=FOO -e AWS_SECRET_ACCESS_KEY=BAR pmoust/s3s3mirror [OPTIONS] source_bucket[/prefix] dest_bucket[/prefix]

For a full list of options try:

docker run pmoust/s3s3mirror 
5

I'd imagine you've probably found a good solution by now, but for others who are encountering this problem (as I was just recently), I've crafted a simple utility specifically for the purpose of mirroring one S3 bucket to another in a highly concurrent, yet CPU and memory efficient manner.

It's on github under an Apache License here: https://github.com/cobbzilla/s3s3mirror

When you have a very large bucket and are looking for maximum performance, it might be worth trying.

If you decide to give it a try please let me know if you have any feedback.

cobbzilla
  • I had a great experience with s3s3mirror. I was able to set it up on a m1.small EC2 node and copy 1.5 million objects in about 2 hours. Setup was a little tough, due to my unfamiliarity with Maven and Java, but it only took a few apt-get commands on Ubuntu to get everything installed. One last note: If (like me) you're worried about running an unknown script on a big, important s3 bucket, create a special user with read-only access on the copy-from bucket and use those credentials. Zero chance of accidental deletion. – Micah Jun 18 '13 at 12:46
5

From the AWS CLI https://aws.amazon.com/cli/ you could do:

aws s3 ls - This will list all the S3 buckets

aws s3 cp --recursive s3://<source bucket> s3://<destination bucket> - This will copy the files from one bucket to another

Note: Very useful when creating cross-region replication buckets; by doing the above, your files are all tracked and an update to the source region file will be propagated to the replicated bucket. Everything but file deletions is synced.

For CRR make sure you have versioning enabled on the buckets.
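
If you go the replication route instead of a one-off copy, both buckets need versioning and S3 needs an IAM role it can assume. A heavily simplified boto3 sketch (bucket names and the role ARN are placeholders, and the rule uses a minimal, older-style configuration):

import boto3

s3 = boto3.client('s3')

# Replication requires versioning on both the source and destination buckets.
for bucket in ('source-bucket', 'destination-bucket'):
    s3.put_bucket_versioning(Bucket=bucket,
                             VersioningConfiguration={'Status': 'Enabled'})

# Minimal replication rule; the role must let S3 read the source bucket
# and replicate objects into the destination bucket.
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',  # placeholder
        'Rules': [{
            'ID': 'replicate-everything',
            'Prefix': '',
            'Status': 'Enabled',
            'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
        }],
    },
)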

vredrav
5

Check out the documentation below. I guess that's what you are looking for: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectCOPY.html

The RightAws gem's S3Interface has a copy function which does the above.

http://rubydoc.info/gems/right_aws/3.0.0/RightAws/S3Interface#copy-instance_method
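
That REST `PUT Object - Copy` operation is the same primitive every SDK wraps; in Python with boto3 it is a single call, for example (bucket and key names are placeholders):

import boto3

# One server-side COPY request; nothing is downloaded locally.
boto3.client('s3').copy_object(
    Bucket='target-bucket', Key='target-key',
    CopySource={'Bucket': 'source-bucket', 'Key': 'source-key'},
)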

Josnidhin
4

If you are in a shell and want to copy multiple files but not all of them: s3cmd cp --recursive s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]

3

I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task

#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets() # returns a list of bucket objects
    num_buckets = len(buckets)

    backup_bucket_names = []
    for bucket in buckets:
        if (re.search('backup-' + r'\d{4}-\d{2}-\d{2}' , bucket.name)):
            backup_bucket_names.append(bucket.name)

    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The buckets are sorted latest to earliest, so we want to keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + str(now.year) + '-' + ('%02d' % now.month) + '-' + ('%02d' % now.day)
    print "Creating new bucket " + new_backup_bucket_name
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)


def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys = 100):
    src_bucket = connection.get_bucket(src_bucket_name);
    dst_bucket = connection.get_bucket(dst_bucket_name);

    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys = maximum_keys, marker = result_marker)

        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name

            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'

        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break

        result_marker = keys[maximum_keys - 1].key

if  __name__ =='__main__':main()

I use this in a rake task (for a Rails app):

desc "Back up a file onto S3"
task :backup do
     S3ID = "AKIAJM3NRWC7STXWUWVQ"
     S3KEY = "0A5kuzV+E1dkaPjZxHQAezz1GlSddJd0iS5sNpry"
     SRCBUCKET = "primary-mzgd"
     NUM_BACKUP_BUCKETS = 2

     Dir.chdir("#{Rails.root}/lib/tasks")
     system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
Rose Perrone
3

To copy from one S3 bucket to the same or another S3 bucket without downloading locally, it's pretty simple. Use the shell command below.

hdfs dfs -cp -f "s3://AccessKey:SecurityKey@ExternalBucket/SourceFoldername/*.*" "s3://AccessKey:SecurityKey@ExternalBucket/TargetFoldername"

This will copy all the files from the source bucket's SourceFoldername folder to the target bucket's TargetFoldername folder. In the above code, please replace AccessKey, SecurityKey and ExternalBucket with your corresponding values.

Sarath Subramanian
1

I hear there's a node module for that if you're into javascript :p

From the knox-copy docs:

knoxCopy = require 'knox-copy'

client = knoxCopy.createClient
  key: '<api-key-here>'
  secret: '<secret-here>'
  bucket: 'backups'

client.copyBucket
  fromBucket: 'uploads'
  fromPrefix: '/nom-nom'
  toPrefix: "/upload_backups/#{new Date().toISOString()}"
  (err, count) ->
     console.log "Copied #{count} files"
hurrymaplelad
1

I was informed that you can also do this using s3distcp on an EMR cluster. It is supposed to be faster for data containing large files. It works well enough on small sets of data - but I would have preferred another solution given the learning curve it took to set up for so little data (I've never worked with EMR before).

Here's a link from the AWS Documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html

Update: For the same data set, s3s3mirror was much faster than s3distcp or the AWS cli. Much easier to set up, too.
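
For reference, s3-dist-cp is usually submitted as a step to a running EMR cluster; a hedged boto3 sketch (the cluster id, bucket names and step arguments are placeholders/assumptions, not from the original answer):

import boto3

emr = boto3.client('emr')

# Submit an s3-dist-cp step to an existing EMR cluster.
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',  # placeholder cluster id
    Steps=[{
        'Name': 'Copy bucket with s3-dist-cp',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['s3-dist-cp',
                     '--src', 's3://source-bucket/',
                     '--dest', 's3://destination-bucket/'],
        },
    }],
)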

curious_george
1

As Neel Bhaat has explained in this blog, there are many different tools that can be used for this purpose. Some are provided by AWS, while most are third-party tools. All of these tools require you to save your AWS account key and secret in the tool itself. Be very cautious when using third-party tools, as the credentials you save in them could cost you your entire worth.

Therefore, I always recommend using the AWS CLI for this purpose. You can simply install it from this link. Next, run the following command and save your key and secret values in the AWS CLI.

aws configure

And use the following command to sync your AWS S3 bucket to your local machine. (The local machine should have the AWS CLI installed.)

aws s3 sync <source> <destination>

Examples:

1) For AWS S3 to Local Storage

aws s3 sync <S3Uri> <LocalPath>

2) From Local Storage to AWS S3

aws s3 sync <LocalPath> <S3Uri>

3) From one AWS S3 bucket to another bucket

aws s3 sync <S3Uri> <S3Uri> 
Keet Sugathadasa
1

As of 2020, if you are using s3cmd you can copy a folder from bucket1 to bucket2 using the following command:

s3cmd cp --recursive s3://bucket1/folder_name/ s3://bucket2/folder_name/

--recursive is necessary to recursively copy everything in the folder. Also note that you have to specify "/" after the folder name; otherwise it will fail.

medBouzid
0

How about the aws s3 sync CLI command? aws s3 sync s3://bucket1/ s3://bucket2/

0

The best way to copy an S3 bucket is using the AWS CLI.

It involves these 3 steps:

  1. Install the AWS CLI on your server: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
  2. If you are copying buckets between two AWS accounts, you need to attach the correct policy to each bucket.
  3. After this, use this command to copy from one bucket to another:

aws s3 sync s3://sourcebucket s3://destinationbucket

The details of step 2 and step 3 are given in this link:

https://aws.amazon.com/premiumsupport/knowledge-center/account-transfer-s3/
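
For step 2, the source bucket typically gets a bucket policy that lets the other account list and read it. A hedged boto3 sketch (the account id and bucket name are placeholders; the linked article describes the full setup):

import json
import boto3

# Placeholder values: the destination account id and the source bucket name.
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AllowDestinationAccountRead',
        'Effect': 'Allow',
        'Principal': {'AWS': 'arn:aws:iam::111122223333:root'},
        'Action': ['s3:ListBucket', 's3:GetObject'],
        'Resource': ['arn:aws:s3:::sourcebucket',
                     'arn:aws:s3:::sourcebucket/*'],
    }],
}

boto3.client('s3').put_bucket_policy(Bucket='sourcebucket',
                                     Policy=json.dumps(policy))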

0

You can write a Java app, maybe even a GUI Swing app, that uses the AWS Java APIs to copy objects. See:

https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/s3/src/main/java/com/example/s3/CopyObject.java

smac2020
0

Adding Copying objects across AWS accounts using S3 Batch Operations because it hasn't been mentioned here yet. This is the method I'm currently trying out, because I have about 1 million objects I need to move to a new account, and cp and sync don't work for me due to the expiration of some token. I don't have a way to figure out which token it is, as my general access token is working just fine.
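
For anyone who wants to script this route: an S3 Batch Operations copy job is created through the s3control API. A heavily hedged sketch only — the account id, role, manifest location, ETag and report settings are all placeholders and depend on your setup:

import boto3

s3control = boto3.client('s3control')

s3control.create_job(
    AccountId='111122223333',                                  # placeholder
    Priority=10,
    RoleArn='arn:aws:iam::111122223333:role/batch-ops-copy',   # placeholder
    Operation={'S3PutObjectCopy': {'TargetResource': 'arn:aws:s3:::dest-bucket'}},
    Manifest={
        'Spec': {'Format': 'S3BatchOperations_CSV_20180820',
                 'Fields': ['Bucket', 'Key']},
        'Location': {'ObjectArn': 'arn:aws:s3:::manifest-bucket/manifest.csv',
                     'ETag': 'REPLACE_WITH_MANIFEST_ETAG'},
    },
    Report={'Bucket': 'arn:aws:s3:::report-bucket',
            'Format': 'Report_CSV_20180820',
            'Enabled': True,
            'Prefix': 'batch-copy-reports',
            'ReportScope': 'AllTasks'},
    ConfirmationRequired=False,
)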

Helgi
0

There are two possible meanings of your question:

  1. You want to copy from one AWS S3 account to another AWS S3 account. For this you can use the command line interface to copy data from one account to the other:

aws s3 sync s3://source.bucket s3://destination.bucket --source-region source.region --region destination.region

Replace source.bucket with the name of the existing bucket you want to copy from. In destination.bucket, put the name of the bucket the data should be pasted into. Then put the source region and the destination region after the respective arguments.

  2. If you want to copy and paste between two different S3 buckets in the same AWS account:

Go to the S3 bucket you want to copy the data from. Select all data or a folder with the checkboxes, then open the Actions tab and click Copy. A new panel will open asking for a destination; go to Browse S3 in this panel (by default your current S3 bucket will be opened), navigate to your desired destination bucket and click on it. All selected data will start copying.

There is one more option here.

You can use S3 Browser to copy your data from one S3 bucket to another S3 bucket.

Go to Google and download S3 Browser. Open S3 Browser, click on Accounts > Add new account, fill in the details (display name, account type, Access Key ID, Secret Access Key), then click Add new account.

Once you have selected your AWS account, all its S3 buckets will be visible there. You can add multiple accounts and switch from one to another.

But I suggest you use the AWS CLI to get fast results.