
I am trying to move files older than an hour from one S3 bucket to another S3 bucket using a Python boto3 AWS Lambda function, with the following cases:

  1. Both buckets can be in the same account and different regions.
  2. Both buckets can be in different accounts and different regions.
  3. Both buckets can be in different accounts and the same region.

I got some help moving files using the Python code below, mentioned by @John Rotenstein:

import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page['Contents']:
        if object['LastModified'] < datetime.now().astimezone() - timedelta(hours=1):   # <-- Change time period here
            print(f"Moving {object['Key']}")

            # Copy object
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=object['Key'],
                CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
            )

            # Delete original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])

How can this be modified to cater to these requirements?

cloudbud

2 Answers


Moving between regions

This is a non-issue. You can just copy the object between buckets and Amazon S3 will figure it out.

Moving between accounts

This is a bit harder because the code will use a single set of credentials, which must have ListBucket and GetObject access on the source bucket, plus PutObject rights to the destination bucket.

Also, if credentials are being used from the Source account, then the copy must be performed with ACL='bucket-owner-full-control', otherwise the Destination account won't have access rights to the object. This is not required when the copy is performed with credentials from the Destination account.

Let's say that the Lambda code is running in Account-A and is copying an object to Account-B. An IAM Role (Role-A) is assigned to the Lambda function. It's pretty easy to give Role-A access to the buckets in Account-A. However, the Lambda function will need permissions to PutObject in the bucket (Bucket-B) in Account-B. Therefore, you'll need to add a bucket policy to Bucket-B that allows Role-A to PutObject into the bucket. This way, Role-A has permission to read from Bucket-A and write to Bucket-B.
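
For illustration, here is a minimal sketch of such a bucket policy being attached to Bucket-B. The account ID, role name and bucket name are placeholders, and the policy would have to be applied with credentials from Account-B (for example via the console or put_bucket_policy). The exact set of actions may need adjusting; s3:PutObjectAcl is assumed to be required because the copy also sets an ACL:

import json
import boto3

# Placeholder names/IDs for illustration only
DESTINATION_BUCKET = 'bucket-b'
ROLE_A_ARN = 'arn:aws:iam::111111111111:role/Role-A'   # Role-A in Account-A

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowRoleAToWrite",
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_A_ARN},
            "Action": ["s3:PutObject", "s3:PutObjectAcl"],
            "Resource": f"arn:aws:s3:::{DESTINATION_BUCKET}/*"
        }
    ]
}

# Must be run with credentials from Account-B, the bucket owner
s3_account_b = boto3.client('s3')
s3_account_b.put_bucket_policy(Bucket=DESTINATION_BUCKET, Policy=json.dumps(bucket_policy))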

So, putting it all together:

  • Create an IAM Role (Role-A) for the Lambda function
  • Give the role Read/Write access as necessary for buckets in the same account
  • For buckets in other accounts, add a Bucket Policy that grants the necessary access permissions to the IAM Role (Role-A)
  • In the copy_object() command, include ACL='bucket-owner-full-control' (this is the only coding change needed, shown below)
  • Don't worry about doing anything special for cross-region; it should just work automatically
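
Putting that coding change into the snippet from the question, the copy step would look something like this (bucket names as in the question):

# Copy object, granting the destination bucket owner full control.
# The ACL is only needed when the copy runs with credentials from the Source account.
s3_client.copy_object(
    Bucket=DESTINATION_BUCKET,
    Key=object['Key'],
    CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']},
    ACL='bucket-owner-full-control'
)
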
John Rotenstein
  • also I was going through the post https://stackoverflow.com/questions/43577746/aws-lambda-task-timed-out/43578149, where you have mentioned the timeout value can be 15 min max. But the bucket objects in my case are more than 5 GB, so what could be a better solution here? AWS Fargate? – cloudbud Jun 01 '20 at 10:01
  • hey @John Rotenstein, could you look at this query? – cloudbud Jun 02 '20 at 10:04
  • If the copy operation takes longer than 15 minutes, then Lambda is not appropriate. Copying between regions would also make the operation take longer. To recommend a method, I would need to know more information: How often do files arrive (or how many per hour or day)? Do you need them to be copied quickly, or can they be copied once per day? How does the program determine where to copy the files? (If it is based on directory, then S3 Replication could do it for you automatically.) – John Rotenstein Jun 02 '20 at 10:30
  • hey @John Rotenstein So the S3 bucket will have folders that contain files, and the content of these folders should be copied to folders in buckets in another region. So what do you suggest? Would replication fit in? What are the pros and cons here? The files would be coming every minute; let's assume 1 file per minute, or there can be 1000 files per hour, each file around 300 MB. What would be the best solution in that case? Also, with S3 replication can I delete the content that has been copied to the other buckets? It requires encryption and versioning, and copying every 15 minutes would incur cost? – cloudbud Jun 02 '20 at 15:01
  • Hi @John Rotenstein: I used the code but now I am getting this error: Response: { "errorMessage": "'Contents'", "errorType": "KeyError", "stackTrace": [ " File \"/var/task/lambda_function.py\", line 21, in lambda_handler\n for object in page['Contents']:\n" ] } Although the content got moved, I am still getting the error. – cloudbud Jun 28 '20 at 09:32
  • 1
    No idea! You'll have to add some debug statements in that area to see what it is doing. It might be happening when `page` does _not_ contain an element called `Contents`. That might happen at the end (which is why things are being moved), but it still shouldn't fail at that point. You could put `if 'Contents' in page then:` before that line to avoid the situation. – John Rotenstein Jun 28 '20 at 09:37
  • hey @John Rotenstein: even with ACL='bucket-owner-full-control' in the destination bucket, I am getting access denied when trying to download the file. It shows the canonical ID as external. Do I need to make any change to the Lambda function role also? – cloudbud Jul 08 '20 at 07:28
  • Which account are you using to perform the download that is failing? Which credentials are you using? What do you mean by "It shows the canonical ID as external"? – John Rotenstein Jul 08 '20 at 10:03
  • Trying to download from the destination account. I am using the console. I was checking the permissions of the bucket, where it shows the canonical ID as external; do you think that can be the issue? – cloudbud Jul 08 '20 at 10:21
  • It sounds like the files were _not_ copied with `ACL='bucket-owner-full-control'`. – John Rotenstein Jul 08 '20 at 10:27
  • does this give the recursive permission also? – cloudbud Jul 09 '20 at 07:22
  • I'm not sure what you mean. A `copy_object()` command only copies one file. The ACL specified during the copy is applied to the copied object. There is no 'recursive' element to this process. – John Rotenstein Jul 09 '20 at 09:06
  • That is already done. But the object ownership still needs to be changed; it's owned by Account A. Can you have a look at the question? – cloudbud Jul 16 '20 at 14:41
  • sorry my question is https://stackoverflow.com/questions/62934125/how-to-copy-the-object-in-s3-from-account-a-to-account-b-with-updated-object-own – cloudbud Jul 16 '20 at 14:42

An alternate approach would be to use Amazon S3 Replication, which can replicate bucket contents:

  • Within the same region, or between regions
  • Within the same AWS Account, or between different Accounts

Replication is frequently used when organizations need another copy of their data in a different region, or simply for backup purposes. For example, critical company information can be replicated to another AWS Account that is not accessible to normal users. This way, if some data was deleted, there is another copy of it elsewhere.

Replication requires versioning to be activated on both the source and destination buckets. If you require encryption, use standard Amazon S3 encryption options. The data will also be encrypted during transit.
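
As a sketch, versioning can be enabled with boto3 (bucket names are placeholders; the destination bucket in another account would need to be configured with that account's credentials):

import boto3

s3_client = boto3.client('s3')

# Versioning must be enabled on both buckets before replication can be configured
for bucket in ('source-bucket', 'destination-bucket'):
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'}
    )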

You configure a source bucket and a destination bucket, then specify which objects to replicate by providing a prefix or a tag. Objects will only be replicated once Replication is activated. Existing objects will not be copied. Deletion is intentionally not replicated to avoid malicious actions. See: What Does Amazon S3 Replicate?
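
As an illustration, a replication rule can be created with put_bucket_replication. This is a minimal sketch, not a drop-in configuration: the role ARN, bucket names and prefix are placeholders, and cross-account replication would typically also need 'Account' (and 'AccessControlTranslation') in the Destination block:

import boto3

s3_client = boto3.client('s3')

# Placeholder names for illustration only
SOURCE_BUCKET = 'source-bucket'
DESTINATION_BUCKET_ARN = 'arn:aws:s3:::destination-bucket'
REPLICATION_ROLE_ARN = 'arn:aws:iam::111111111111:role/s3-replication-role'

s3_client.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        'Role': REPLICATION_ROLE_ARN,   # role that S3 assumes to replicate objects
        'Rules': [
            {
                'ID': 'replicate-incoming-files',
                'Priority': 1,
                'Filter': {'Prefix': 'incoming/'},   # replicate only this prefix
                'Status': 'Enabled',
                'Destination': {
                    'Bucket': DESTINATION_BUCKET_ARN
                },
                'DeleteMarkerReplication': {'Status': 'Disabled'}
            }
        ]
    }
)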

There is no "additional" cost for S3 replication, but you will still be charge for any Data Transfer charges when moving objects between regions, and for API Requests (that are tiny charges), plus storage of course.

John Rotenstein
  • sounds perfect. – cloudbud Jun 03 '20 at 06:20
  • can I increase the replication time to 1 hour from 15 min? – cloudbud Jun 03 '20 at 07:26
  • 1
    The "replication time" is automatic. In situations where you need additional control over replication time, you can use the Replication Time Control feature. See: [S3 Replication Update: Replication SLA, Metrics, and Events | AWS News Blog](https://aws.amazon.com/blogs/aws/s3-replication-update-replication-sla-metrics-and-events/) – John Rotenstein Jun 03 '20 at 08:20
  • I could not find whether I can control the replication frequency. I raised a support request too; they said it is not possible. Could you please help me here? – cloudbud Jun 04 '20 at 06:15
  • 1
    Replication is automatic and continuous. It is not a frequency, it's more like "it's in the queue, it'll take a few minutes". – John Rotenstein Jun 04 '20 at 07:11