I need to pull data published to an S3 bucket by a different organization (therefore a different AWS account) in a different region, for subsequent processing with Lambda. I do have access to read it but cannot ask them to set up replication to my buckets.

Amazon's Cross-Region Replication looks like it's designed for pushing data from the source and I'm not even sure the source organization has versioning enabled.

Is there a way to pull data? My need is for one-way only; I need to process that data shortly (within 10 minutes or so) after it arrives in the source S3 bucket.

wishihadabettername
  • A cron-job that runs `aws s3 sync` every 10 minutes? Something like that is going to be the best way to pull from an S3 bucket I think, if you can't get new object events sent to you from that bucket. – Mark B Jan 14 '19 at 15:35
  • Is there a way to run this as a lambda? I'm thinking of the cost of running an EC2 instance just to run the sync. Thanks. – wishihadabettername Jan 14 '19 at 15:40

1 Answer

You could run `aws s3 sync` on a schedule, such as every 10 minutes. If you want to run this in an AWS Lambda function, it looks like the NodeJS and Python Lambda environments have the AWS CLI tool pre-installed, though that is worth verifying for your runtime before relying on it. I would suggest writing a short Python Lambda function that calls the AWS CLI tool to run an `s3 sync` command, and scheduling that Lambda function to run every 10 minutes.
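If the CLI turns out not to be available in the runtime, a similar one-way pull can be sketched with boto3, which the Python Lambda runtime bundles. This is only a rough sketch, not a drop-in replacement for `aws s3 sync`: it compares object sizes only, which is a simplification of what the CLI does. The environment variable names `SOURCE_BUCKET` and `DEST_BUCKET` and the helper names are made up for illustration, and it assumes the function's role can read the source bucket and write to the destination. Depending on how the two regions are set up, you may need separate clients for the source and destination.

```python
import os


def list_objects(s3, bucket, prefix=""):
    """Return {key: size} for every object under prefix, following pagination."""
    objects = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            objects[obj["Key"]] = obj["Size"]
    return objects


def sync_bucket(s3, source_bucket, dest_bucket, prefix=""):
    """Copy objects that are missing from dest or differ in size.

    Returns the sorted list of keys that were copied.
    """
    source = list_objects(s3, source_bucket, prefix)
    dest = list_objects(s3, dest_bucket, prefix)
    copied = []
    for key, size in sorted(source.items()):
        if dest.get(key) != size:
            # Managed copy; the data moves between buckets on the S3 side.
            s3.copy({"Bucket": source_bucket, "Key": key}, dest_bucket, key)
            copied.append(key)
    return copied


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without boto3.
    import boto3

    s3 = boto3.client("s3")
    # SOURCE_BUCKET / DEST_BUCKET are hypothetical variable names.
    copied = sync_bucket(s3, os.environ["SOURCE_BUCKET"], os.environ["DEST_BUCKET"])
    return {"copied": copied}
```

To run it every 10 minutes, a scheduled rule with the expression `rate(10 minutes)` can trigger the function. Keep in mind that listing both buckets on every run gets slower as they grow, so restricting the listing with `prefix` may help.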

Mark B