
I'm working on a small project where I have read-only access to one S3 bucket, and my job is to copy the data from that bucket to my own S3 bucket whenever my code runs.

My approach is to get the metadata of the objects in the read-only S3 bucket, sort them by date, keep track of the last copied file, and copy only those that are not already in my S3 bucket.
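The approach described above could be sketched with the AWS SDK for Java v2 (this assumes v2 is available; the bucket names and the persisted "last copied" timestamp are placeholders):

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

public class IncrementalCopy {

    // Keep only objects modified after the last copied timestamp,
    // sorted oldest first so the "last copied" marker can be advanced safely.
    static List<S3Object> newerThan(List<S3Object> objects, Instant lastCopied) {
        return objects.stream()
                .filter(o -> o.lastModified().isAfter(lastCopied))
                .sorted(Comparator.comparing(S3Object::lastModified))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        S3Client s3 = S3Client.create();
        // In practice this timestamp would be persisted somewhere (DynamoDB, a file, ...).
        Instant lastCopied = Instant.parse("2020-03-01T00:00:00Z");

        // List every object in the read-only source bucket (paginated automatically).
        List<S3Object> all = s3.listObjectsV2Paginator(
                        ListObjectsV2Request.builder().bucket("source-bucket").build())
                .contents().stream()
                .collect(Collectors.toList());

        // Copy each new object into the destination bucket.
        for (S3Object obj : newerThan(all, lastCopied)) {
            s3.copyObject(CopyObjectRequest.builder()
                    .sourceBucket("source-bucket")
                    .sourceKey(obj.key())
                    .destinationBucket("dest-bucket")
                    .destinationKey(obj.key())
                    .build());
        }
    }
}
```

Note that listing only requires `s3:ListBucket`/`s3:GetObject` on the source, so this works with read-only access to the source bucket.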

I've already looked at some of the solutions available on Stack Overflow, like this one: How list Amazon S3 bucket contents by modified date?

But the problem is that I'm using Java and cannot use the aws s3api CLI.

Another solution I found here (https://www.quora.com/How-do-I-filter-files-in-an-S3-bucket-folder-in-AWS-based-on-date-using-boto) was to put timestamps in the file names themselves and then fetch the data from S3 based on the last copied file name. But as I only have read-only access, I cannot do this.

Does anyone have any idea how to achieve this? Any solution would be very helpful. I can use any AWS service if a solution exists.

Thanks in advance! :)


Edit: As pointed out by @Marcin in the comments, I cannot trigger anything like a PUT event on the read-only S3 bucket. I can only read the data.

Nobody
  • Can notifications be set on the bucket for PUT events? If yes, then you can just trigger a Lambda for the newly added objects and copy them where you want. – Marcin Mar 11 '20 at 13:14
  • @Marcin No, I have no access to that S3 bucket except reading the data. – Nobody Mar 11 '20 at 13:16
  • In that case, you have to "manually" track them. I think your approach is a reasonable one in this scenario. – Marcin Mar 11 '20 at 13:25

1 Answer


You could sync the buckets, to avoid having to work out which objects to copy. For example, using the awscli:

aws s3 sync s3://frombucket s3://tobucket
jarmod
  • Is there a Java version of doing this? I cannot use cli here. – Nobody Mar 11 '20 at 13:35
  • No, sync is a higher-level feature of the awscli. If using Java, you could simply enumerate the objects in the two buckets using ListObjectsV2, remove the items that are in both lists, and then copy the remaining files. You could perhaps also ask the owner of the source bucket to set up S3 bucket replication. – jarmod Mar 11 '20 at 14:05
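The ListObjectsV2 diff idea from the comment above could be sketched like this with the AWS SDK for Java v2 (the bucket names are placeholders taken from the example command): list the keys in both buckets, subtract, and copy whatever is missing.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

public class BucketDiffSync {

    // Keys present in the source bucket but not yet in the destination.
    static Set<String> missingKeys(Set<String> sourceKeys, Set<String> destKeys) {
        Set<String> missing = new HashSet<>(sourceKeys);
        missing.removeAll(destKeys);
        return missing;
    }

    // All object keys in a bucket (the paginator handles the 1000-key page limit).
    static Set<String> listKeys(S3Client s3, String bucket) {
        return s3.listObjectsV2Paginator(
                        ListObjectsV2Request.builder().bucket(bucket).build())
                .contents().stream()
                .map(S3Object::key)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        S3Client s3 = S3Client.create();
        Set<String> toCopy = missingKeys(
                listKeys(s3, "frombucket"), listKeys(s3, "tobucket"));
        for (String key : toCopy) {
            s3.copyObject(CopyObjectRequest.builder()
                    .sourceBucket("frombucket").sourceKey(key)
                    .destinationBucket("tobucket").destinationKey(key)
                    .build());
        }
    }
}
```

One caveat worth noting: comparing by key alone will miss objects that were overwritten in place; comparing last-modified times or ETags as well would catch those.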