
I'm copying files from an external company's bucket; they've sent me an access key/secret that I've set up as an environment variable. I want to copy objects from their bucket. I've used the code below, but that copies objects within the same connection. How do I use S3Hook to copy objects with a different conn id?

    s3 = S3Hook(self.aws_conn_id)
    s3_conn = s3.get_conn()

    ext_s3 = S3Hook(self.ext_aws_conn_id)
    ext_s3_conn = ext_s3.get_conn()

    # this copies objects within the same connection...
    s3_conn.copy_object(Bucket="bucket",
                        Key='dest_key',
                        CopySource={
                            'Bucket': self.partition.bucket,
                            'Key': key
                        },
                        ContentEncoding='csv')
KristiLuna

1 Answer


From my point of view this is not possible with a single S3Hook. First of all, a connection can only declare one URL endpoint.

Secondly, Airflow's S3Hook uses Boto3 under the hood, and your two connections will most likely have a different access_key and secret_key for creating the boto3 resource/client. As explained in this post, if you wish to copy between different buckets, you will need a single set of credentials that has (see the sketch after this list):

  • GetObject permission on the source bucket
  • PutObject permission on the destination bucket
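
If you can obtain such a set of credentials, a single S3Hook can perform the copy server-side between the two buckets. A minimal sketch, assuming a hypothetical connection id cross_account_conn whose credentials hold both permissions; the bucket and key names are placeholders:

    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    # One hook, one set of credentials that has GetObject on the source
    # bucket and PutObject on the destination bucket.
    s3 = S3Hook(aws_conn_id="cross_account_conn")  # hypothetical conn id
    s3.copy_object(
        source_bucket_key="path/to/source.csv",    # placeholder key
        dest_bucket_key="path/to/dest.csv",        # placeholder key
        source_bucket_name="external-bucket",      # placeholder bucket
        dest_bucket_name="my-bucket",              # placeholder bucket
    )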

Again, in the S3Hook you can only declare a single set of credentials. You could maybe use the credentials given by your client and declare a bucket in your account on which those credentials have PutObject permission, but this implies that you are allowed to do that in your enterprise (not very wise in terms of security), and even then your S3Hook would still reference only one single endpoint.

To sum up: I have been dealing with the same problem and ended up creating two S3 connections, using the first one to download from the original bucket into a temporary directory and the second one to upload to my enterprise bucket.
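
A minimal sketch of that workaround; the connection ids ext_aws_conn and aws_conn, the bucket names, and the helper name are placeholders for your own values:

    import tempfile

    from airflow.providers.amazon.aws.hooks.s3 import S3Hook


    def copy_between_accounts(src_key: str, dest_key: str) -> None:
        ext_s3 = S3Hook(aws_conn_id="ext_aws_conn")  # external company's credentials
        s3 = S3Hook(aws_conn_id="aws_conn")          # your own credentials

        with tempfile.TemporaryDirectory() as tmp_dir:
            # Download the object from the external bucket into a temp
            # directory; download_file returns the path of the local copy.
            local_path = ext_s3.download_file(
                key=src_key,
                bucket_name="external-bucket",
                local_path=tmp_dir,
            )
            # Upload the local copy into your own bucket.
            s3.load_file(
                filename=local_path,
                key=dest_key,
                bucket_name="my-bucket",
                replace=True,
            )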

Lucas M. Uriarte
  • Thank you for responding. In my sample I have ext_s3, which is the external S3Hook connection, but you're saying I can't use this to download the object and then use the s3 variable I made to load the object? Do you have an example of how you ended up solving this? – KristiLuna Sep 26 '22 at 13:54
  • My answer is saying that you cannot copy an object between two buckets in different accounts using the same S3Hook connection. Regarding your comment, you will need to be a bit clearer: for example, what do you mean by "s3 variable", and what do you mean by "load"? You only talked about copying. Finally, you ask for an example of how I solved this, and it is written at the end of the answer: "creating two S3 connections, using the first one (the external one) for downloading from the original bucket into a temp directory and the second one (the enterprise one) to upload to my enterprise bucket". – Lucas M. Uriarte Sep 27 '22 at 08:24