
I'm trying to copy data from one S3 object storage to another object storage (both on-prem) using the Hadoop CLI.

Both storages have different endpoints, access keys, and secret keys.

hdfs dfs -Dfs.s3a.endpoint=xxxx:xxxx -Dfs.s3a.access.key=xxxxx -Dfs.s3a.secret.key=xxxx -ls s3a://bucket-name/

This works for both storages.

But I'm not able to copy from one to the other, as I have no clue how to enter multiple values for access keys/secret keys/endpoints in a single command.

I can do this using Java code, but I want to do it through the command line.

Thanks.

  • The S3 CLI itself should have tools to copy between buckets/regions/accounts. Why do you want to use the Hadoop CLI? – OneCricketeer May 20 '22 at 05:19
  • We do not have the S3 CLI installed on our cluster; from the beginning we have been using the HDFS utils and have integrated S3 credentials for one object storage in core-site.xml. – Pravin rathore May 20 '22 at 05:37
  • What is the "other object store"? Is it S3? According to this answer, two account keys aren't possible, and you need a temporary destination (and create smaller data batches) https://stackoverflow.com/questions/51153521/copying-s3-files-across-aws-account-using-s3-dist-cp#51162095 – OneCricketeer May 20 '22 at 05:45
  • And if you're using MinIO as "on-prem S3", that CLI is easy to install, but it also has replication solutions available, so you shouldn't have to do batch copies – OneCricketeer May 20 '22 at 05:47
  • @OneCricketeer Thanks for the alternatives, mate. Although I did find a solution: you can set per-bucket details in the command, e.g. hdfs dfs -Dfs.s3a.multipart.purge=false -Dfs.s3a.bucket.bucket_name_1.endpoint=xxxx -Dfs.s3a.bucket.bucket_name_1.access.key=xxxx -Dfs.s3a.bucket.bucket_name_1.secret.key=xxxx -Dfs.s3a.bucket.bucket_name_2.endpoint=xxxx -Dfs.s3a.bucket.bucket_name_2.access.key=xxxx -Dfs.s3a.bucket.bucket_name_2.secret.key=xxxx -ls s3a://bucket_name_1/ s3a://bucket_name_2/ — the endpoint can include a :port based on network configuration. Thanks – Pravin rathore May 20 '22 at 08:23

1 Answer


You can set per-bucket configuration details in the command:

hdfs dfs \
-Dfs.s3a.multipart.purge=false \
-Dfs.s3a.bucket.bucket_name_1.endpoint=xxxx \
-Dfs.s3a.bucket.bucket_name_1.access.key=xxxx \
-Dfs.s3a.bucket.bucket_name_1.secret.key=xxxx \
-Dfs.s3a.bucket.bucket_name_2.endpoint=xxxx \
-Dfs.s3a.bucket.bucket_name_2.access.key=xxxx \
-Dfs.s3a.bucket.bucket_name_2.secret.key=xxxx \
-ls s3a://bucket_name_1/ s3a://bucket_name_2/

The endpoint can include a :port, depending on your network configuration.
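For the actual copy (rather than just listing), the same per-bucket properties should work with a copy command as well. A minimal sketch, assuming the same placeholder bucket names, credentials, and paths as above (here using hadoop distcp, which also accepts -D generic options; hdfs dfs -cp with the same properties is another option for smaller datasets):

hadoop distcp \
-Dfs.s3a.bucket.bucket_name_1.endpoint=xxxx \
-Dfs.s3a.bucket.bucket_name_1.access.key=xxxx \
-Dfs.s3a.bucket.bucket_name_1.secret.key=xxxx \
-Dfs.s3a.bucket.bucket_name_2.endpoint=xxxx \
-Dfs.s3a.bucket.bucket_name_2.access.key=xxxx \
-Dfs.s3a.bucket.bucket_name_2.secret.key=xxxx \
s3a://bucket_name_1/source_path/ s3a://bucket_name_2/dest_path/

Because the credentials are scoped per bucket, each side of the copy picks up its own endpoint and keys without them conflicting in a single command.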
