I need to copy a large number of small files from one S3 bucket to another. I'm using the S3DistCp command (s3-dist-cp) provided by AWS:

s3-dist-cp --src=s3://some-bucket/ --dest=s3://another-bucket/ --groupBy=<some-pattern> --targetSize=<size> --deleteOnSuccess
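To make the two merge flags concrete, here is a rough Python model of how --groupBy and --targetSize partition the input. It is an assumption about the semantics, not S3DistCp's actual code. Files whose regex capture groups produce the same values are concatenated, and each group is cut into outputs of roughly --targetSize MiB. The function name plan_groups and the (key, size) input are hypothetical. This sketch uses full-match semantics and simply ignores keys that don't match the pattern.

```python
import re
from collections import defaultdict


def plan_groups(objects, group_by, target_size_mib):
    """Hypothetical model of --groupBy/--targetSize (not S3DistCp's real code).

    objects: iterable of (key, size_in_bytes) pairs.
    group_by: regex with capture groups; keys whose captured values are
      equal end up in the same group.
    target_size_mib: approximate upper bound per output file, in MiB.
    Returns {concatenated captured values: [batch of keys, ...]}.
    """
    limit = target_size_mib * 1024 * 1024
    pattern = re.compile(group_by)
    groups = defaultdict(list)
    for key, size in objects:
        m = pattern.fullmatch(key)
        if m:  # in this sketch, non-matching keys are ignored
            groups[m.groups()].append((key, size))

    plan = {}
    for captured, members in groups.items():
        batches, current, current_size = [], [], 0
        for member_key, size in sorted(members):
            # Start a new output once adding this file would exceed the limit.
            # A single file larger than the limit still gets its own batch.
            if current and current_size + size > limit:
                batches.append(current)
                current, current_size = [], 0
            current.append(member_key)
            current_size += size
        if current:
            batches.append(current)
        plan["".join(g or "" for g in captured)] = batches
    return plan
```

For example, `plan_groups(objs, r"logs/(\w+)/.*", 1)` puts every file under `logs/a/` into group `"a"` and splits that group whenever the running total would pass 1 MiB.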

The problem is that this command takes a very long time to copy all the small files and merge them.

Note: the source bucket is continuously being written with new files by another job, and I think s3-dist-cp never catches up with the last file.
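Since the source keeps growing, one option (an assumption, not something from the question) is to freeze the input set at job start. Record a cutoff time, copy only objects whose LastModified is earlier, and leave newer files for the next run. The sketch below does this on an already-fetched listing of (key, last_modified) pairs. snapshot_keys and write_key_list are hypothetical helpers and make no AWS calls.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def snapshot_keys(listing: Iterable[Tuple[str, datetime]], cutoff: datetime) -> List[str]:
    """Hypothetical helper: keep only keys last modified before `cutoff`.

    Files that land after the cutoff are left for the next run instead of
    extending the current copy indefinitely.
    """
    return sorted(key for key, modified in listing if modified < cutoff)


def write_key_list(keys: Iterable[str], path: str) -> None:
    """Hypothetical helper: write one key per line to a local text file."""
    with open(path, "w") as fh:
        for key in keys:
            fh.write(key + "\n")
```

If your EMR version supports S3DistCp's --srcPrefixesFile option, a list like this could be fed to it so the job does not re-list the growing source. Each line is treated as a prefix, so check the EMR documentation before relying on it.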

Is there any workaround for this? The destination bucket will be used by a Spark job to process these files.

hlagvankar