I have the following two files and one directory in HDFS:
-rw-r--r-- 1 hadoop hadoop 11194859 2017-05-05 19:53 hdfs:///outputfiles/abc_output.txt
drwxr-xr-x - hadoop hadoop 0 2017-05-05 19:28 hdfs:///outputfiles/sample_directory
-rw-r--r-- 1 hadoop hadoop 68507436 2017-05-05 19:55 hdfs:///outputfiles/sample_output.txt
I want to copy abc_output.txt and sample_directory from HDFS to S3 in gzip format, using a single command. I don't want the files to be combined on S3.
My S3 bucket should contain the following: abc_output.txt.gzip and sample_directory.gzip
I tried the following:
s3-dist-cp --s3Endpoint=s3.amazonaws.com --src=hdfs:///outputfiles/ --dest=s3://bucket-name/outputfiles/ --outputCodec=gzip
But this copies all files and folders from source to destination.
By referring to "Deduce the HDFS path at runtime on EMR", I also tried the command below:
s3-dist-cp --s3Endpoint=s3.amazonaws.com --src=hdfs:///outputfiles/ --dest=s3://bucket-name/outputfiles/ --srcPattern=.*abc_output.txt.sample_directory. --outputCodec=gzip
but this also failed.
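For context, my understanding (which may be wrong) is that --srcPattern is a regular expression matched against the full source path, so I would have expected a pattern like the one below to select just those two items. Treat this as a sketch of what I was aiming for, not a verified command; the alternation regex is my own guess:
s3-dist-cp --s3Endpoint=s3.amazonaws.com --src=hdfs:///outputfiles/ --dest=s3://bucket-name/outputfiles/ --srcPattern='.*(abc_output\.txt|sample_directory).*' --outputCodec=gzip
(The pattern is quoted so the shell does not interpret the parentheses and pipe.) Is this the right approach, or is there a better way to copy only selected files and directories in one s3-dist-cp invocation?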