I've split a big binary file into 2 GB chunks and uploaded them to Amazon S3. Now I want to join it back into one file and process it with my custom
I've tried to run
elastic-mapreduce -j $JOBID -ssh \
"hadoop dfs -cat s3n://bucket/dir/in/* > s3n://bucket/dir/outfile"
but it failed because -cat writes its output to my local terminal; the redirection does not happen remotely...
How can I do this?
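As I understand it, the problem is shell semantics: the `>` redirect is handled by the shell on whichever machine runs the command, and it can only target an ordinary filesystem path, never an s3n:// URL. A minimal local illustration (the paths here are hypothetical, just to show the mechanism):

```shell
# '>' is interpreted by the shell as "create/truncate this filesystem path";
# the shell has no knowledge of Hadoop or S3 URL schemes, so the bytes
# never reach S3 -- they land on the local filesystem of the shell.
echo "chunk contents" > /tmp/joined_example
cat /tmp/joined_example   # prints: chunk contents
```

So the redirect target `s3n://bucket/dir/outfile` is treated as a literal local path, not an S3 object.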
P.S. I've tried running cat as a streaming MR job:
den@aws:~$ elastic-mapreduce --create --stream --input s3n://bucket/dir/in \
--output s3n://bucket/dir/out --mapper /bin/cat --reducer NONE
The job finished successfully. But: I had 3 file parts in dir/in, and now I have 6 parts in /dir/out:
part-0000
part-0001
part-0002
part-0003
part-0004
part-0005
And, of course, a _SUCCESS file, which is not part of my output...
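If I understand Hadoop correctly, the doubling of part files would be consistent with how map tasks are sized: each mapper processes one input *split*, not one input file, and each mapper writes its own part-NNNNN. A rough sketch of the arithmetic (the split size here is a hypothetical example; the real value depends on cluster configuration):

```shell
# Hypothetical numbers: with a 1 GB split size, each 2 GB input file
# would be divided into 2 map tasks, each emitting its own part file.
SPLIT_SIZE=$((1024 * 1024 * 1024))       # 1 GB, assumed for illustration
FILE_SIZE=$((2 * 1024 * 1024 * 1024))    # my 2 GB chunks
echo $(( (FILE_SIZE + SPLIT_SIZE - 1) / SPLIT_SIZE ))   # map tasks per file: 2
```

That would explain 3 input files turning into 6 output parts, but it still leaves me with more pieces, not one joined file.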
So, how do I join the file that was split before upload?