How to get rid of the suffix "-r-00xxx" when using Hadoop MultipleOutputs?

Question

In the MR job

// Gzip-compress the job output.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
// Register the named output that the reducer writes to.
MultipleOutputs.addNamedOutput(job, OUTPUT, TextOutputFormat.class, NullWritable.class, Text.class);

In my Reducer

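// The fourth argument is the base output path; Hadoop appends "-r-NNNNN" and the codec extension to it.
String myKey = "key"+i;
mos.write(OUTPUT, NullWritable.get(), new Text(lin), myKey);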

The actual output files:

key10-r-00001.gz
key10-r-00002.gz
key11-r-00000.gz
key11-r-00006.gz
key19-r-00000.gz

But what I'm expecting is as follows:

key10.gz
key11.gz
key19.gz

Do I need to use a shell script to rename and merge the actual output files, or are there other solutions I can try inside the MR job without any extra steps? Thank you!
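On the rename-and-merge option: the post-processing step does not have to be a shell script. The sketch below is one way it might look in plain Java, assuming the part files have already been copied to a local directory (it does not talk to HDFS). The class `PartFileMerger` and its methods are hypothetical names for illustration, not part of Hadoop. It relies on the fact that a gzip file may consist of several concatenated members, so appending `key10-r-00001.gz` and `key10-r-00002.gz` byte for byte yields a valid `key10.gz`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop class: merges local part files per base name.
final class PartFileMerger {

    // Matches names such as "key10-r-00001.gz": base name, "-r-", digits, ".gz".
    private static final Pattern PART = Pattern.compile("(.+)-r-\\d+\\.gz");

    // Returns the name without the "-r-NNNNN.gz" tail, or null if it is not a part file.
    static String baseName(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    // Concatenates all part files in dir into <base>.gz, then deletes the parts.
    static void merge(Path dir) throws IOException {
        Map<String, List<Path>> groups = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                String base = baseName(file.getFileName().toString());
                if (base != null) {
                    groups.computeIfAbsent(base, k -> new ArrayList<>()).add(file);
                }
            }
        }
        for (Map.Entry<String, List<Path>> group : groups.entrySet()) {
            List<Path> parts = group.getValue();
            Collections.sort(parts); // deterministic order: sorted by path
            Path target = dir.resolve(group.getKey() + ".gz");
            try (OutputStream out = Files.newOutputStream(target)) {
                for (Path part : parts) {
                    Files.copy(part, out); // append the raw bytes of each part
                }
            }
            for (Path part : parts) {
                Files.delete(part);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PartFileMerger.java /path/to/local/copy/of/output
        merge(Path.of(args[0]));
    }
}
```

This is still the "extra step" the question hopes to avoid, so it only makes sense as a fallback when the job itself cannot be changed.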

Can I create a custom MyOutputFormat by extending `FileOutputFormat`? — frankilee, Nov 18 '15 at 00:39
It looks like you also need to add **LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);** along with what you have already done. See [here](https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html) for more info. — Ramzy, Nov 18 '15 at 01:13
Thank you Ramzy, but LazyOutputFormat is just getting rid of the empty files in the output dir. — frankilee, Nov 18 '15 at 07:49
As I understand it, it delays file creation until there is content to write. It should be used along with the code you were already trying, as mentioned in the documentation I linked. — Ramzy, Nov 18 '15 at 13:46
Oh, got it. Yes, you are right, LazyOutputFormat should be used along with the code. I finally solved the problem by extending `TextOutputFormat` and `Partitioner`, but I'm not sure whether there are better solutions. Happy to learn if other solutions are proposed. — frankilee, Nov 18 '15 at 17:54
Nice to know that. That portion of the file name comes from the partitioner, so your approach looks good. Can you post your approach with more details and accept your own answer? — Ramzy, Nov 18 '15 at 18:12
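The point about the partitioner in the comments above can be illustrated without Hadoop. The sketch below is a plain-Java model, not Hadoop code: `defaultName` mimics the naming pattern visible in the question's output (base name, `-r-`, five-digit reducer partition number, extension), and `partitionFor` is a hypothetical hash-based rule standing in for a custom `Partitioner`. The class and method names are made up for illustration, and the model assumes the file's base name is the same value the records are partitioned on.

```java
import java.util.Locale;

// Plain-Java model of the naming behaviour; none of this is Hadoop API.
final class SuffixModel {

    // Models the default file name: base name, "-r-", zero-padded partition, extension.
    static String defaultName(String baseName, int partition, String extension) {
        return String.format(Locale.ROOT, "%s-r-%05d%s", baseName, partition, extension);
    }

    // Hypothetical partitioning rule: the same key always maps to the same reducer.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Records for "key10" reach reducers 1 and 2, as in the question: two files.
        System.out.println(defaultName("key10", 1, ".gz"));
        System.out.println(defaultName("key10", 2, ".gz"));

        // If the partition depends only on the key, "key10" reaches a single
        // reducer, so only one file with that base name is ever created.
        int partition = partitionFor("key10", 7);
        System.out.println(defaultName("key10", partition, ".gz"));
    }
}
```

Once each base name is written by exactly one reducer, the partition number no longer distinguishes anything, which is presumably why combining a custom `Partitioner` with a `TextOutputFormat` subclass that leaves the suffix out of the file name worked for the asker. The thread does not show that subclass, so its details are left open here.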
