I am new to Hadoop and MapReduce and have been trying to write output to multiple files based on keys. Could anyone please provide clear idea or Java code snippet example on how to use it. My mapper is working exactly fine and after shuffle, keys and the corresponding values are obtained as expected. Thanks!
What i am trying to do is output only few records from the input file to a new file.
Thus the new output file shall contain only those required records, ignoring rest irrelevant records.
This would work fine even if i don't use MultipleTextOutputFormat.
Logic which i implemented in mapper is as follows:
public static class MapClass extends
Mapper {
StringBuilder emitValue = null;
StringBuilder emitKey = null;
Text kword = new Text();
Text vword = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] parts;
String line = value.toString();
parts = line.split(" ");
kword.set(parts[4].toString());
vword.set(line.toString());
context.write(kword, vword);
}
}
Input to reduce is like this:
[key1]--> [value1, value2, ...]
[key2]--> [value1, value2, ...]
[key3]--> [value1, value2, ...] & so on
my interest is in [key2]--> [value1, value2, ...] ignoring other keys and corresponding values. Please help me out with the reducer.