
My question has probably been asked before, but I cannot find a clear answer.

My MapReduce job is a basic WordCount. My current output file is:

// filename : 'part-r-00000'
789  a
755  #c   
456  d
123  #b

How can I change the output filename?

Also, is it possible to have two output files:

// First output file
789  a
456  d

// Second output file
123  #b
755  #c

Here's my reducer class:

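public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    // Hadoop passes all values for a key as an Iterable; a single Text parameter
    // would overload reduce() instead of overriding it.
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}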

Here's my partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable>{

    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        if(a_key.toString().startsWith("#"))
            return 1;
        return 0;
    }


}

Thanks a lot!

Apaachee

2 Answers


As for your other question, how to change the output file name, have a look at MultipleOutputs: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(java.lang.String, K, V).
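A minimal sketch of how that write(String, K, V) call might be used in a reducer. The class name SplitReducer, the helper outputNameFor, and the output names "words" and "hashtags" are made up for illustration. It also assumes the driver has registered both names with MultipleOutputs.addNamedOutput, as noted in the comment.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Assumed driver-side setup, once per output name:
//   MultipleOutputs.addNamedOutput(job, "words", TextOutputFormat.class,
//                                  IntWritable.class, Text.class);
//   MultipleOutputs.addNamedOutput(job, "hashtags", TextOutputFormat.class,
//                                  IntWritable.class, Text.class);
class SplitReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    private MultipleOutputs<IntWritable, Text> outputs;

    // Hypothetical helper: chooses the named output for a word.
    static String outputNameFor(String word) {
        return word.startsWith("#") ? "hashtags" : "words";
    }

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<IntWritable, Text>(context);
    }

    @Override
    protected void reduce(IntWritable count, Iterable<Text> words, Context context)
            throws IOException, InterruptedException {
        for (Text word : words) {
            // Records land in files named after the output, e.g. words-r-00000.
            outputs.write(outputNameFor(word.toString()), count, word);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing flushes the underlying record writers.
        outputs.close();
    }
}
```

With this approach a single reducer can produce both files, so it does not depend on the number of reduce tasks. The default part-r-00000 file is still created (empty) unless the job uses LazyOutputFormat.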

Magham Ravi

In your job driver, set:

job.setNumReduceTasks(2);

From the mapper, emit:

a    789
#c   755     
d    456  
#b   123 

Write a partitioner and add it to the job configuration. In the partitioner, return 1 if the key starts with #, otherwise 0.

In the reducer, swap the key and the value.
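The routing described above can be sketched in plain Java, without any Hadoop dependency, to check the logic before wiring it into a job. PartitionSketch, partitionFor and route are hypothetical names, and the lists stand in for the reducer output files. The modulo guard is an addition of this sketch: Hadoop rejects a partition number that is not smaller than the number of reduce tasks with an "Illegal partition" IOException, so returning 1 while only one reducer runs would fail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionSketch {

    // Hashtags go to partition 1, everything else to partition 0.
    // The modulo keeps the result legal when only one partition exists.
    static int partitionFor(String word, int numPartitions) {
        int wanted = word.startsWith("#") ? 1 : 0;
        return wanted % numPartitions;
    }

    // Builds one list of "count<TAB>word" lines per partition,
    // i.e. key and value are swapped on the way out, as in the reducer.
    static List<List<String>> route(Map<String, Integer> counts, int numPartitions) {
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            files.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int partition = partitionFor(entry.getKey(), numPartitions);
            files.get(partition).add(entry.getValue() + "\t" + entry.getKey());
        }
        return files;
    }

    public static void main(String[] args) {
        // Mapper output from the example: word -> count.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 789);
        counts.put("#c", 755);
        counts.put("d", 456);
        counts.put("#b", 123);

        List<List<String>> files = route(counts, 2);
        System.out.println("partition 0: " + files.get(0));
        System.out.println("partition 1: " + files.get(1));
    }
}
```

With two partitions the plain words and the hashtags end up in separate lists; with one partition everything falls back to partition 0 instead of raising an error.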

banjara
  • Thanks a lot, zuxqoj, it seems to be a good solution. I updated my post with my partitioner. But when I run the program, I get an error: `java.io.IOException: Illegal partition for #rescinfo (1)`. Why? – Apaachee Jun 25 '13 at 12:41
  • I found the beginning of a solution: http://stackoverflow.com/questions/12928101/hadoop-number-of-reducer-is-not-equal-to-what-i-have-set-in-program – Apaachee Jun 25 '13 at 13:03
  • Eclipse can only launch ONE reducer. My Hadoop installation runs on Cygwin on my machine. How can I run more reducers with my installation? – Apaachee Jun 25 '13 at 13:04
  • Bundle your code in a jar and run it directly on Hadoop. Usage: hadoop jar [mainClass] args... – banjara Jun 25 '13 at 14:06
  • I can't, because my application is a servlet/JSP! – Apaachee Jun 25 '13 at 14:24