
Write a MapReduce program in Hadoop that counts the number of times every unique 5-word sequence occurs in the Sample.txt file provided. The final output of your program should list in separate lines the 5-word sequences and their counts.

Example:
Sam is a good boy and he always stands in top five rankings in his school.

This has to be processed as:

  1. Sam is a good boy : 1
  2. is a good boy and : 1
  3. a good boy and he : 1
  4. good boy and he always : 1
  5. boy and he always stands : 1

... and so on for the rest of the sentence. If the same 5-word sequence occurs again, its count must be shown as 2.
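To make the expected behaviour concrete, here is a minimal sketch in plain Java (no Hadoop dependencies) of the sliding window and the per-sequence count described above. The class and method names (`FiveWordSequences`, `windows`, `count`) are hypothetical and chosen only for illustration. It assumes words are separated by whitespace, leaves punctuation attached to the word it follows, and, like a line-oriented mapper, does not build sequences that span two lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FiveWordSequences {

    // Returns every run of n consecutive whitespace-separated words in the line.
    // A line with fewer than n words yields an empty list.
    static List<String> windows(String line, int n) {
        List<String> result = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        // The last valid start index is tokens.length - n.
        for (int i = 0; i + n <= tokens.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return result;
    }

    // Sums one occurrence per sequence, which is what the reduce step
    // would do with the (sequence, 1) pairs emitted by the mapper.
    static Map<String, Integer> count(List<String> sequences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : sequences) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "Sam is a good boy and he always stands in top five rankings in his school.";
        count(windows(line, 5)).forEach((seq, c) -> System.out.println(seq + " : " + c));
    }
}
```

In a real job the `windows` logic would live in the mapper and the summing in the reducer; the sketch only shows the two steps side by side on a single line of input.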

MY CODE:

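// Assumes the enclosing Mapper class declares the usual fields:
//   private final static IntWritable one = new IntWritable(1);
//   private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  // Size the array to the number of tokens on this line.
  String[] tokens = new String[itr.countTokens()];
  for (int l = 0; l < tokens.length; l++) {
    tokens[l] = itr.nextToken();
  }

  StringBuilder sb = new StringBuilder();
  // Stop at the last start index that still leaves 5 tokens in the window.
  for (int i = 0; i + 5 <= tokens.length; i++) {
    sb.append(tokens[i]);
    for (int j = i + 1; j < i + 5; j++) {
      sb.append(" ");
      sb.append(tokens[j]);
    }
    word.set(sb.toString());
    context.write(word, one);           // emit (5-word sequence, 1)
    System.out.println(sb.toString());  // debug output
    sb.setLength(0);                    // reset the builder for the next window
  }
}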
OneCricketeer
Dhanush M
