Form an array of strings where each string is 5 words long?

Question

Write a MapReduce program in Hadoop that counts the number of times every unique 5-word sequence occurs in the Sample.txt file provided. The final output of your program should list in separate lines the 5-word sequences and their counts.

Example:
Sam is a good boy and he always stands in top five rankings in his school.

This has to be processed as:

Sam is a good boy : 1
is a good boy and : 1
a good boy and he : 1
good boy and he always : 1
boy and he always stands : 1

... and so on for the rest of the sentence. If a 5-word sequence occurs more than once, its count must reflect that (for example, 2 for a sequence that appears twice).

MY CODE:

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  StringBuilder sb = new StringBuilder();
  StringTokenizer itr = new StringTokenizer(value.toString());
  // One slot per token (not countTokens() * 5, which made nextToken() run out of tokens).
  String[] tokens = new String[itr.countTokens()];
  for (int l = 0; l < tokens.length; l++) {
    tokens[l] = itr.nextToken();
  }

  // Stop at the last index where a full 5-word window still fits in the array.
  for (int i = 0; i + 5 <= tokens.length; i++) {
    sb.append(tokens[i]);
    for (int j = i + 1; j < i + 5; j++) {
      sb.append(" ");
      sb.append(tokens[j]);
    }
    // word (Text) and one (IntWritable) are fields of the mapper class.
    word.set(sb.toString());
    context.write(word, one);
    System.out.println(sb.toString());
    sb.setLength(0);
  }
}
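The mapper only emits (sequence, 1) pairs; the counts asked for in the question come from summing those pairs per key on the reduce side. The following is a minimal plain-Java sketch of that summing step, without any Hadoop classes, so it can be run on its own. `SequenceCounter` and `sumCounts` are hypothetical names chosen for illustration, and the sorted map only stands in for the grouping that the shuffle phase would do.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: mimics what grouping by key plus a
// summing reducer would do with the (sequence, 1) pairs emitted by a mapper.
final class SequenceCounter {

    // Each element of emittedKeys represents one (sequence, 1) pair.
    static Map<String, Integer> sumCounts(List<String> emittedKeys) {
        // TreeMap keeps keys sorted, similar to how reducer input arrives ordered by key.
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : emittedKeys) {
            // Add 1 to the running total for this sequence, starting from 1 if absent.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

In an actual job this logic would presumably live in a reducer that iterates over the values for one key and writes the total, but the arithmetic is the same.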

Are you hoping someone here is going to do your full homework for you from scratch? - achAmháin, Oct 15 '17 at 16:39
If that's your code, can you edit the original post and add it there? - achAmháin, Oct 15 '17 at 16:46
This is the mapper code I have tried, but I don't know whether there is an easier way to implement this with big data tools. - Dhanush M, Oct 15 '17 at 16:49
What you're looking for is called n-grams. If you want something like this, use the Stanford NLP library to process the string. - OneCricketeer, Oct 15 '17 at 18:44
For example, there's code here that'll help you generate what you need: https://stackoverflow.com/questions/3656762/n-gram-generation-from-a-sentence - OneCricketeer, Oct 15 '17 at 19:16
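As a rough illustration of the n-gram idea mentioned in the comments, here is a minimal sketch in plain Java that slides a window of `n` words over one line of text. It does not use the Stanford NLP library or the code from the linked question; `NGrams` and `ngrams` are hypothetical names, and splitting on whitespace is an assumption about what counts as a word, so punctuation stays attached to its word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration: generates word n-grams from one line.
final class NGrams {

    static List<String> ngrams(String line, int n) {
        List<String> result = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return result;
        }
        // Assumption: words are separated by runs of whitespace.
        String[] words = trimmed.split("\\s+");
        // The last valid start index is words.length - n, so lines with
        // fewer than n words produce no n-grams at all.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }
}
```

Calling `NGrams.ngrams(sentence, 5)` on the example sentence from the question yields 12 sequences, starting with "Sam is a good boy" and ending with "five rankings in his school." Because a mapper with the default text input format sees one line at a time, a sketch like this would not produce sequences that span line breaks.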
