I have successfully run the word count example in Hadoop. Now I want to repeat the same process on a text file or PDF, and I want to count the words chapter by chapter. What should I do?
1 Answer
In MapReduce, it's all about how you construct your keys.
In word count, the map phase emits every word with a value of 1, and the reducer sums those values per key, giving the number of times each word appears in the entire input.
Wordcount example:
Map Phase:
<Key , val>
in, 1
at, 1
in, 1
Reducer Phase:
in, 2
at, 1
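To make the two phases concrete, here is a minimal sketch in plain Python that imitates them in a single process. It is not Hadoop code; the function names `map_phase` and `reduce_phase` are made up for illustration, and the shuffle step is reduced to grouping by key inside the reducer.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["in at in"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts)  # {'in': 2, 'at': 1}
```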
To split the counts by one more level (chapters), you just need to construct a composite key that combines the chapter and the word.
Word count with chapters example:
Map Phase:
<Key , val>
chapter1-in, 1
chapter1-at, 1
chapter2-in, 1
Reducer Phase:
chapter1-in, 1
chapter1-at, 1
chapter2-in, 1
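The composite key idea can be sketched the same way in plain Python. This is only an illustration under assumptions: it assumes each chapter starts with a heading line such as "Chapter 1", and the names `CHAPTER_HEADING`, `map_with_chapter`, and `reduce_counts` are hypothetical, not part of any Hadoop API.

```python
import re
from collections import defaultdict

# Assumption: a chapter begins at a line like "Chapter 1" (case-insensitive).
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+(\d+)\b", re.IGNORECASE)

def map_with_chapter(lines):
    """Emit ("<chapter>-<word>", 1) pairs, tracking the current chapter."""
    chapter = "chapter0"  # label for any text before the first heading
    for line in lines:
        match = CHAPTER_HEADING.match(line)
        if match:
            chapter = "chapter" + match.group(1)
            continue  # the heading line itself is not counted
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{chapter}-{word}", 1

def reduce_counts(pairs):
    """Sum the values per composite key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

text = ["Chapter 1", "in at", "Chapter 2", "in"]
chapter_counts = reduce_counts(map_with_chapter(text))
print(chapter_counts)
# {'chapter1-in': 1, 'chapter1-at': 1, 'chapter2-in': 1}
```

One caveat about the sketch: it reads the text sequentially, so it always knows the current chapter. In a real Hadoop job a mapper only sees its own input split, so the chapter would have to come from somewhere else, for example by storing each chapter in its own file and deriving the chapter from the file name.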
Secondary sort is a better and cleaner way to implement the same thing, but it adds complexity. See: hadoop map reduce secondary sorting
Sorry, but I am not able to understand your answer. My teacher told me that each chapter should be stored on a different node, that the mapper code should then run on each node, and that the reducer code should run after that. – AYUSHI GUPTA Mar 27 '17 at 15:59