To count the words of a text file chapterwise in hadoop

Question

I have successfully run the word count example in Hadoop. Now I want to repeat the same process on a text file or PDF, but count the words chapter-wise. What should I do?

score 0 · Answer 1 · edited May 23 '17 at 12:25

In MapReduce it's all about how you construct your keys.

In wordcount, the map phase emits every word with a count of 1, and the reducer sums those counts to give the number of times each word appears in the entire file processed.

Wordcount example:

Map Phase:
<Key , val>
in, 1
at, 1
in, 1

Reducer Phase:
in, 2
at, 1

To divide the counts by one more level (chapters), you just need to construct a composite key that includes the chapter.

Wordcount with chapter example:

Map Phase:
<Key , val>
chapter1-in, 1
chapter1-at, 1
chapter2-in, 1

Reducer Phase:
chapter1-in, 1
chapter1-at, 1
chapter2-in, 1
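A minimal sketch of the composite-key idea, again simulated in plain Python rather than as a real Hadoop job. It assumes that a chapter begins on a line starting with "Chapter N" (the question does not say how chapters are marked), and the names CHAPTER_HEADING, map_phase and reduce_phase are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a new chapter starts on a line such as "Chapter 1".
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+(\d+)", re.IGNORECASE)

def map_phase(lines):
    """Emit ("chapterN-word", 1) for every word, tracking the current chapter."""
    chapter = "chapter0"  # bucket for words that precede the first heading
    for line in lines:
        match = CHAPTER_HEADING.match(line)
        if match:
            chapter = f"chapter{match.group(1)}"
            continue  # do not count the heading line itself
        for word in re.findall(r"[a-z']+", line.lower()):
            yield f"{chapter}-{word}", 1

def reduce_phase(pairs):
    """Sum the values per composite key; unchanged from plain wordcount."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

sample = ["Chapter 1", "in at", "Chapter 2", "in in"]
chapter_counts = reduce_phase(map_phase(sample))
print(chapter_counts)  # {'chapter1-in': 1, 'chapter1-at': 1, 'chapter2-in': 2}
```

Note that tracking the current chapter across lines only works if one mapper sees the whole file in order. With Hadoop's default line-based input a large file may be split across several mappers, so in practice the chapter would probably have to come from somewhere else, for example one input file per chapter with the file name used as the chapter part of the key.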

Secondary sort is a better and cleaner way to implement the same thing, but it adds complexity. See: hadoop map reduce secondary sorting
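As a rough illustration of what secondary sort buys you, the sketch below imitates it in plain Python: records are sorted on the full (chapter, word) key but grouped on the chapter alone, so each "reduce call" receives one chapter with its words already in order. In a real job this would be done with a custom partitioner and grouping comparator; the function name secondary_sort_counts and the tuple layout are assumptions made for this example.

```python
from itertools import groupby

def secondary_sort_counts(pairs):
    """pairs: iterable of ((chapter, word), 1) tuples from the map phase."""
    # Sort on the full composite key: chapter first, then word.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    result = {}
    # Group on the chapter only, as a grouping comparator would.
    for chapter, group in groupby(ordered, key=lambda kv: kv[0][0]):
        words = {}
        # Within a chapter the words arrive already sorted.
        for (_, word), value in group:
            words[word] = words.get(word, 0) + value
        result[chapter] = words
    return result

pairs = [
    (("chapter1", "in"), 1),
    (("chapter1", "at"), 1),
    (("chapter2", "in"), 1),
    (("chapter1", "in"), 1),
]
by_chapter = secondary_sort_counts(pairs)
print(by_chapter)  # {'chapter1': {'at': 1, 'in': 2}, 'chapter2': {'in': 1}}
```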

Sorry, but I am not able to understand your answer. My teacher told me that each chapter should be stored on a different node, and then the mapper code should be run on each node, followed by the reducer code. - AYUSHI GUPTA, Mar 27 '17 at 15:59
