I'm following the word count tutorial here: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

and I can produce how often a word appears in this format:

word    frequency
1       1
2       2
3       3
4       1
5       2
6       1

However, now I need to group the frequency like this:

frequency   count
1           3
2           2
3           1

Basically, for each frequency, I want to find out how many words occur that many times. How would I modify the code to show this? I suspect I have to modify IntSumReducer, but I've never really worked with Hadoop.

user1883614

2 Answers

Instead of modifying IntSumReducer from the example, you should create a separate job that takes the output of the word count program as its input.

Your Mapper will need to output the frequency as the key and the integer 1 as the value. You can write your own reducer or just reuse the reducer from the example.
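As a rough illustration of that mapper logic, here is a minimal sketch in plain Java with no Hadoop types. It assumes the word count output uses a tab between word and count. The class and method names are made up for this example. In a real job the same parsing would sit inside your Mapper's `map` method and the pair would be passed to `context.write`. If you emit the frequency as a `Text` key, the tutorial's `IntSumReducer` (declared with `Text` keys) should work unchanged, though the keys would then sort as strings rather than numbers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical helper: the map-side logic of the second job, without Hadoop types. */
final class FrequencyMapSketch {

    /**
     * Turns one line of word count output ("word<TAB>count") into the pair
     * (count, 1). Assumes a tab separates the word from its count.
     */
    static Map.Entry<String, Integer> toFrequencyPair(String line) {
        String[] columns = line.split("\t");
        String frequency = columns[1].trim(); // second column is the word's count
        return new SimpleEntry<>(frequency, 1);
    }

    public static void main(String[] args) {
        System.out.println(toFrequencyPair("hadoop\t3")); // prints 3=1
    }
}
```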

alpeshpandya

We have to write a mapper function that takes the output of the word count program as its input.

map(line):
    frequency = 2nd column of the word count output line
    emit <frequency, 1>

Before the reduce step, all values with the same frequency are collected into one list per key. For the example above, the reducer receives: (<1,[1,1,1]> <2,[1,1]> <3,[1]>). It then sums each list:

reduce(key, list):
    sum = 0
    for each value in list:
        sum += value
    emit <key, sum>
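
The grouping and summing can be tried out without a cluster. The following is only a sketch in plain Java that imitates the shuffle and reduce steps in memory. The class and method names are hypothetical, and the input list holds the frequencies from the question's sample, as the mapper would emit them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the shuffle and reduce steps. */
final class FrequencyReduceSketch {

    /** Shuffle stand-in: collects the 1 emitted for each frequency into a per-key list. */
    static Map<Integer, List<Integer>> groupByFrequency(List<Integer> emittedFrequencies) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (int frequency : emittedFrequencies) {
            grouped.computeIfAbsent(frequency, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    /** Reduce step: sums the list of values for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Frequencies of the six words in the question's sample output.
        List<Integer> emitted = List.of(1, 2, 3, 1, 2, 1);
        Map<Integer, List<Integer>> grouped = groupByFrequency(emitted);
        System.out.println(grouped); // prints {1=[1, 1, 1], 2=[1, 1], 3=[1]}
        for (Map.Entry<Integer, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
    }
}
```

On the sample data this yields 3 for frequency 1, 2 for frequency 2, and 1 for frequency 3, which matches the table in the question.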
Yu Hao