I have been working on a simple wordcount program that, given a text input, prints the number of occurrences of each word.
The reduce function looks like:
    def reducer(self, word, count):
        yield word, sum(count)
The above reducer() works correctly to count the occurrence of each word in the input text file.
Now I want to adjust the reducer() function so that only words that occur 10 or more times are written to the output file. I thought it might look like this:
    def reducer(self, word, count):
        if sum(count)>10:
            emit(word,sum(count))
However, this doesn't work. Instead, the output file shows 0 next to each word. I'm pretty sure that it is the reducer() function that needs adjusting and not the map function, but I can't think of anything apart from adding an if statement. I would really appreciate some advice.
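One possible explanation for the 0, sketched below: if the framework passes count as a one-shot iterator (an assumption, though generator-based reducers commonly do this), the first sum(count) consumes it and the second sum(count) adds up nothing. The sketch sums once into a local variable, uses >= so that exactly 10 occurrences are kept, and uses yield as in the working reducer. It is written as a standalone function without self so it can be run on its own; the sample words and counts are made up for illustration.

```python
def reducer(word, counts):
    # Sum exactly once: if counts is a one-shot iterator, a second
    # sum(counts) would see an empty iterator and return 0.
    total = sum(counts)
    # "10 or more" means >= 10, not > 10.
    if total >= 10:
        yield word, total


# Simulate a framework that hands the reducer an iterator of counts.
print(list(reducer("the", iter([4, 6]))))   # [('the', 10)]
print(list(reducer("rare", iter([1, 2]))))  # []

# Why the original attempt could print 0: the iterator is exhausted
# after the first pass.
it = iter([4, 6])
print(sum(it), sum(it))  # 10 0
```

Inside the job class the same body should carry over unchanged, with self added back as the first parameter.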