I have been working on a simple word-count program that, given a text input, prints the number of occurrences of each word.

The reduce function looks like:

def reducer(self, word, count):
    yield(word, sum(count))

The above reducer() works correctly to count the occurrence of each word in the input text file.

Now I want to adjust reducer() so that only words occurring 10 or more times appear in the output file. I thought it might look like this:

def reducer(self, word, count):
   if sum(count)>10:
        emit(word,sum(count))

However, this doesn't work: the output file shows 0 next to each word. I'm fairly sure it is reducer() that needs adjusting, not the map function, but I can't think of anything beyond adding an if statement. I would really appreciate some advice.

Simin

2 Answers

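You can try something along the lines of this:

def threshold(pair, n=10):
    # pair is a (word, count) tuple; keep it when count is at least n
    return pair[1] >= n

# word_counts is a placeholder for the (word, count) pairs from the reducer
filter(threshold, word_counts)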
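As a rough illustration of filtering after the counts have been summed, here is a minimal sketch on made-up data. The names meets_threshold and word_counts and the sample pairs are hypothetical, and it assumes the totals are already available as (word, count) tuples, which the question does not confirm.

```python
def meets_threshold(pair, n=10):
    """Return True when the count in a (word, count) pair is at least n."""
    _word, count = pair
    return count >= n


# Hypothetical totals, for illustration only.
word_counts = [("the", 42), ("cat", 10), ("sat", 3)]

# filter() is lazy, so wrap it in list() to materialise the kept pairs.
frequent = list(filter(meets_threshold, word_counts))
print(frequent)
```

Note that filter() calls the predicate with a single argument, so the pair has to be unpacked inside the function.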
iDrwish

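count is an iterable, and you are iterating over it twice. The first sum(count) consumes it, so the second time it is empty and the sum is zero.

You need to store the result, then check and output it. Otherwise the logic is correct, except that "10 or more" calls for >= rather than >.

def reducer(self, word, count):
    _count = sum(count)  # consume the iterable once and keep the total
    if _count >= 10:
        emit(word, _count)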
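To see why the second sum comes out as zero, here is a minimal sketch that stands in for the framework with plain Python generators. The names thresholded_reducer and minimum are made up for illustration, and it uses yield, as in the working reducer from the question, rather than emit.

```python
def thresholded_reducer(word, counts, minimum=10):
    """Yield (word, total) only when the total is at least `minimum`."""
    total = sum(counts)  # read the one-pass iterable exactly once
    if total >= minimum:
        yield word, total


# A generator can be read only once, like the counts a reducer receives.
counts = (1 for _ in range(12))
first = sum(counts)   # reads all twelve values
second = sum(counts)  # the generator is exhausted, so nothing is left to add

kept = list(thresholded_reducer("spam", (1 for _ in range(12))))
dropped = list(thresholded_reducer("eggs", (1 for _ in range(3))))
print(first, second, kept, dropped)
```

Storing the total in a local variable means the iterable is read once, however many times the value is used afterwards.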
OneCricketeer
  • Thank you so much. You really helped my understanding. I'm new to Python; could you clarify the significance of the _ before count, as opposed to writing count = sum(count)? Also, could you explain why it will be empty the second time? Many thanks again – Simin Oct 11 '18 at 12:57
  • The underscore has no meaning. I just don't like reassigning function parameters; I could've used any other name there. As for the second question, pretend you can only read a list left to right and never go backwards. The first sum has already read the whole list, and the second sum tries to start from the position where the first one finished. https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration – OneCricketeer Oct 11 '18 at 14:31