1

I have a problem with the following problem:

Problem:

Implement a function count_words() in Python that takes as input a string s and a number n, and returns the n most frequently-occuring words in s. The return value should be a list of tuples - the top n words paired with their respective counts [(, ), (, ), ...], sorted in descending count order.

You can assume that all input will be in lowercase and that there will be no punctuations or other characters (only letters and single separating spaces). In case of a tie (equal count), order the tied words alphabetically.

E.g.:

print count_words("betty bought a bit of butter but the butter was bitter",3) Output:

[('butter', 2), ('a', 1), ('betty', 1)]

This is my solution:

    """Count words."""

from operator import itemgetter
from collections import Counter

def count_words(s, n):
    """Return the n most frequently occuring words in s."""

    # TODO: Count the number of occurences of each word in s
    words = s.split(" ");
    words = Counter(words)
    # TODO: Sort the occurences in descending order (alphabetically in case of ties)
    print(words)
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = words.most_common(n)
    return top_n

def test_run()

    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()

The problem is that elements with equal counts are ordered arbitrarily, how can i order that elements by alphabetical order ??

Gabriele Picco
  • 457
  • 4
  • 22

3 Answers3

4

You can sort them using the number of occurrence (in reverse order) and then the lexicographical order:

>>> lst = [('meat', 2), ('butter', 2), ('a', 1), ('betty', 1)]
>>> 
>>> sorted(lst, key=lambda x: (-x[1], x[0]))
#                              ^ reverse order 
[('butter', 2), ('meat', 2), ('a', 1), ('betty', 1)]

The number of occurrence takes precedence over the lex. order.

In your case, use words.items() in place of the list of the list I have used. You will no longer need to use most_common as sorted already does the same.

Moses Koledoye
  • 77,341
  • 8
  • 133
  • 139
0

The python function sorted is stable, which means in case of a tie, the tied items will be in the same order. Because of this, you can sort first on the strings to get them in order:

alphabetical_sort = sorted(words.items(), key=lambda x: x[0])

and then on the counts:

final_sort = sorted(alphabetical_sort, key=lambda x: x[1], reverse=True)

Edit: Didn't see Moses' better answer. Of course, the less sorts the better.

Community
  • 1
  • 1
SCB
  • 5,821
  • 1
  • 34
  • 43
0

This is another way of conceptualizing the problem:

def count_words(s, n):

words = s.split(" ")
# TODO: Count the number of occurences of each word in s
counters = {}
for word in words:
    if word in counters:
        counters[word] += 1
    else:
        counters[word] = 1
# TODO: Sort the occurences in descending order (alphabetically in case of ties)
top = sorted(counters.iteritems(), key=lambda d:(-d[1],d[0]))

# TODO: Return the top n words as a list of tuples (<word>, <count>)
top_n = top[:n]
return top_n

def test_run():

print count_words("cat bat mat cat bat cat", 3)
print count_words("betty bought a bit of butter but the butter was bitter", 3)

if name == 'main': test_run()

Kambiz Behi
  • 70
  • 10