Top K frequent words stuck in one part

Question

This refers to leetcode problem: https://leetcode.com/problems/top-k-frequent-words/ Here is my code:

import heapq

class Solution:
# def topKFrequent(self, words: List[str], k: int) -> List[str]:
def topKFrequent(self, words, k):
    results = []
    wordTable = {}
    for word in words:
        if (wordTable.get(word) is None):
            wordTable[word] = 1
            continue
        wordTable[word] = (wordTable.get(word)) + 1

    heap = []
    # print(wordTable)
    heapSize = 0

    for word in wordTable.keys():
        node = [wordTable[word], word]
        if(heapSize<k):
            heapq.heappush(heap,node)
            heapSize += 1
            continue
        if(heapSize>=k):
            if (heap[0][0]< node[0]):
                heapq.heappushpop(heap,node)
                heapSize -= 1
                continue
            if heap[0][0] == node[0] and heap[0][1]>node[1]:
                heapq.heappop(heap)
                heapq.heappush(heap,node)
                heapSize -= 1
                continue

    # heap.sort(key = lambda x: x.freq, reverse=True);
    print(heap)

    for i in reversed(range(k)):
        results.append(heap[i][1])
    return results

The code works if all the words have different frequency, since it uses a min-heap. However, it doesn't work if they have the same frequency since it is going in reverse order, so the word that is alphabetically greater comes first which is not accepted (for example, if I have 4 words with same frequency and lets say they are a,b,c,d: my result will be d,c,b,a which is not acceptable) I am not sure how to account for this case and am stuck for 3 hours on this issue. Can anyone please help?

Jacob Brazeal · Answer 1 · 2020-06-03T13:27:15.583

0

Use functools.cmp_to_key.

from functools import cmp_to_key

def cmp(a, b):
    if a[0] == b[0]:
         return -1 if a[1] < b[1] else 1
    return -1 if a[0] > b[0] else 1
return sorted(heap, key=cmp_to_key(cmp))

edited Jun 03 '20 at 13:27

answered Jun 03 '20 at 13:18

Jacob Brazeal

654
9
16

Abhishek J · Answer 2 · 2020-06-03T13:37:10.770

This one liner should help you.

from collections import Counter

topk = lambda words, k: [t[0] for t in Counter(list(sorted(words))).most_common(k)]

print(topk(["i", "love", "leetcode", "i", "love", "coding"], k=2))
print(topk(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k=4))

# Output
['i', 'love']
['the', 'is', 'sunny', 'day']

The first step is to sort the list beforehand using list(sorted(words)).
Counter converts the list into frequencies. Its in-built like heapq.
most_common(k) as the name suggests gives you the most common words. But note, that we have already sorted them lexicographically.
The final outer for loop finishes it off by just using the first value of the tuples that the most_common(k) function returns

Top K frequent words stuck in one part

2 Answers2