-1

for the problem on leetcode 'Top K Frequent Elements' https://leetcode.com/problems/top-k-frequent-elements/submissions/

there is a solution that completes the task in just 88 ms, mine completes the tasks in 124 ms, I see it as a large difference.

I tried to understand why buy docs don't provide the way the function I use is implemented which is most_common(), if I want to dig a lot in details like that, such that I can write algorithms that run so fast in the future what should I read(specific books? or any other resource?).

my code (124 ms)

def topKFrequent(self, nums, k):
        if k ==len(nums):
            return nums
        c=Counter(nums)
        return [ t[0] for t in c.most_common(k) ]

other (88 ms) (better in time)

def topKFrequent(self, nums, k):
        if k == len(nums):
            return nums
        count = Counter(nums)   
        return heapq.nlargest(k, count.keys(), key=count.get) 

both are nearly taking same amount of memory, so no difference here.

Ahmed Adel
  • 39
  • 1
  • 5
  • 1
    Well since Python is open source, so you can just look at how `nlargest` is implemented. I did the legwork for you though and found it, does it help? https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Lib/heapq.py#L521 – Random Davis Oct 01 '21 at 21:32
  • The difference is probably just due to the way the result list is constructed. `heapq.nlargest` probably allocates a list of `n` elements all at once, while your list comprehension grows it incrementally. – Barmar Oct 01 '21 at 21:35
  • Why do you want it to return `nums` when `k == len(nums)`? Surely, the highest k can be is the length of the list of all unique values. I.e. `set(nums)`. – Bill Oct 01 '21 at 21:58
  • In my tests with k=500, there is no noticeable difference between the execution time of `count = Counter(nums)` and both of these functions. I.e. >99% of execution time is used by `Counter(nums)`. What values are you using for `nums` and `k`? – Bill Oct 01 '21 at 22:08
  • 1
    How often did you submit each, and what were the individual times? – no comment Oct 01 '21 at 23:05
  • @don'ttalkjustcode is that question for me or the OP? For my tests, see my numpy answer below. OP did not respond to my request for example values for nums and k. – Bill Oct 01 '21 at 23:12
  • @Bill The OP, otherwise I would've @'ed you. Just want to make sure they didn't just submit each only once and are acting as if LeetCode's timing were stable rather than pretty random. – no comment Oct 01 '21 at 23:16

2 Answers2

4

The implementation of most_common also uses heapq.nlargest, but it calls it with count.items() instead of count.keys(). This will make it a tiny bit slower, and also requires the overhead of creating a new list, in order to extract the [0] value from each element in the list returned by most_common().

The heapq.nlargest version just avoids this extra overhead, and passes count.keys() as second argument, and therefore it doesn't need to iterate that result again to extract pieces into a new list.

trincot
  • 317,000
  • 35
  • 244
  • 286
0

@trincot seems to have answered the question but if anyone is looking for a faster way to do this then use Numpy, provided nums can be stored as a np.array:

def topKFrequent_numpy(nums, k):
    unique, counts = np.unique(nums, return_counts=True)
    return unique[np.argsort(-counts)[:k]]

One speed test

nums_array = np.random.randint(1000, size=1000000)
nums_list = list(nums_array)

%timeit topKFrequent_Counter(nums_list, 500)
# 116 ms ± 4.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_heapq(nums_list, 500)
# 117 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_numpy(nums_array, 500)
# 39.2 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Speeds may be dramatically different for other input values)

Bill
  • 10,323
  • 10
  • 62
  • 85