finding top k largest keys in a dictionary python

Question

Lets say I have a dictionary:

{key1:value1........... keyn:valuen}

So lets say I want to write a function

def return_top_k(dictionary, k):

    return list_of_keys_sorted

What is the most efficient way (in terms of big O) to get the keys which have the top k values (maintaining the order i.e the highest value key is present in the beginning.. and so on.)

When talking about dictionaries using `k` for a count is confusing because it often stands for 'key'. Use `n` instead. — Steven Rumbalski, Sep 04 '12 at 15:34

jfs · Accepted Answer · 2012-09-04T15:41:01.197

31

O(n log k):

import heapq

k_keys_sorted = heapq.nlargest(k, dictionary)

You could use key keyword parameter to specify what should be used as a sorting key e.g.:

k_keys_sorted_by_values = heapq.nlargest(k, dictionary, key=dictionary.get)

edited Sep 04 '12 at 15:41

answered Sep 04 '12 at 15:27

jfs

399,953
195
994
1,670

7

I think the OP is actually asking for something like `heapq.nlargest(k, dictionary, key=dictionary.__getitem__)`: dictionary keys ordered by their values. – Danica Sep 04 '12 at 15:29
1

Note, that this a result for sorting keys – Alexey Kachayev Sep 04 '12 at 15:29
@Dougal: `list_of_keys_sorted` suggests otherwise – jfs Sep 04 '12 at 15:30
Hmm, true. The word "value" is confusing; either one might be right. :) – Danica Sep 04 '12 at 15:30
https://www.safaribooksonline.com/library/view/python-cookbook-3rd/9781449357337/ch01s04.html – Bala May 27 '16 at 08:33

score 7 · Answer 2 · edited Apr 14 '15 at 00:47

7

return sorted(dictionary, key=dictionary.get, reverse=True)[:10]

Should be at worst O(NlogN) (although heapq proposed by others is probably better) ...

It might also make sense to use a Counter instead of a regular dictionary. In that case, the most_common method will do (approximately) what you want (dictionary.most_common(10)), but only if it makes sense to use a Counter in your API.

edited Apr 14 '15 at 00:47

wim

338,267
99
616
750

answered Sep 04 '12 at 15:24

mgilson

300,191
65
633
696

This is perfect. I'll add that if the dictionary happens to a dictionary of counts, `collections.Counter` would be the right data structure. Then the solution would be `[k for k, v in counts.most_common(10)]`. – Steven Rumbalski Sep 04 '12 at 15:30

score 4 · Answer 3 · answered May 27 '16 at 08:33

portfolio = [
   {'name': 'IBM', 'shares': 100, 'price': 91.1},
   {'name': 'AAPL', 'shares': 50, 'price': 543.22},
   {'name': 'FB', 'shares': 200, 'price': 21.09},
   {'name': 'HPQ', 'shares': 35, 'price': 31.75},
   {'name': 'YHOO', 'shares': 45, 'price': 16.35},
   {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

score 3 · Answer 4 · edited Oct 01 '21 at 07:14

For top-3 step by step:

>>> from operator import itemgetter
>>> dct = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
>>> sorted(dct.items(), key=itemgetter(1), reverse=True)
[('e', 5), ('d', 4), ('c', 3), ('b', 2), ('a', 1)]
>>> map(itemgetter(0), sorted(dct.items(), key=itemgetter(1), reverse=True))
['e', 'd', 'c', 'b', 'a']
>>> map(itemgetter(0), sorted(dct.items(), key=itemgetter(1), reverse=True))[:3]
['e', 'd', 'c']

Or using heapq module

>>> import heapq
>>> from operator import itemgetter
>>> heapq.nlargest(3, dct.items(), key=itemgetter(1))
[('e', 5), ('d', 4), ('c', 3)]
>>> map(itemgetter(0), _)
['e', 'd', 'c']

Saksham Varma · Answer 5 · 2015-04-14T01:25:16.620

1

In code

dct = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
k = 3
print sorted(dct.keys(), reverse=True)[:k]

If you also need values:

print sorted(dct.items(), reverse=True)[:k]

Or if you would want to use OrderedDict:

from collections import OrderedDict
d = OrderedDict(sorted(dct.items(), reverse=True))
print d.keys()[:k]

edited Apr 14 '15 at 01:25

answered Apr 14 '15 at 01:02

Saksham Varma

2,122
13
15

score 0 · Answer 6 · answered Jan 14 '21 at 15:14

so if you want top K frequent Elements to be printed from the Dictionary; you have to use heapq.nlargest funtcion.

Here is the example for the same:

return heapq.nlargest(k,count.keys(), key = count.get)

Here, k is the number that helps us find out elements which are repeated in a dictionary k times or more than k times.

count.keys() : This gives you the keys or the elements present in the heap which is created using the collections.counter

key = count.get() : This is used to print the Keys of the heap. If we skip this; it will print the Values of the dictionary i.e. the number of times the element is occurring in the dictionary.

finding top k largest keys in a dictionary python

6 Answers6

Linked

Related