Returning the three maximal values in a dictionary

Question

I have the following dictionary:

'{0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}'

And for this dictionary I want write a function that returns the three key-value pairs that have the highest values (So in this case key 18, 19, 20).

I came up with the following:

cachedict = nr_of_objects_per_century() #Dictionary mentioned above

def top_3_centuries():
        max_nr_works_list = sorted(cachedict.values())
        top_3_values = []
        for i in range(len(max_nr_works_list)-3, len(max_nr_works_list)):
            top_3_values.append(max_nr_works_list[i])
            print(top_3_values)

This gives me a list of the max-values I want to lookup. But how do I proceed from here? Is there a way to do this without a reverse-lookup (Which is slow for dictionaries, right?) I have the feeling that I can do this task much more efficiently/pythonic.

score 6 · Accepted Answer · edited Nov 20 '18 at 10:23

6

You could also use collections.Counter with most_common (which internally uses a heap queue):

from collections import Counter

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

count = Counter(dct)
print(count.most_common(3))  # [(19, 244675), (20, 115878), (18, 111490)]

edited Nov 20 '18 at 10:23

jpp

159,742
34
281
339

answered Nov 20 '18 at 10:10

hiro protagonist

44,693
14
86
111

1

@schwobaseggl actually it works nicely; as the examples show: a `Counter` can be initialized with a dict: `Counter({'red': 4, 'blue': 2})`. the values will be used as number of occurrences. – hiro protagonist Nov 20 '18 at 10:14
3

This is totally the way to go IMO, under the hood this uses `heapq.nlargest` anyway https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/collections/__init__.py#L584 – Chris_Rands Nov 20 '18 at 10:20
1

@hiroprotagonist Thanks =) This is a nice concise way to do it. Do you by any chance know if this avoids a full sort also? Or is the dict first sorted if you use the most_common function? – Psychotechnopath Nov 20 '18 at 10:21
1

@hiroprotagonist My bad, totally solid!! +1 – user2390182 Nov 20 '18 at 10:25

jpp · Answer 2 · 2019-01-11T09:46:13.640

6

`heapq.nlargest`

You can avoid a full sort here by using a heap queue:

from heapq import nlargest
from operator import itemgetter

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49,
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408,
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

res = nlargest(3, dct.items(), key=itemgetter(1))

print(res)
# [(19, 244675), (20, 115878), (18, 111490)]

edited Jan 11 '19 at 09:46

answered Nov 20 '18 at 10:14

jpp

159,742
34
281
339

So if I understand correctly this code heapifies my dictionary to select the 3 largest elements? How does the key-lambda expression work here? – Psychotechnopath Nov 20 '18 at 10:20
1

Correct. `dct.items` returns key-value pairs, `lambda x: x[1]` extracts the second of these, i.e. the value (since Python is zero-indexed), i.e. the value. `nlargest` means "get the largest" and we have the `3` argument for how many. – jpp Nov 20 '18 at 10:21
2

This is a nice way to avoid sorting. +1. I think sorting is not needed if you only want the top 3 elements. – RoadRunner Nov 20 '18 at 10:27
Hello guys, I am learning data structures now and I want to see how this knowledge applies to python. If I am correct, nlargest first *heapifies* my dictionary values and then extracts the 3 largest maximums right? – Psychotechnopath Jan 11 '19 at 09:38
1

@Psychotechnopath, No. There's no thing as "heapify". It iterates the dictionary and keeps only the largest 3 valuese. So a full sort isn't required. – jpp Jan 11 '19 at 09:39
@jpp ty for fast answer =) Not sure if I fully understand: why is nlargest part of heapq algorithm if the function itself doesn't use heaps? And how can nlargest "know" the three max values by simply iterating through a dict? – Psychotechnopath Jan 11 '19 at 09:42
1

@Psychotechnopath, It holds 3 values in memory and updates as it iterates. If you need more information, see [Priority queue](https://en.wikipedia.org/wiki/Priority_queue). You can write a book on this. If you have a *very specific* question on how it's implemented, I suggest you ask a new question. – jpp Jan 11 '19 at 09:45

score 3 · Answer 3 · answered Nov 20 '18 at 10:14

You can use this:

a = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49,
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408,
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

l = sorted(list(a.items()), key=lambda tup: tup[1], reverse=True)[:3]
print(l) # [(19, 244675), (20, 115878), (18, 111490)]

It converts the dictionary a into a list of tuples, sort by tup[1], reverse it and get the first 3 hits.

score 2 · Answer 4 · answered Nov 20 '18 at 10:10

You can do it like so:

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

res = [next(k for k in dct if dct[k]==v) for v in sorted(dct.values(), reverse=True)[:3]]
print(res)  # -> [19, 20, 18]

Break-down:

sorted(dct.values(), reverse=True)[:3]:: Takes the 3 max dictionary values.
next(k for k in dct if dct[k]==v):: returns the dictionary key, for which the value is one of the above 3 (iteratively).

B. M. · Answer 5 · 2018-11-20T11:35:23.043

2

in two simple steps :

aux = sorted([(v,k) for (k,v) in dic.items()])
res = [(v,k) for (k,v) in aux[-3:]] 
#[(18, 111490), (20, 115878), (19, 244675)]

faster than nlargest and Counter.most_common on this example.

edited Nov 20 '18 at 11:35

answered Nov 20 '18 at 10:16

B. M.

18,243
2
35
54

2

True, but I don't think that's the case in general (As the dicts get larger) right? – Psychotechnopath Nov 20 '18 at 10:40

New2coding · Answer 6 · 2018-11-20T16:21:35.453

2

This returns what you want:

d = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

print(sorted([(i,j) for i, j in d.items() if j in (sorted(d.values())[-3:])])[-3:])
#[(18, 111490), (19, 244675), (20, 115878)]

edited Nov 20 '18 at 16:21

answered Nov 20 '18 at 10:19

New2coding

715
11
23

score 1 · Answer 7 · answered Nov 20 '18 at 10:09

d = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

d_items_sorted = sorted(d.items(), key=lambda x: x[1], reverse=True)

d_items_sorted[:3]

Returns :

[(19, 244675), (20, 115878), (18, 111490)]

This is the easiest code I could get, but sorting the dictionary cost O(nlogn) and you should be able to do the same in O(n)

score 0 · Answer 8 · answered Nov 20 '18 at 10:17

Are you looking for the most efficient way or just the optimal way in permormace/algorithm simplicity?

If it's the latter may be you should consider sorting dictionary items as tuples (you can get them with cachedict.items()) like in this answer https://stackoverflow.com/a/613218/10453363

Just sort tuples by the value and then get the last 3 tuples (which are key/value pairs)

Returning the three maximal values in a dictionary

8 Answers8

heapq.nlargest

`heapq.nlargest`