6

I have the following dictionary:

'{0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}'

And for this dictionary I want write a function that returns the three key-value pairs that have the highest values (So in this case key 18, 19, 20).

I came up with the following:

cachedict = nr_of_objects_per_century() #Dictionary mentioned above

def top_3_centuries():
        max_nr_works_list = sorted(cachedict.values())
        top_3_values = []
        for i in range(len(max_nr_works_list)-3, len(max_nr_works_list)):
            top_3_values.append(max_nr_works_list[i])
            print(top_3_values)

This gives me a list of the max-values I want to lookup. But how do I proceed from here? Is there a way to do this without a reverse-lookup (Which is slow for dictionaries, right?) I have the feeling that I can do this task much more efficiently/pythonic.

Psychotechnopath
  • 2,471
  • 5
  • 26
  • 47

8 Answers8

6

You could also use collections.Counter with most_common (which internally uses a heap queue):

from collections import Counter

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

count = Counter(dct)
print(count.most_common(3))  # [(19, 244675), (20, 115878), (18, 111490)]
jpp
  • 159,742
  • 34
  • 281
  • 339
hiro protagonist
  • 44,693
  • 14
  • 86
  • 111
  • 1
    @schwobaseggl actually it works nicely; as the examples show: a `Counter` can be initialized with a dict: `Counter({'red': 4, 'blue': 2})`. the values will be used as number of occurrences. – hiro protagonist Nov 20 '18 at 10:14
  • 3
    This is totally the way to go IMO, under the hood this uses `heapq.nlargest` anyway https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/collections/__init__.py#L584 – Chris_Rands Nov 20 '18 at 10:20
  • 1
    @hiroprotagonist Thanks =) This is a nice concise way to do it. Do you by any chance know if this avoids a full sort also? Or is the dict first sorted if you use the most_common function? – Psychotechnopath Nov 20 '18 at 10:21
  • 1
    @hiroprotagonist My bad, totally solid!! +1 – user2390182 Nov 20 '18 at 10:25
6

heapq.nlargest

You can avoid a full sort here by using a heap queue:

from heapq import nlargest
from operator import itemgetter

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49,
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408,
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

res = nlargest(3, dct.items(), key=itemgetter(1))

print(res)
# [(19, 244675), (20, 115878), (18, 111490)]
jpp
  • 159,742
  • 34
  • 281
  • 339
  • So if I understand correctly this code heapifies my dictionary to select the 3 largest elements? How does the key-lambda expression work here? – Psychotechnopath Nov 20 '18 at 10:20
  • 1
    Correct. `dct.items` returns key-value pairs, `lambda x: x[1]` extracts the second of these, i.e. the value (since Python is zero-indexed), i.e. the value. `nlargest` means "get the largest" and we have the `3` argument for how many. – jpp Nov 20 '18 at 10:21
  • 2
    This is a nice way to avoid sorting. +1. I think sorting is not needed if you only want the top 3 elements. – RoadRunner Nov 20 '18 at 10:27
  • Hello guys, I am learning data structures now and I want to see how this knowledge applies to python. If I am correct, nlargest first *heapifies* my dictionary values and then extracts the 3 largest maximums right? – Psychotechnopath Jan 11 '19 at 09:38
  • 1
    @Psychotechnopath, No. There's no thing as "heapify". It iterates the dictionary and keeps only the largest 3 valuese. So a full sort isn't required. – jpp Jan 11 '19 at 09:39
  • @jpp ty for fast answer =) Not sure if I fully understand: why is nlargest part of heapq algorithm if the function itself doesn't use heaps? And how can nlargest "know" the three max values by simply iterating through a dict? – Psychotechnopath Jan 11 '19 at 09:42
  • 1
    @Psychotechnopath, It holds 3 values in memory and updates as it iterates. If you need more information, see [Priority queue](https://en.wikipedia.org/wiki/Priority_queue). You can write a book on this. If you have a *very specific* question on how it's implemented, I suggest you ask a new question. – jpp Jan 11 '19 at 09:45
3

You can use this:

a = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49,
       9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408,
       16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

l = sorted(list(a.items()), key=lambda tup: tup[1], reverse=True)[:3]
print(l) # [(19, 244675), (20, 115878), (18, 111490)]

It converts the dictionary a into a list of tuples, sort by tup[1], reverse it and get the first 3 hits.

b-fg
  • 3,959
  • 2
  • 28
  • 44
2

You can do it like so:

dct = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

res = [next(k for k in dct if dct[k]==v) for v in sorted(dct.values(), reverse=True)[:3]]
print(res)  # -> [19, 20, 18]

Break-down:

  • sorted(dct.values(), reverse=True)[:3]:: Takes the 3 max dictionary values.
  • next(k for k in dct if dct[k]==v):: returns the dictionary key, for which the value is one of the above 3 (iteratively).
Ma0
  • 15,057
  • 4
  • 35
  • 65
2

in two simple steps :

aux = sorted([(v,k) for (k,v) in dic.items()])
res = [(v,k) for (k,v) in aux[-3:]] 
#[(18, 111490), (20, 115878), (19, 244675)]

faster than nlargest and Counter.most_common on this example.

B. M.
  • 18,243
  • 2
  • 35
  • 54
2

This returns what you want:

d = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

print(sorted([(i,j) for i, j in d.items() if j in (sorted(d.values())[-3:])])[-3:])
#[(18, 111490), (19, 244675), (20, 115878)]
New2coding
  • 715
  • 11
  • 23
1
d = {0: 0, 1: 11, 2: 26, 3: 43, 4: 14, 5: 29, 6: 34, 7: 49, 8: 49, 9: 108, 10: 124, 11: 108, 12: 361, 13: 290, 14: 2118, 15: 5408, 16: 43473, 17: 109462, 18: 111490, 19: 244675, 20: 115878, 21: 6960}

d_items_sorted = sorted(d.items(), key=lambda x: x[1], reverse=True)

d_items_sorted[:3]

Returns :

[(19, 244675), (20, 115878), (18, 111490)]

This is the easiest code I could get, but sorting the dictionary cost O(nlogn) and you should be able to do the same in O(n)

Corentin Limier
  • 4,946
  • 1
  • 13
  • 24
0

Are you looking for the most efficient way or just the optimal way in permormace/algorithm simplicity?

If it's the latter may be you should consider sorting dictionary items as tuples (you can get them with cachedict.items()) like in this answer https://stackoverflow.com/a/613218/10453363

Just sort tuples by the value and then get the last 3 tuples (which are key/value pairs)