Basic way to find largest values in dictionary

Question

Let's say I have a dictionary. I want to find the 4 keys with the highest values. I want to do this in a very basic way. I'm not that advanced with CS. I just want to iterate over the dictionary. Or how should I do this? I realize it cannot be that challenging. I don't want to use heapq. How can I do this?

What do you mean by highest values? Highest Number? Biggest length? — Zizouz212, Apr 14 '15 at 00:44
http://stackoverflow.com/questions/1555968/efficient-way-to-find-the-largest-key-in-a-dictionary-with-non-zero-value — NYB, Apr 14 '15 at 00:45
@martineau I just need it to be done in the most basic manner possible — , Apr 14 '15 at 02:51
@Zizouz212 Highest value, as in the keys with the largest integer values. — , Apr 14 '15 at 02:52

score 3 · Answer 1 · answered Apr 14 '15 at 00:45

3

I think the most pythonic way is:

sorted(d, key=d.get, reverse=True)[:4]

answered Apr 14 '15 at 00:45

wim

338,267
99
616
750

`sorted(d, key=d.get)[-4:]` would be even shorter... – martineau Apr 14 '15 at 00:57
I voted this up as definitely better than mine, but I think I would make one amendment nonetheless. I dislike iterating over a `dict`'s keys by simply iterating over the `dict` -- I think that implicit behavior confuses folks all the time. I would prefer as more explicit `sorted(d.keys(), key=d.get ... )` – jwilner Apr 14 '15 at 01:06
Using `d.get()` will be less optimal than the `O(n)` solution. Why not just iterate items? The solution used on `collection.Counter.most_common` is `sorted(self.items(), key=itemgetter(1), reverse=True)` – JBernardo Apr 14 '15 at 01:17
@jwilner I kind of get what your saying. Can you show me what that amendment would look like? – Apr 14 '15 at 02:49
@Beth, it's just slightly different from the above solutions. `max_keys = sorted(d.keys(), key=d.get)[:-4]` or alternatively, you could reverse if you like -- `sorted(d.keys(), key=d.get, reverse=True)[:4]`. My larger point is that people forget that `list(dict) == d.keys()`, and in those kinds of situations I think programmers should defer to the more explicit option. – jwilner Apr 14 '15 at 02:58
1

@jwilner The problem with that is that it is wasteful on python 2.x, unnecessarily copying a list in memory which will be garbage collected later. That the dicts iterate over the keys anyway should be common python knowledge, imo. – wim Apr 14 '15 at 03:14
@JBernardo What O(n) solution do you speak about? Aren't yours and mine both O(n log n) ? – wim Apr 14 '15 at 03:17
@wim I mean the iteration. On worst case each `d.get(x)` can happen on O(n). Most of the time it will be a single hop in the hash table though – JBernardo Apr 14 '15 at 14:57
@JBernardo d.get is O(1). It's only going to be O(n) if every hash is colliding - and you have to intentionally sabotage a dict to create that situation, it's not going to just happen by accident. – wim Apr 15 '15 at 00:09
Are you sure you know about time complexity and dict implementation? Because this depends heavily on the dictionary history of inserts and deletes. Also the "n" in O(n) for `d.get` is not the current number of elements but the size of the container which is at least 33% bigger to allow less hash collisions. Iterating directly is always more efficient even with no collisions. – JBernardo Apr 15 '15 at 02:23

JBernardo · Answer 2 · 2015-04-14T01:24:39.893

1

You should be using a collections.Counter object instead.

c = collections.Counter(original_dict)
c.most_common(4)

If you just want the keys, then:

[k for k, v in c.most_common(4)]

For reference on how it is implemented, check the source code here.

edited Apr 14 '15 at 01:24

answered Apr 14 '15 at 01:09

JBernardo

32,262
10
90
115

@Shashank you can use `Counter` to count anything. Try the code yourself. Also this is much cleaner – JBernardo Apr 14 '15 at 01:13
Taken back, it works :) – Shashank Apr 14 '15 at 01:15
Good trick! It uses heapq in the background – wim Apr 14 '15 at 02:53

score 0 · Answer 3 · answered Apr 14 '15 at 00:46

0

Sort by the values and then slice. The sorted function takes a key function.

items_by_value = sorted(d.items(), key=lambda (k, v): v) 
keys = [k for k, v in items_by_value[-4:]]

answered Apr 14 '15 at 00:46

jwilner

6,348
6
35
47

score 0 · Answer 4 · answered Apr 14 '15 at 11:54

Good question.

Assuming d is the dictionary, and if you don't care what order the keys are in, you could use something like the following:

keys = [v[0] for v in sorted(d.items(), key=lambda v: v[1])[-4:]]

For huge dictionaries, it would be a more efficient to use:

keys = [v[0] for v in sorted(d.items(), key=operator.itemgetter(1))[-4:]]

The [-4:] means the last four entries. In both cases, if you want the keys in the same order as their corresponding highest values, use [-1:-5:-1] instead, which are the last four entries in reverse order.

Basic way to find largest values in dictionary

4 Answers4