3

Let's say I have a dictionary. I want to find the 4 keys with the highest values. I want to do this in a very basic way. I'm not that advanced with CS. I just want to iterate over the dictionary. Or how should I do this? I realize it cannot be that challenging. I don't want to use heapq. How can I do this?

4 Answers4

3

I think the most pythonic way is:

sorted(d, key=d.get, reverse=True)[:4]
wim
  • 338,267
  • 99
  • 616
  • 750
  • `sorted(d, key=d.get)[-4:]` would be even shorter... – martineau Apr 14 '15 at 00:57
  • I voted this up as definitely better than mine, but I think I would make one amendment nonetheless. I dislike iterating over a `dict`'s keys by simply iterating over the `dict` -- I think that implicit behavior confuses folks all the time. I would prefer as more explicit `sorted(d.keys(), key=d.get ... )` – jwilner Apr 14 '15 at 01:06
  • Using `d.get()` will be less optimal than the `O(n)` solution. Why not just iterate items? The solution used on `collection.Counter.most_common` is `sorted(self.items(), key=itemgetter(1), reverse=True)` – JBernardo Apr 14 '15 at 01:17
  • @jwilner I kind of get what your saying. Can you show me what that amendment would look like? –  Apr 14 '15 at 02:49
  • @Beth, it's just slightly different from the above solutions. `max_keys = sorted(d.keys(), key=d.get)[:-4]` or alternatively, you could reverse if you like -- `sorted(d.keys(), key=d.get, reverse=True)[:4]`. My larger point is that people forget that `list(dict) == d.keys()`, and in those kinds of situations I think programmers should defer to the more explicit option. – jwilner Apr 14 '15 at 02:58
  • 1
    @jwilner The problem with that is that it is wasteful on python 2.x, unnecessarily copying a list in memory which will be garbage collected later. That the dicts iterate over the keys anyway should be common python knowledge, imo. – wim Apr 14 '15 at 03:14
  • @JBernardo What O(n) solution do you speak about? Aren't yours and mine both O(n log n) ? – wim Apr 14 '15 at 03:17
  • @wim I mean the iteration. On worst case each `d.get(x)` can happen on O(n). Most of the time it will be a single hop in the hash table though – JBernardo Apr 14 '15 at 14:57
  • @JBernardo d.get is O(1). It's only going to be O(n) if every hash is colliding - and you have to intentionally sabotage a dict to create that situation, it's not going to just happen by accident. – wim Apr 15 '15 at 00:09
  • Are you sure you know about time complexity and dict implementation? Because this depends heavily on the dictionary history of inserts and deletes. Also the "n" in O(n) for `d.get` is not the current number of elements but the size of the container which is at least 33% bigger to allow less hash collisions. Iterating directly is always more efficient even with no collisions. – JBernardo Apr 15 '15 at 02:23
1

You should be using a collections.Counter object instead.

c = collections.Counter(original_dict)
c.most_common(4)

If you just want the keys, then:

[k for k, v in c.most_common(4)]

For reference on how it is implemented, check the source code here.

JBernardo
  • 32,262
  • 10
  • 90
  • 115
0

Sort by the values and then slice. The sorted function takes a key function.

items_by_value = sorted(d.items(), key=lambda (k, v): v) 
keys = [k for k, v in items_by_value[-4:]]
jwilner
  • 6,348
  • 6
  • 35
  • 47
0

Good question.

Assuming d is the dictionary, and if you don't care what order the keys are in, you could use something like the following:

keys = [v[0] for v in sorted(d.items(), key=lambda v: v[1])[-4:]]

For huge dictionaries, it would be a more efficient to use:

keys = [v[0] for v in sorted(d.items(), key=operator.itemgetter(1))[-4:]]

The [-4:] means the last four entries. In both cases, if you want the keys in the same order as their corresponding highest values, use [-1:-5:-1] instead, which are the last four entries in reverse order.

martineau
  • 119,623
  • 25
  • 170
  • 301