Sort a dictionary by length of the value

Question

I have a dictionary:

d = {'a':[1,3], 'b':[3,4,5,6], 'c':[1], 'd':[1,2,3] }

I want to make a smaller, new dictionary with the top two key:value pairs, sorted by the len of the lists in value. So in this case, I want:

newd = {'b':[3,4,5,6], 'd':[1,2,3] }

I tried this answer but got this error:

NameError: global name 'd' is not defined

Please post exactly how you tried to implement the other solution. It looks *exactly* like what you wanted, so you must have made a mistake when you adapted the code. When I use the other code it works perfectly fine for me. — Felix Kling, Dec 09 '13 at 06:21
@FelixKling I didn't adapt the code. I just copy and pasted from the answer. — mchangun, Dec 09 '13 at 06:26
Well, I did as well and it works for me. See http://codepad.org/RyobPnHY. Did you copy the `>>>` as well by any chance? As it is, this question is just a duplicate. We cannot help you to learn to copy&paste code properly. — Felix Kling, Dec 09 '13 at 06:26
"top two key:value pairs, sorted by the `len` of the lists in value" - you know dicts have no order, right? If you just want the key-value pairs with the top two longest values, you're fine, but if you want the dict to be sorted by value length, you're going to have problems. — user2357112, Dec 09 '13 at 07:36
@user2357112 I don't care about the order in the new dictionary, just that they contain the top n key:values pairs according to length of value. — mchangun, Dec 09 '13 at 08:11

oadams · Accepted Answer · 2013-12-09T06:30:27.460

4

One approach is to sort the items according to length of value, and then create a dictionary from the last two items.

sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])

edited Dec 09 '13 at 06:30

answered Dec 09 '13 at 06:24

oadams

3,019
6
30
53

1

I'd argue that passing `reverse=true` to `sorted` is more performant than reversing the whole list afterwards. It's probably also helpful if you explain what `[::-1]` means exactly. – Felix Kling Dec 09 '13 at 06:25
Or slice off the last two `dict(sorted_items[-2:])` – John La Rooy Dec 09 '13 at 06:27
@Felix King: It probably is. Adjusted the code to just slice the last two anyway. – oadams Dec 09 '13 at 06:28
@gnibbler, I was making that edit similar to what you suggested, but reversal is irrelevant if you just slice the last two. – oadams Dec 09 '13 at 06:29

score 3 · Answer 2 · answered Dec 09 '13 at 06:24

3

Seems like a job for heapq:

big_items = heapq.nlargest(2, d.items(), key=lambda x: len(x[1]))
newd = dict(big_items)

The advantage of heapq over sorted is that this provides an O(N) time complexity whereas sorted will yield an O(NlogN) time complexity. For small dicts, this probably doesn't matter much. sorted may even be faster due to a more optimized implementation, but for big dicts, this might actually buy you a significant speedup.

(on python2.x, you can use d.iteritems() instead of d.items())

answered Dec 09 '13 at 06:24

mgilson

300,191
65
633
696

+1 although `sorted` is probably _faster_ most of the time. – John La Rooy Dec 09 '13 at 06:26
@gnibbler -- That depends on the size of the dictionary. – mgilson Dec 09 '13 at 06:27
I know about the big-O, but the constant is quite large in this case, so it takes a while before the log overtakes it. – John La Rooy Dec 09 '13 at 06:29
@gnibbler -- Yeah, I added a caveat in there about using `sorted` for smaller dicts. – mgilson Dec 09 '13 at 06:32
Thanks for this answer and showing me something new as well (`heapq`). My dict has 40,000 entries and I want to extract 5,000. I'm guessing this is considering relatively small? – mchangun Dec 09 '13 at 06:46
1

@mchangun -- Not sure. The best thing to do would be to `timeit` to find out. – mgilson Dec 09 '13 at 06:49

Sort a dictionary by length of the value

2 Answers2

Linked