
I am trying to sort the output of a dictionary by the third column, or more specifically by the count that appears in the example results. For context, the original answer was provided here: https://stackoverflow.com/a/23199901/666891. I was unable to derive an answer from https://stackoverflow.com/a/73050/666891 or https://stackoverflow.com/a/613218/666891, so I am asking here.

The original output is produced by:

for k in ret.keys():
    print '%s %d' % (k,ret[k])

Which results in:

google com 1132
akamaiedge net 378
bing com 381
microsoft com 197

I then tried:

x = ret.keys()
sorted(x,key=operator.itemgetter(3))
for k in x:
    print '%s %d' % (k,ret[k])

Which results in the same output:

google com 1132
akamaiedge net 378
bing com 381
microsoft com 197

And lastly tried:

for k in sorted(ret.keys(),key=operator.itemgetter(3),reverse=True):
    print '%s %d' % (k,ret[k])

Which resulted in a different order, but one that is still not sorted by count:

microsoft com 197
akamaiedge net 378
google com 1132
bing com 381

Additionally, the value of ret.keys() is:

['google com', 'akamaiedge net', 'bing com', 'microsoft com']

The solution for my specific scenario is:

for k in sorted(ret.keys(), key=lambda k:ret[k], reverse=True):
    print "{:15} - {}".format(k, ret[k])
Astron

2 Answers


If you are trying to sort by the values, then the key parameter should be a function that returns the value corresponding to the current key, like this:

d = {'google com':1132,'akamaiedge net':378,'bing com':381,'microsoft com':197}
for key in sorted(d, key=d.get):
    print "{:15} - {}".format(key, d[key])

Output

microsoft com   - 197
akamaiedge net  - 378
bing com        - 381
google com      - 1132

For each key in the dictionary, sorted calls the key function, which here is the dictionary's own get method, and that returns the value stored under the key. So the values, not the keys themselves, are what get compared.
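As a rough illustration of that mechanism, the sketch below prints what d.get returns for each key and then sorts in both directions. The names ascending and descending are only illustrative, and the parenthesised print calls are an assumption chosen so the snippet runs on both Python 2 and Python 3.

```python
d = {'google com': 1132, 'akamaiedge net': 378, 'bing com': 381, 'microsoft com': 197}

# d.get(key) returns the count stored under that key; these are the
# values sorted() actually compares when key=d.get is passed.
for key in d:
    print("%s -> %d" % (key, d.get(key)))

# Ascending by count (the default) and descending by count.
ascending = sorted(d, key=d.get)
descending = sorted(d, key=d.get, reverse=True)

print(ascending)
print(descending)
```

Passing reverse=True gives the largest count first, which is the order the question asks for.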

Note 1: The problem with your last code is that you are using operator.itemgetter(3), which gets the element at index 3 of each key. Your keys are strings, so the fourth character of each key is used for the comparison. That is why the last example in your question shows

mic*r*osoft com 197
aka*m*aiedge net 378
goo*g*le com 1132
bin*g* com 381

Alphabetically r > m > g.
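A small sketch that reproduces this, using the key list shown in the question. The variable names fourth_char, picked, and by_fourth_char are illustrative only.

```python
import operator

keys = ['google com', 'akamaiedge net', 'bing com', 'microsoft com']

# itemgetter(3) indexes into whatever it is given; for a string that
# means the fourth character, not a "third column".
fourth_char = operator.itemgetter(3)

picked = [fourth_char(k) for k in keys]
print(picked)

# Sorting by that character in reverse puts 'r' first, then 'm', then
# the two 'g' keys in their original relative order (the sort is stable).
by_fourth_char = sorted(keys, key=fourth_char, reverse=True)
print(by_fourth_char)
```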

Note 2: The problem with your second example is that sorted doesn't change x; it returns a new list. Since the return value is discarded, the loop still iterates over the unsorted x.
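To make the difference concrete, here is a minimal sketch contrasting sorted, which returns a new list, with list.sort, which reorders the list in place. The dictionary reuses the counts from the question; original_order and returned are illustrative names.

```python
d = {'google com': 1132, 'akamaiedge net': 378, 'bing com': 381, 'microsoft com': 197}

x = list(d)
original_order = list(x)

# sorted() builds and returns a new list; x itself is left alone.
returned = sorted(x, key=d.get)
print(x == original_order)   # x has not been reordered

# Either keep the return value (x = sorted(x, key=d.get)) or sort in place:
x.sort(key=d.get)
print(x == returned)         # now x is ordered by count as well
```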

thefourtheye

I'd stick with the solution originally proposed by Burhan Khalid, which uses collections.Counter:

import re
from collections import Counter

with open('input.txt') as f:
    c = Counter('.'.join(re.findall(r'(\w+\(\d+\))', line.split()[-1])[-2:]) for line in f)

for domain, count in c.most_common():
    print domain, count

It uses the most_common() method, which essentially sorts the dictionary items by value in descending order using sorted(). There is no need to do it manually.
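As a quick check of that claim, the sketch below builds a Counter directly from the counts shown in the question (rather than from input.txt, whose contents are not given here) and compares most_common() with a manual sort of the items by value. The name manual is illustrative.

```python
from collections import Counter
from operator import itemgetter

c = Counter({'google com': 1132, 'akamaiedge net': 378, 'bing com': 381, 'microsoft com': 197})

# Manual equivalent: sort the (domain, count) pairs by the count, largest first.
manual = sorted(c.items(), key=itemgetter(1), reverse=True)

print(c.most_common())
print(c.most_common() == manual)

# Passing n limits the result to the n largest counts.
print(c.most_common(2))
```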

FYI, here's what the most_common() source code looks like in Python 2.7:

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
alecxe