Sorting Counter entries by value, then by key

Question

I was trying to sort a few values in a list using Python's Counter from the collections module. But it gives a weird result when I run this:

>>> from collections import Counter
>>> diff=["aaa","aa","a"]
>>> c=Counter(diff)
>>> sorted(c.items(), key = lambda x:x[1] , reverse=True)
[('aa', 1), ('a', 1), ('aaa', 1)]
>>> c.items()
[('aa', 1), ('a', 1), ('aaa', 1)]

The output is strange: it seems to have shuffled 'aa' to first place, then 'a', with 'aaa' last. Ideally, it should have been 'a', then 'aa', then 'aaa'.

What is the reason behind this, and how would you fix it?

Edit: Most people understood the question incorrectly, hence some clarifications. The goal is to sort the words in the list by their number of occurrences.

Let's say the list is diff = ["this", "this", "world", "cool", "is", "cool", "cool"]. The final output of my code above would be cool, then this, then is, then world, which is correct.

The problem arises when you supply similar strings with the same number of occurrences: Python misbehaves. With the input diff = ["aaa", "aa", "a"], I expected the output to be a, then aa, then aaa. But Python's algorithm could never know that, as every word occurred a single time.

But if that is the case, then why didn't Python print aaa, then aa, then a (i.e. in the same order as the input), giving it the benefit of the doubt? Python's sort actually swapped them. Why?

Possible duplicate of [Python dictionary, how to keep keys/values in same order as declared?](https://stackoverflow.com/questions/1867861/python-dictionary-how-to-keep-keys-values-in-same-order-as-declared) — smac89, Mar 04 '18 at 00:28
@avigil - I am sorting based on the highest repeat of the word. :) — user8877134, Mar 04 '18 at 00:34
You *told* `sorted()` to consider ONLY the second element of each tuple. The items were already sorted by that criterion (1 >= 1 >= 1), so no change was made to the (arbitrary) order in which the dictionary provided the items. — jasonharper, Mar 04 '18 at 00:35
Let's say your input is `diff = ["this", "world", "this", "is", "beautiful"]`. The output would be accurate, as `this` is repeated 2 times, and after that the priority follows alphabetical order – user8877134, Mar 04 '18 at 00:35
@jasonharper - A change has been made. Look, the input is `['aaa', 'aa', 'a']`. The output is `aa`, then `a`, then `aaa`. So how did Python decide to swap the values in the list? Shouldn't it display `aaa`, then `aa`, then `a`, giving Python the benefit of the doubt? – user8877134, Mar 04 '18 at 00:37
`sorted(c.items())` creates a sorted version of the iterator returned by `items()` and returns a sorted list which in your code is being discarded. It does not sort the original `Counter` object, which is an unordered data structure and wouldn't make much sense anyway. — avigil, Mar 04 '18 at 00:39

PrasadK · Answer 1 · 2018-03-04T00:35:09.987

2

Counter is a subclass of dict. It is an unordered collection.

To get the sorting order you want, you can update your code like -

sorted(c.items(), key = lambda x:(x[1], -len(x[0])) , reverse=True)

This gives -

[('a', 1), ('aa', 1), ('aaa', 1)]
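If ties should be broken alphabetically rather than by length, one possible variation (a sketch, not part of the answer above) is to negate the count inside the key, so that `reverse=True` is no longer needed. The name `by_count_then_key` is made up for illustration.

```python
from collections import Counter

diff = ["this", "this", "world", "cool", "is", "cool", "cool"]
c = Counter(diff)

# Negating the count puts higher counts first, while the key itself
# still sorts in ascending (alphabetical) order among ties.
by_count_then_key = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_count_then_key)
# [('cool', 3), ('this', 2), ('is', 1), ('world', 1)]

# The all-ties input from the question.
all_ties = sorted(Counter(["aaa", "aa", "a"]).items(),
                  key=lambda kv: (-kv[1], kv[0]))
print(all_ties)
# [('a', 1), ('aa', 1), ('aaa', 1)]
```

Negation only works because the counts are numbers; the string key is left untouched so it keeps its natural ordering.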

edited Mar 04 '18 at 00:35

answered Mar 04 '18 at 00:28


But I am applying the `sorted` function afterwards on the counted values, based on `x[1]`, which is the number of times a word is repeated. So that shouldn't be the issue – user8877134 Mar 04 '18 at 00:33

avigil · Answer 2 · 2018-03-04T00:48:23.693

2

sorted does a stable sort. That means for ties, the order of items will be the same as the order they appear in the original input. Since your Counter is unordered, the input to sorted is in some undefined order. If you want you can sort by the key, and then the value:

sorted(sorted(c.items(), key=lambda x:x[0], reverse=True), key = lambda x:x[1] , reverse=True)

Or (probably better) have your sort function return a tuple as the sort key:

sorted(c.items(), key=lambda x:(x[1], x[0]), reverse=True)

An (even better!) version utilizing operator.itemgetter:

from operator import itemgetter
sorted(c.items(), key=itemgetter(1,0), reverse=True)
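Note that when `reverse=True` is applied to a whole `(count, key)` tuple, words with equal counts come out in descending key order. The sketch below shows that result next to a two-pass sort that relies on stability to get descending counts with ascending keys. The variable names are illustrative only.

```python
from collections import Counter
from operator import itemgetter

c = Counter(["aaa", "aa", "a"])

# reverse=True flips the comparison for the whole (count, key) tuple,
# so keys with equal counts are listed in descending order.
tuple_key = sorted(c.items(), key=itemgetter(1, 0), reverse=True)
print(tuple_key)  # [('aaa', 1), ('aa', 1), ('a', 1)]

# Two passes: sort by key ascending first, then by count descending.
# sorted() stays stable even with reverse=True, so ties keep the key order.
by_key = sorted(c.items(), key=itemgetter(0))
two_pass = sorted(by_key, key=itemgetter(1), reverse=True)
print(two_pass)  # [('a', 1), ('aa', 1), ('aaa', 1)]
```

The second result does not depend on the order in which the Counter yields its items, because the first pass fixes the order of the ties before the second pass runs.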

edited Mar 04 '18 at 00:48

answered Mar 04 '18 at 00:44


3

Even better would be to use [operator.itemgetter](https://docs.python.org/3/library/operator.html#operator.itemgetter): `key=itemgetter(1, 0)`. – ekhumoro Mar 04 '18 at 00:48

score 0 · Answer 3 · answered Mar 04 '18 at 00:36

0

Here's one way you can ensure your ordering remains unchanged.

As previously mentioned, dictionaries are not guaranteed to be ordered. The result will be a sorted list of tuples.

from collections import Counter

diff = ["aaa", "aa", "a"]

c = Counter(diff)

sorted(c.items(), key=lambda x: diff.index(x[0]))

# [('aaa', 1), ('aa', 1), ('a', 1)]
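Calling `diff.index` for every item rescans the list each time. A possible variation, sketched here under the assumption that higher counts should still come first and only ties should follow the input order, records each word's first position once. The name `first_pos` is hypothetical.

```python
from collections import Counter

diff = ["this", "this", "world", "cool", "is", "cool", "cool"]
c = Counter(diff)

# Record the index of the first occurrence of each word in a single pass.
first_pos = {}
for i, word in enumerate(diff):
    first_pos.setdefault(word, i)

# Higher counts first; ties fall back to first appearance in the input.
ordered = sorted(c.items(), key=lambda kv: (-kv[1], first_pos[kv[0]]))
print(ordered)
# [('cool', 3), ('this', 2), ('world', 1), ('is', 1)]
```

On Python 3.7 and later, where dicts keep insertion order, `c.most_common()` should also list equal counts in the order they were first encountered, so the helper dict is mainly useful on older versions.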

answered Mar 04 '18 at 00:36

jpp


OP: "The goal is to sort the number of words in list based on it's occurances ". – ekhumoro Mar 04 '18 at 01:30
@ekhumoro, User achieved his goal in his OP, so this does not seem to be so. – jpp Mar 04 '18 at 10:30
