1

I am trying to sort the list of items below by the frequency of the individual items. With the largest first, to the smallest last. I wanted to see if I can do it first with sorted() before trying with collection.Counter.

Code I have so far is;

items = [4, 6, 2, 2, 6, 4, 4, 4]
x = sorted(items, key=items.count, reverse=True)
print(x)

The above code prints; [4, 4, 4, 4, 6, 2, 2, 6] Rather than; [4, 4, 4, 4, 6, 6, 2, 2]

Could someone please explain why it doesn't go "6,6,2,2"?

deceze
  • 510,633
  • 85
  • 743
  • 889
Undead2k
  • 21
  • 3
  • 10
    `6` and `2` both occur twice, so the sort value for all `6` and `2` is "2", meaning they're all equal as far as the sorting algorithm is concerned and their relative order is undefined and/or won't change. – deceze Aug 05 '20 at 10:45
  • 1
    Isn't this doing a stable sorting? – Aparna Aug 05 '20 at 10:46
  • 1
    This is because Python's sort is [stable](https://stackoverflow.com/questions/1517793/what-is-stability-in-sorting-algorithms-and-why-is-it-important) – Torben545 Aug 05 '20 at 10:47
  • 1
    This seems to be the duplicate of https://stackoverflow.com/questions/2290962/python-how-to-get-sorted-count-of-items-in-a-list – Shrikant Kakani Aug 05 '20 at 10:50
  • 2
    @deceze has answered the question but jfyi you could achieve what you want by doing this: `x = sorted(sorted(items, reverse=True), key=items.count, reverse=True)`, i.e. pre-sort to "group" same numbers. – Peaceful James Aug 05 '20 at 10:51
  • @deceze; Thanks for the explaining! @PeacefulJames: Also thanks for the example code, didn't think to pre-sort it and then do the counts, interesting :D – Undead2k Aug 05 '20 at 11:17

2 Answers2

4

The reason it does this is best explained by the documentation for the function:

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).

https://docs.python.org/3/library/functions.html#sorted

chrisharrel
  • 337
  • 2
  • 10
2

I would slightly modify the key parameter:

result = sorted(items, key=lambda x: (items.count(x), x), reverse=True)

By the way, you should really use Counter as it would be much much faster! The complexity of your approach is O(n^2log(n))! Instead, it could be the usual O(nlog(n))...

Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50
  • 1
    The OP is already aware of `Counter`. He's interested in knowing why his other approach is not working as expected. – Shiva Aug 05 '20 at 10:51
  • @Shiva Indeed my answer does not use `Counter`. But I strongly strongly strongly suggest the OP to use another approach, since the complexity is TERRIBLE. – Riccardo Bucco Aug 05 '20 at 10:52
  • 1
    Haha, I was literally in the middle of posting almost the exact same answer. Beat me to it, good job! – Elliptica Aug 05 '20 at 10:53
  • 1
    I didn't downvote you, but you didn't address the question: `Could someone please explain why it doesn't go "6,6,2,2"?` Might be why. – chrisharrel Aug 05 '20 at 10:56
  • 1
    @mobiusxs Thanks! You answer is indeed great and it perfectly address the problem. Also, it should be the accepted answer. Hope my answer just provide some more interesting details to yours. – Riccardo Bucco Aug 05 '20 at 10:57
  • Thank you for the explanations and example code for if I were to go with `sorted()`, it is helpful. Can I ask what is happening inside the `(items.count(x), x)` part please? I sorta understand lambda's and noticed if you remove "`,x`" you get the same result as before, so I'm curious to know why this makes it different. If I had enough reputation I would upvote! :) – Undead2k Aug 05 '20 at 11:14
  • 2
    @Undead2k Sorting algorithms work using comparisons (like `x < y`). In your case, comparing two counts is not enough. What's the element that comes first when they share the same frequency? Thats why I compare tuples instead of single numbers. When you compare tuples your first compare the first element, and if its the same then you compare the second one etc. For example: `(3,2) < (4,2)` (because of the first element), `(3,2) < (3,7)` (because of the second element. To sum up, the second element is needed to disambiguate the cases in which the first element is the same. – Riccardo Bucco Aug 05 '20 at 11:24