6

Can I accomplish a rank/sort using Counter.most_common() functionality, thus avoiding this line: d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False) ??

Challenge: You are given a string.The string contains only lowercase English alphabet characters.Your task is to find the top three most common characters in the string.

Output Format: Print the three most common characters along with their occurrence count each on a separate line. Sort output in descending order of occurrence count. If the occurrence count is the same, sort the characters in ascending order.

In completing this I used dict, Counter, and sort in order to ensure "the occurrence count is the same, sort the characters in ascending order". The in-built Python sorted functionality ensures ordering by count, then alphabetical. I'm curious if there is a way to override Counter.most_common() default arbitrary sort/order logic as it seems to disregard the lexicographical order of the results when picking the top 3.

import sys
from collections import Counter

string = sys.stdin.readline().strip()
d = dict(Counter(string).most_common(3))
d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False)

for letter, count in d[:3]:
    print letter, count
vaultah
  • 44,105
  • 12
  • 114
  • 143
Michael B
  • 568
  • 3
  • 12
  • 1
    `Counter` can count objects as long as they are hashable. But many objects are not orderable, so trying to order them generally doesn't make sense. – Thierry Lathuille Mar 28 '17 at 17:46
  • 1
    The order of what `most_common()` returns is **not** arbitrary. It in the order of the counts, from most common to the least, of the items in it. You could probably [monkey patch](http://stackoverflow.com/questions/5626193/what-is-a-monkey-patch) the module to change this, I suppose, but that seems ill-advised. – martineau Mar 28 '17 at 18:08
  • 2
    The [doc explicitly says `Counter.most_common()`'s tie-breaker order for when counts are equal is arbitrary](https://docs.python.org/3/library/collections.html?highlight=counter#collections.Counter) (cc: @martineau). In cPython they fall back on original insertion order, but don't rely on that because it's not defined behavior. Best to sort, as you say, if you want totally deterministic behavior. – smci Sep 10 '18 at 07:48
  • 2
    https://stackoverflow.com/questions/35446015/ – rocksportrocker Sep 10 '18 at 08:26
  • Another question [How to get the python Counter output ordered by order of inputs?](https://stackoverflow.com/questions/36807701/how-to-get-the-python-counter-output-ordered-by-order-of-inputs). – smci Sep 10 '18 at 09:41

1 Answers1

7

Yes the doc explicitly says Counter.most_common()'s (tie-breaker) order for when counts are equal is arbitrary.

  • UPDATE: PM2Ring told me Counter inherits dict's ordering. The insertion order thing only happens in 3.6+, and is only guaranteed in 3.7. It's possible the doc is lagging.
  • In cPython 3.6+ they fall back on original insertion order (see bottom), but don't rely on that implementation because per the spec, it's not defined behavior. Best to do your own sort, as you say, if you want totally deterministic behavior.
  • I show at bottom how you can monkey-patch Counter.most_common with your own sort function like you show, but that's frowned on. (Code you write might accidentally rely on it and hence break when it wasn't patched.)
  • You could subclass Counter to MyCounter so you can override its most_common. Painful and not really portable.
  • Really the best approach is just to write code and tests that don't rely on the arbitrary tiebreaker order from most_common()
  • I agree that most_common() should not have been hardwired and we should be able to pass a comparison key or sort function into __init__().

Monkey-patching Counter.most_common() :

def patched_most_common(self):
    return sorted(self.items(), key=lambda x: (-x[1],x[0]))

collections.Counter.most_common = patched_most_common

collections.Counter('ccbaab')
Counter({'a': 2, 'b': 2, 'c': 2})

Demonstrating that in cPython 3.7, the arbitrary order is order of insertion (first insertion of each character):

Counter('abccba').most_common()
[('a', 2), ('b', 2), ('c', 2)]

Counter('ccbaab').most_common()
[('c', 2), ('b', 2), ('a', 2)]
smci
  • 32,567
  • 20
  • 113
  • 146
  • Good answer. Can you not reduce complexity by using `heapq` as per the [current implementation](https://stackoverflow.com/questions/29240807/python-collections-counter-most-common-complexity)? – jpp Sep 10 '18 at 10:04
  • @jpp: good point, I wasn't aware Counter maintained a heapq, hence can't accept arbitrary sort-function on-the-fly. But we can still apply whatever O(N log N) sort function we want later. – smci Sep 10 '18 at 10:40
  • Yep, I guess it's worth pointing out the increase in complexity, i.e. it's not a comparable change we're implementing. – jpp Sep 10 '18 at 10:41
  • @jpp: No, Counter can still keep a heapq internally, just we apply whatever O(N log N) sort function we want later. – smci Sep 10 '18 at 11:25
  • Hmm, but I'm specifically talking about `most_common`. Your monkey-patched `most_common` will no longer *use* `heapq`. Or have I misunderstood something? – jpp Sep 10 '18 at 11:27
  • @martineau No I certainly didn't "miss the fact" that it's tagged python-2.7, and that's an unhelpful comment. Python-2.x has been sunsetted a long time now, so for someone even to ask or answer this in 2017, it shouldn't have been tagged python-2.anything, and even if, it would be remiss not to answer with the 3.x behavior, and in particular the very major change [dicts are now ordered by insertion order](https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6) – smci Nov 29 '22 at 21:07