
The max on collections.Counter is counter-intuitive: I want to find the character that occurs most often in a string.

>>> from collections import Counter
>>> c = Counter('aaaabbbcc')
>>> max(c)
'c'
>>> c
Counter({'a': 4, 'b': 3, 'c': 2})

I know I should be using most_common, but its use seems contrived.

>>> c.most_common(1)[0][0]
'a'

Is there a case for supporting max on Counter ?

Data Cyclist
  • Does this answer your question: https://stackoverflow.com/questions/1518522/find-the-most-common-element-in-a-list? – Dani Mesejo Nov 24 '21 at 11:04
  • `max` accepts *any iterable*. `Counter` objects are `dict` objects, and iterating over a dict iterates over its keys. – juanpa.arrivillaga Nov 24 '21 at 11:10
  • I like the Counter as it can be easily updated and viewed, but yes, that answer solves this specific question. – Data Cyclist Nov 24 '21 at 11:10
  • @DataCyclist that question isn't really relevant to yours; that question was asking how to efficiently get the most common item in a collection of *unhashable items*. You should definitely use a `Counter`, and you can just use `max(Counter(data).items(), key=lambda x:x[-1])` – juanpa.arrivillaga Nov 24 '21 at 11:19
  • @juanpa.arrivillaga this way is actually slower as it requires to build the `items` tuple – mozway Nov 24 '21 at 11:30
  • @mozway it is *definitely faster* than the solution in the linked duplicate – juanpa.arrivillaga Nov 24 '21 at 11:34
  • @juanpa.arrivillaga yes of course, using `count` more than once is already a waste, I was comparing to my answer and OP's original solution ;) – mozway Nov 24 '21 at 11:35
  • @mozway I don't think it will be slower than `max((c := Counter(s)), key=c.get)` it's pretty much doing the exact same thing. In fact, your way forces you to check the dictionary with a hash-based lookup, whereas relying on the built-in iterators is probably faster, but both would be pretty similar – juanpa.arrivillaga Nov 24 '21 at 11:39
  • @juanpa.arrivillaga it's ~1.5-2× slower, that's why I commented ;) (NB. I used `max(c.items(), key=lambda x:x[-1])` vs `max(c, key=c.get)` to compare only the max, not assignment of the counts) – mozway Nov 24 '21 at 11:42
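
The behavior discussed in the comments can be seen directly — a minimal sketch of why plain `max` returns `'c'` here:

```python
from collections import Counter

c = Counter('aaaabbbcc')

# A Counter is a dict subclass, so iterating it yields the keys
print(list(c))            # ['a', 'b', 'c']

# Plain max() therefore compares the characters themselves
print(max(c))             # 'c' (lexicographically largest key)

# Passing the counts as the key yields the most frequent character
print(max(c, key=c.get))  # 'a'
```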

2 Answers


You could use the key parameter of max:

max(c, key=c.get)

output: 'a'

NB. `Counter.most_common` performs sorting, so using `max` this way should also be faster (a quick test shows this holds for small Counters, while the difference is limited for large ones).

mozway

`max` with `key` seems to be faster than `most_common`:

>>> from collections import Counter
>>> import timeit

>>> s0 = 'aaaabbbcc'
>>> s1 = s0[:] * 100

>>> def f_max(s): return max((c := Counter(s)), key=c.get)
>>> def f_common(s): return Counter(s).most_common(1)[0][0]

>>> timeit.repeat("f_max(s1)", "from __main__ import f_max, f_common, s1", number=10000)
[0.32935670800000594, 0.32097511900002473, 0.3285609399999885, 0.3300831690000052, 0.326068628999991]

>>> timeit.repeat("f_common(s1)", "from __main__ import f_max, f_common, s1", number=10000)
[0.3436732490000054, 0.3355550489999928, 0.34284031400000003, 0.343095218000002, 0.34329394300002036]
Data Cyclist
  • As mentioned in my answer this is true for a small number of keys, not anymore when there are many keys (which is not the case in your example there are only a/b/c), in this case both are equally fast ;) – mozway Nov 24 '21 at 11:31
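
The key-cardinality caveat from the comment above can be checked with a rough sketch (the Counter sizes below are illustrative, and exact timings depend on the machine):

```python
from collections import Counter
import timeit

# Few distinct keys vs many distinct keys (illustrative sizes)
small = Counter('aaaabbbcc' * 100)                       # 3 distinct keys
large = Counter(str(i % 1000) for i in range(10_000))    # 1000 distinct keys

for name, c in [('small', small), ('large', large)]:
    t_max    = min(timeit.repeat(lambda: max(c, key=c.get), number=1000, repeat=5))
    t_common = min(timeit.repeat(lambda: c.most_common(1), number=1000, repeat=5))
    print(f'{name}: max+key {t_max:.4f}s   most_common(1) {t_common:.4f}s')

# Sanity check: both approaches agree on the most frequent element
assert max(small, key=small.get) == small.most_common(1)[0][0]
```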