
The max on collections.Counter is counter-intuitive: I want to find the character that occurs most often in a string.

>>> from collections import Counter
>>> c = Counter('aaaabbbcc')
>>> max(c)
'c'
>>> c
Counter({'a': 4, 'b': 3, 'c': 2})

I know I should be using most_common, but its use seems contrived.

>>> c.most_common(1)[0][0]
'a'

Is there a case for supporting max on Counter ?

Data Cyclist
  • Does this answer your question: https://stackoverflow.com/questions/1518522/find-the-most-common-element-in-a-list? – Dani Mesejo Nov 24 '21 at 11:04
  • `max` accepts *any iterable*. `Counter` objects are `dict` objects, and iterating over a dict iterates over its keys. – juanpa.arrivillaga Nov 24 '21 at 11:10
  • I like the Counter as it can be easily updated and viewed, but yes, that answer solves this specific question. – Data Cyclist Nov 24 '21 at 11:10
  • @DataCyclist that question isn't really relevant to yours; that question was asking how to efficiently get the most common item in a collection of *unhashable items*. You should definitely use a `Counter`, and you can just use `max(Counter(data).items(), key=lambda x:x[-1])` – juanpa.arrivillaga Nov 24 '21 at 11:19
  • @juanpa.arrivillaga this way is actually slower as it requires to build the `items` tuple – mozway Nov 24 '21 at 11:30
  • @mozway it is *definitely faster* than the solution in the linked duplicate – juanpa.arrivillaga Nov 24 '21 at 11:34
  • @juanpa.arrivillaga yes of course, using `count` more than once is already a waste, I was comparing to my answer and OP's original solution ;) – mozway Nov 24 '21 at 11:35
  • @mozway I don't think it will be slower than `max((c := Counter(s)), key=c.get)` it's pretty much doing the exact same thing. In fact, your way forces you to check the dictionary with a hash-based lookup, whereas relying on the built-in iterators is probably faster, but both would be pretty similar – juanpa.arrivillaga Nov 24 '21 at 11:39
  • @juanpa.arrivillaga it's ~1.5-2× slower, that's why I commented ;) (NB. I used `max(c.items(), key=lambda x:x[-1])` vs `max(c, key=c.get)` to compare only the max, not assignment of the counts) – mozway Nov 24 '21 at 11:42
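
The behavior discussed in the comments can be seen directly — a minimal sketch of why plain `max` returns `'c'` here:

```python
from collections import Counter

c = Counter('aaaabbbcc')

# A Counter is a dict subclass, so iterating it yields the keys
print(list(c))            # ['a', 'b', 'c']

# Plain max() therefore compares the characters themselves
print(max(c))             # 'c' (lexicographically largest key)

# Passing the counts as the key yields the most frequent character
print(max(c, key=c.get))  # 'a'
```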

2 Answers


You could use the key parameter of max:

max(c, key=c.get)

output: 'a'

NB. `Counter.most_common` performs sorting, so using `max` this way should also be faster (a quick test shows this holds for small Counters, while the difference is limited for large ones).

mozway

`max` with `key` seems to be faster than `most_common`:

>>> from collections import Counter
>>> import timeit

>>> s0 = 'aaaabbbcc'
>>> s1 = s0[:] * 100

>>> def f_max(s): return max((c := Counter(s)), key=c.get)
>>> def f_common(s): return Counter(s).most_common(1)[0][0]

>>> timeit.repeat("f_max(s1)", "from __main__ import f_max, f_common, s1", number=10000)
[0.32935670800000594, 0.32097511900002473, 0.3285609399999885, 0.3300831690000052, 0.326068628999991]

>>> timeit.repeat("f_common(s1)", "from __main__ import f_max, f_common, s1", number=10000)
[0.3436732490000054, 0.3355550489999928, 0.34284031400000003, 0.343095218000002, 0.34329394300002036]
Data Cyclist
  • As mentioned in my answer this is true for a small number of keys, not anymore when there are many keys (which is not the case in your example there are only a/b/c), in this case both are equally fast ;) – mozway Nov 24 '21 at 11:31
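
The key-cardinality caveat from the comment above can be checked with a rough sketch (the Counter sizes below are illustrative, and exact timings depend on the machine):

```python
from collections import Counter
import timeit

# Few distinct keys vs many distinct keys (illustrative sizes)
small = Counter('aaaabbbcc' * 100)                       # 3 distinct keys
large = Counter(str(i % 1000) for i in range(10_000))    # 1000 distinct keys

for name, c in [('small', small), ('large', large)]:
    t_max    = min(timeit.repeat(lambda: max(c, key=c.get), number=1000, repeat=5))
    t_common = min(timeit.repeat(lambda: c.most_common(1), number=1000, repeat=5))
    print(f'{name}: max+key {t_max:.4f}s   most_common(1) {t_common:.4f}s')

# Sanity check: both approaches agree on the most frequent element
assert max(small, key=small.get) == small.most_common(1)[0][0]
```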