Finding mode of a list

Question

I'm writing a function that calculates the mode or modes of a list of numbers.

If input is [52, 99, 37, 86, 99, 99, 99, 37, 37, 37], output should be [37, 99]. As you can see smaller number should come first, but my code won't do it. Can someone fix my code?

def mode(L):
    most = max(list(map(L.count, L)))
    return list(set(filter(lambda x: L.count(x) == most, L)))

But set is unordered. Why do you think the result _should_ be ordered? — Useless, Jul 21 '19 at 13:35
You are loosing the order with set. Try `list(set(["b", "a"]))` to see it. — Alexandre B., Jul 21 '19 at 13:36
Use `Counter.most_common`, this is going to be way more efficient — Olivier Melançon, Jul 21 '19 at 13:38
@Alex I got `['b', 'a']` back. I think the order is implementation-dependent. — wjandrea, Jul 21 '19 at 14:03
In case someone Googles this in the future, once Python 3.8 is out the answer becomes `import statistics; sorted(statistics.multimode(L))` — Steven Rumbalski, Jul 21 '19 at 14:06
Sidenote, the `list()` in `max(list(map(L.count, L)))` is not needed. — wjandrea, Jul 21 '19 at 14:09
Related: [Finding the mode of a list](https://stackoverflow.com/q/10797819/4518341) - but not a duplicate since OP is asking about sorting — wjandrea, Jul 21 '19 at 18:31

abdusco · Answer 1 · 2019-07-21T13:53:31.603

5

An alternative solution is to use collections.Counter

from collections import Counter

nums = [52, 99, 37, 86, 99, 99, 99, 37, 37, 37]

c = Counter(nums)
highest_freq = max(c.values())
mod = [n for n, freq in sorted(c.items()) if freq == highest_freq]

print(mod)

output:

[37, 99]

If you need only one item, you could also just use:

nums = [52, 99, 37, 86, 99, 99, 99, 37, 37, 37]
c = Counter(nums)
print(max(c))

which prints:

edited Jul 21 '19 at 13:53

answered Jul 21 '19 at 13:39

abdusco

9,700
2
27
44

`Counter.most_common()` returns an iterator of items from highest to lowest counts. – Steven Rumbalski Jul 21 '19 at 13:51
True, but you need to specify how many you want. Selecting the most common ones (however many) requires manual iteration. – abdusco Jul 21 '19 at 13:51
Nope. See @WillemVanOnsem's answer. I independently (and more slowly) came up with the same. – Steven Rumbalski Jul 21 '19 at 13:54
Still, `groupby` iterates through the list. I meant there's no straightforward solution without filtering. – abdusco Jul 21 '19 at 13:55
1

No. `groupby` creates an iterator. It does not do the iteration. When `next()` is called it iterates and calls the key function on the items until the return value of the key function of the iterator changes (or the iterator is exhausted). So here in order to create the first grouping of two it consumes the first three items of `.most_common()`. Because the count is different on the third item it groups the first two items and returns them as a group. No further iteration of the iterator returned by `.most_common()` is done. – Steven Rumbalski Jul 21 '19 at 14:01
Ah I see what you mean, thanks for the clarification :) – abdusco Jul 21 '19 at 14:02
`.most_common()` just does `return sorted(self.items(), key=_itemgetter(1), reverse=True)` so I think my point is an empty one. – Steven Rumbalski Jul 21 '19 at 14:22

vekerdyb · Accepted Answer · 2019-07-21T15:17:13.233

3

sorted() sorts your list.

def mode(L):
    most = max(list(map(L.count, L)))
    return sorted(list(set(filter(lambda x: L.count(x) == most, L))))

Update
Note: This is a very inefficient way of calculating the mode. There are more performant solutions in other answers. This answer is focuses narrowly on what OP asked. Do not use this code in production.
Please also see notes in the comments on other improvements of this code.

edited Jul 21 '19 at 15:17

answered Jul 21 '19 at 13:35

vekerdyb

1,213
12
26

1

You can get rid of `list()` on both lines. They're not doing anything for you. `max()` will consume the `map()` iterator and `sorted()` will happily consume the `set`. – Steven Rumbalski Jul 21 '19 at 14:37
1

Please note that the performance of this is *awful*. Consider to do `list(map(L.count, L))` on a 10 item list `L.count()` is called 10 times and has to iterate over 10 items to make the count. That's 100 (10*10) visits for 10 items. 10:1. But if the list is 100 items long its 10,000 (100*100) visits. 1000:1. So by growing the list by a factor of 10 the expense of iteration grew by a factor of 100. That's quadratic growth. On my computer once L becomes 10,000 items it becomes perceptively slow, while the other answers still proceed zippily along. – Steven Rumbalski Jul 21 '19 at 14:47
@StevenRumbalski I agree with both of your comments -- MatthewLee you might want to consider both. However my answer was focused on the specific issue MatthewLee asked about. I have now included a note in my answer for future readers. – vekerdyb Jul 21 '19 at 15:19

score 3 · Answer 3 · answered Jul 21 '19 at 13:41

You here make it computationally quite expensive. A .count(..) takes linear time, making this algorithm quadratic.

You can make use of a Counter here to perform a single pass over the list, and then obtain the most common elements, like:

from collections import Counter
from operator import itemgetter
from itertools import groupby

def mode(L):
    _, common = next(groupby(Counter(L).most_common(), itemgetter(1)))
    return sorted(map(itemgetter(0), common))

Given the elements in the list can be hashed effectively, this will run in linear time.

In case someone Googles this in the future, once Python 3.8 is out the answer becomes `import statistics; sorted(statistics.multimode(L))` — Steven Rumbalski, Jul 21 '19 at 13:56

Finding mode of a list

3 Answers3