Counting occurrences without using collections.Counter

Question

I am trying to retrieve the most frequent and less frequent elements in a list.

frequency([13,12,11,13,14,13,7,11,13,14,12,14,14])

My output is:

([7], [13, 14])

I tried it with:

import collections
s = [13,12,11,13,14,13,7,11,13,14,12,14,14]
count = collections.Counter(s)
mins = [a for a, b in count.items() if b == min(count.values())]
maxes = [a for a, b in count.items() if b == max(count.values())]
final_vals = [mins, maxes]

But I don't want to use the collections module and try a more logic oriented solution.
Can you please help me to do it without collections?

For "max occuring element" (aka. _mode_), have a look at this question: [Finding the mode of a list](https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list). — Christian König, Aug 30 '17 at 08:46
Possible duplicate of [Finding the mode of a list](https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list) — araknoid, Aug 30 '17 at 08:49

MSeifert · Answer 1 · 2017-08-30T09:11:37.040

You could use a try and except approach with a dict to emulate a Counter.

def counter(it):
    counts = {}
    for item in it:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts

or alternativly you can use dict.get with a default of 0:

def counter(it):
    counts = {}
    for item in it:
        counts[item] = counts.get(item, 0) + 1
    return counts

And you should do the min() and max() outside the comprehensions to avoid repeatedly calculating that quantity (the function is now O(n) instead of O(n^2):

def minimum_and_maximum_frequency(cnts):
    min_ = min(cnts.values())
    max_ = max(cnts.values())
    min_items = [k for k, cnt in cnts.items() if cnt == min_]
    max_items = [k for k, cnt in cnts.items() if cnt == max_]
    return min_items, max_items

This would work as expected then:

>>> minimum_and_maximum_frequency(counter([13,12,11,13,14,13,7,11,13,14,12,14,14]))
([7], [13, 14])

score 0 · Answer 2 · answered Aug 30 '17 at 08:46

data = [13,12,11,13,14,13,7,11,13,14,12,14,14]
occurrences = {}
for i in data:
     if i in occurrences.keys():
             occurrences[i] += 1
     else:
             occurrences[i] = 1

max_vals = [i for i in occurrences.keys() if occurrences[i] == max(occurrences.values())]
min_vals = [i for i in occurrences.keys() if occurrences[i] == min(occurrences.values())]

score 0 · Answer 3 · answered Aug 30 '17 at 08:55

s = [13,12,11,13,14,13,7,11,13,14,12,14,14]

occurrences = dict()

for item in s:

    occurrences[item] = occurrences.setdefault(item, 0) + 1

mins = [a for a, b in occurrences.items() if b == min(occurrences.values())]
maxs = [a for a, b in occurrences.items() if b == max(occurrences.values())]

final_vals = [mins, maxs]

The defaultdict from collections is more suitable to replace Counter to calculate occurrences of items in list. But because you limit the use of collections. So the setdefault is more elegant to deal with KeyError.

dummman · Answer 4 · 2017-08-30T09:13:43.070

0

how about this:

s = [13,12,11,13,14,13,7,11,13,14,12,14,14]
freq_low = s.count(min(s, key=s.count))
freq_high = s.count(max(s, key=s.count))
mins = [a for a in set(s) if s.count(a) == freq_low]
maxes = [a for a in set(s) if s.count(a) == freq_high]
final_vals = [mins, maxes]

edited Aug 30 '17 at 09:13

answered Aug 30 '17 at 09:10

dummman

189
5

1

That's `O(n^4)` (or maybe just `O(n^3)`) whereas the problem can be solved easily in `O(n)`. – MSeifert Aug 30 '17 at 09:12

gboffi · Answer 5 · 2017-08-30T10:17:08.263

This is a very simple solution, possibly not the most efficient one (?) but a simple one.

data = get_data()
freqs, numbers = {}, {}
for i in data:
    freqs[i] = freqs.get(i, 0) + 1
for n, c in freqs.items():
    numbers[c] = numbers.get(c, []) + [n]
counts = list(numbers.keys())
res = (numbers[min(counts)], numbers[max(counts)])

Let's see in detail what we have in the script above, let's start with the example data you gave,

In [1]: data = [13,12,11,13,14,13,7,11,13,14,12,14,14]

We are going to use two dictionaries,

In [2]: freqs, numbers = {}, {}

the first one is filled iterating on data, its keys are the individual numbers in data and its values the frequency of each number in data (see fotnote for freqs.get(…))

In [3]: for i in data: freqs[i] = freqs.get(i, 0) + 1

the second one is simply a reversal of the first one, the keys being the frequencies and the values being lists of numbers with a given frequency.

In [4]: for n, c in freqs.items(): numbers[c] = numbers.get(c, []) + [n]

In [5]: numbers
Out[5]: {1: [7], 2: [12, 11], 4: [13, 14]}

At this point we need a list with the keys of numbers, i.e., the occurrences

In [6]: counts = list(numbers.keys())

because we are interested in the minimum and maximum values of the occurrences

In [7]: [numbers[min(counts)], numbers[max(counts)]]
Out[7]: [[7], [13, 14]]

Footnote: the .get(key, default_value) method of a dictionary returns a default value if the key is not present in the dictionary, we used this feature with a 0 default value to sum the occurrences of individual numbers, and with [], the void list, to build a list of all the numbers with given frequency.

Maybe not the fastest one, but every step (counting, lumping and finding the extremes of keys) is `O(n)`, so the total complexity is still `O(n)` … — gboffi, Aug 30 '17 at 09:46

Counting occurrences without using collections.Counter

5 Answers5

Linked

Related