Trying to make my own implementation for the mode of a data set in Python

Question

I'm fully aware of Counter.most_common, but that feels like cheating to me. I wanna do it myself.

Here is my function.

    def mode(self):
        unq = []
        m = 0
        for i in self.arrData:
            if i not in unq:
                unq.append(i)
        for i in unq:
            count = self.arrData.count(i)
            if count > m:
                m = i
        return m

When using the test data:

34.9, 35.0, 35.2, 35.4, 35.8, 36.0, 36.1, 36.2, 36.3, 36.4, 36.4, 36.4, 36.4, 36.5, 36.6, 36.7, 36.7, 36.8, 36.8, 37.0, 37.2, 37.3, 37.9, 38.2, 38.3, 38.3, 38.4, 38.8, 39.0, 39.4

I keep getting the first element as m.

Using Standard Library routines is in no way cheating. Gaining understanding of Python techniques by the "challenge" of rewriting standard routines could be legitimate, however. — Joffan, May 13 '21 at 19:29

metaperture · Accepted Answer · 2021-05-13T19:42:13.607

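You need to maintain two variables: the current mode and the count of the current mode. You're currently comparing the count against the mode itself, when you should be comparing it against the count of the mode. With m starting at 0, the first value's count (1) beats 0, so m becomes 34.9; after that no count exceeds 34.9, so m never changes.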

    def mode(self):
        uniq = set() # set is better than lists for this
        mode = None
        mode_count = 0
        for i in self.arrData:
            uniq.add(i) # don't need to check for membership with sets
        for i in uniq:
            i_count = self.arrData.count(i)
            if i_count > mode_count:
                mode = i
                mode_count = i_count
        return mode # will return None for an empty array

To do this in a single loop (less run time):

    def mode(self):
        seen = set() # set is better than lists for this; checking membership is cheaper
        mode = None
        mode_count = 0
        for i in self.arrData:
            if i in seen:
                continue
            seen.add(i)
            i_count = self.arrData.count(i)
            if i_count > mode_count:
                mode = i
                mode_count = i_count
        return mode # will return None for an empty array

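But this still hides an O(n) scan inside each arrData.count() call, so the total work is O(n * k) for k distinct values. To avoid that, count every value in one pass with a dict: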

    from collections import defaultdict

    def mode(self):
        value_counts = defaultdict(int)
        for i in self.arrData:
            value_counts[i] += 1
        # equivalently: value_counts = Counter(self.arrData)
        mode = None
        mode_count = 0
        for i, i_count in value_counts.items():
            if i_count > mode_count:
                mode = i
                mode_count = i_count
        return mode # will return None for an empty array
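
As a quick check, the dict-counting approach can be run as a free function on the test data from the question. This is only a sketch: `mode_of` is a hypothetical stand-alone name that takes the list directly instead of reading `self.arrData`.

```python
from collections import defaultdict

def mode_of(values):
    """Return the most frequent value, or None for an empty input."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1  # one pass over the data, no repeated .count() scans
    best, best_count = None, 0
    for v, c in counts.items():
        if c > best_count:  # strict '>' keeps the first value seen on ties
            best, best_count = v, c
    return best

data = [34.9, 35.0, 35.2, 35.4, 35.8, 36.0, 36.1, 36.2, 36.3, 36.4,
        36.4, 36.4, 36.4, 36.5, 36.6, 36.7, 36.7, 36.8, 36.8, 37.0,
        37.2, 37.3, 37.9, 38.2, 38.3, 38.3, 38.4, 38.8, 39.0, 39.4]
print(mode_of(data))  # 36.4 (it appears four times)
```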

Alternatively, use scipy.stats.mode (see Most efficient way to find mode in numpy array). Note that if your data is continuous (as is often the case with floats), you probably want some sort of kernel density estimate (KDE) rather than a mode. Otherwise you are implicitly treating the precision of the data as the quantization bin size, when a different bin size or bandwidth may be more sensible for your data.
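
To illustrate the bin-size point, here is a minimal sketch that groups values into fixed-width bins and reports the centre of the fullest bin. It is a crude histogram stand-in, not a real KDE; `binned_mode` and `bin_width` are hypothetical names, and picking a sensible width is left to you.

```python
import math

def binned_mode(values, bin_width):
    """Return the centre of the most populated bin, or None if empty."""
    if not values:
        return None
    bins = {}
    for v in values:
        k = math.floor(v / bin_width)  # index of the bin [k*w, (k+1)*w)
        bins[k] = bins.get(k, 0) + 1
    fullest = max(bins, key=bins.get)  # first bin seen wins on ties
    return (fullest + 0.5) * bin_width

data = [34.9, 35.0, 35.2, 35.4, 35.8, 36.0, 36.1, 36.2, 36.3, 36.4,
        36.4, 36.4, 36.4, 36.5, 36.6, 36.7, 36.7, 36.8, 36.8, 37.0,
        37.2, 37.3, 37.9, 38.2, 38.3, 38.3, 38.4, 38.8, 39.0, 39.4]
print(binned_mode(data, 1.0))  # 36.5
```

With a bin width of 1.0 the fullest bin is [36, 37), which holds 14 of the 30 readings, so the result is 36.5 rather than the exact-match mode 36.4.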

Oh, ok! I see. Pretty dumb mistake actually. Thank you for the help. — Christopher, May 13 '21 at 19:42

score 0 · Answer 2 · answered May 13 '21 at 19:38

You store the most common value in m but then compare each count against it, instead of against that value's count. You can fix it by tracking the count separately:

    def mode(self):
        unq = []
        m = 0
        c = 0
        for i in self.arrData:
            if i not in unq:
                unq.append(i)
        for i in unq:
            count = self.arrData.count(i)
            if count > c:
                m = i
                c = count
        return m
