2

scipy.stats.mode works great, but I need to break modal ties at random.

import numpy as np
import scipy.stats as stats

a = np.array([[3, 3, 4], 
              [3, 1, 0], 
              [4, 5, 0], 
              [4, 3, 0]])

stats.mode(a, axis=0)

Out[37]: ModeResult(mode=array([[3, 3, 0]]), count=array([[2, 2, 3]]))

For the first result (column), scipy.stats.mode chooses 3 among the tied candidates 3 and 4, as follows:

If there is more than one such value, only the smallest is returned.

So among 3 and 4, it picks 3 because it's the smallest. I'd like to randomly choose among 3 and 4, but scipy.stats.mode doesn't bring back enough information to allow me to do that. Is there a good way to do this using numpy or a decent alternative?

Julian Drago
  • 719
  • 9
  • 23

3 Answers3

2

For a performant approach, here's a numba alternative:

from numba import njit, int32

@njit
def mode_rand_ties(a):
    out = np.zeros(a.shape[1], dtype=int32)
    for col in range(a.shape[1]):
        z = np.zeros(a[:,col].max()+1, dtype=int32)
        for v in a[:,col]:
            z[v]+=1
        maxs = np.where(z == z.max())[0]
        out[col] = np.random.choice(maxs)
    return out

Where testing for the array above, by running more than once we see that we can get either 3 or 4 as the mode of the first column:

mode_rand_ties(a)
# array([4, 3, 0], dtype=int32)

mode_rand_ties(a)
# array([3, 3, 0], dtype=int32)

And by checking the performance on a (4000, 3) shaped array we get that it takes only about 40us:

x = np.concatenate([a]*1000, axis=0)
%timeit mode_rand_ties(x)
# 41.1 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Whereas with the current solution:

%timeit mode_rand(x, axis=0)
# 388 µs ± 23.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yatu
  • 86,083
  • 12
  • 84
  • 139
0

I'm still taking answers if anyone comes up with a better way, but this is my long-winded temporary solution, which just bogarts the scipy.stats.mode source code. The only significant modification is in the for ind in inds loop, where I use np.where to bring back all indices that have the same number of maximum counts, and I randomly choose the index from that.

from collections import namedtuple
ModeResult = namedtuple('ModeResult', ('mode', 'count'))
def mode_rand(a, axis):
    in_dims = list(range(a.ndim))
    a_view = np.transpose(a, in_dims[:axis] + in_dims[axis+1:] + [axis])

    inds = np.ndindex(a_view.shape[:-1])
    modes = np.empty(a_view.shape[:-1], dtype=a.dtype)
    counts = np.zeros(a_view.shape[:-1], dtype=np.int)

    for ind in inds:
        vals, cnts = np.unique(a_view[ind], return_counts=True)
        maxes = np.where(cnts == cnts.max())  # Here's the change
        modes[ind], counts[ind] = vals[np.random.choice(maxes[0])], cnts.max()

    newshape = list(a.shape)
    newshape[axis] = 1
    return ModeResult(modes.reshape(newshape), counts.reshape(newshape))

mode_rand(a, axis=0)
Julian Drago
  • 719
  • 9
  • 23
0

Here's a solution, but for a single-dimension array/list. (you could generalize this for multiple dimensions of course).

import numpy as np

def mode_rand_ties(a):
    uniq, cnts = np.unique(a, return_counts=True)
    max_cnt = np.max(cnts)
    ties = uniq[cnts==max_cnt]
    return np.random.choice(ties), max_cnt

a = [3,3,5,4,4,2,1,10,10]

print(mode_rand_ties(a))

Output:

(10, 2)

Or, if your use-case is for positive integers and would only ever see a 2-way tie (this was my original use-case and meets with your example), you could negate a, taking the mode and then putting the result right-side-up.

Of course not very robust/generic but perhaps it helps someone.

import random
import numpy as np
import scipy as sp

def mode_random_ties(a, **kwargs):
    r = random.choice([-1,1])
    mode, cnt = sp.stats.mode(a*r, **kwargs)
    return np.abs(mode), cnt
In [1]: mode_random_ties(a)
Out[1]: (array([[3, 3, 0]]), array([[2, 2, 3]]))
In [2]: mode_random_ties(a, axis=1)
Out[2]: 
(array([[3],
        [3],
        [5],
        [4]]),
 array([[2],
        [1],
        [1],
        [1]]))
Hans Bouwmeester
  • 1,121
  • 1
  • 17
  • 19