Suppose you have:
arr = np.array([1,2,1,3,3,4])
Is there a built in function that returns the most frequent element?
Yes, Python's collections.Counter has direct support for finding the most frequent elements:
>>> from collections import Counter
>>> Counter('abracadabra').most_common(2)
[('a', 5), ('b', 2)]
>>> Counter([1,2,1,3,3,4]).most_common(2)
[(1, 2), (3, 2)]
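Counter also accepts a NumPy array directly, but its keys are then NumPy scalars. Here is a small sketch that converts to plain Python ints first and pulls out just the top element. The names value and count are only for illustration. When counts tie, most_common keeps the order in which elements were first encountered, so 1 wins over 3 here:

```python
from collections import Counter

import numpy as np

arr = np.array([1, 2, 1, 3, 3, 4])
# .tolist() turns NumPy scalars into plain Python ints so the keys print cleanly
value, count = Counter(arr.tolist()).most_common(1)[0]
print(value, count)  # 1 2
```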
With numpy, you might want to start with the histogram() function or the bincount() function.
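Here is a minimal sketch of the bincount route. It assumes the values are non-negative integers, because bincount rejects negative input. For other dtypes, np.unique with return_counts=True gives the same information. That function isn't mentioned above, but it is a standard NumPy alternative:

```python
import numpy as np

arr = np.array([1, 2, 1, 3, 3, 4])

# bincount: counts[k] is how often the integer k occurs (non-negative ints only)
counts = np.bincount(arr)          # array([0, 2, 1, 2, 1])
mode_int = counts.argmax()         # argmax returns the first maximum -> 1

# np.unique(..., return_counts=True) works for any sortable dtype
values, freq = np.unique(arr, return_counts=True)
mode_any = values[freq.argmax()]   # values are sorted, so ties go to the smallest -> 1
```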
With SciPy, scipy.stats.mode returns the modal value together with its count, and scipy.stats.mstats.mode is the counterpart for masked arrays.
The pandas module might also help here. pandas is a neat data analysis package for Python and has direct support for this problem.
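import numpy as np
import pandas as pd

arr = np.array([1,2,1,3,3,4])
arr_df = pd.Series(arr)
value_counts = arr_df.value_counts()   # occurrences per value, highest count first
most_frequent = value_counts.idxmax()  # the value with the highest count
highest_count = value_counts.max()     # how many times that value occurs
Note that value_counts.max() gives the count, not the element:
> highest_count
2
Here 1 and 3 both occur twice, so most_frequent is one of those two; which one depends on how value_counts orders ties.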
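If you want all the tied values rather than an arbitrary one, Series.mode() returns every value that shares the highest count, in sorted order. Here is a short sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 1, 3, 3, 4])
modes = pd.Series(arr).mode()   # every value tied for the highest count, sorted
print(modes.tolist())           # [1, 3]
single = modes.iloc[0]          # take the smallest mode if you need just one
```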
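This works for any dtype, integer or not, and always returns a NumPy array. Non-negative integers (and bools) are counted directly with np.bincount. Everything else, including negative integers, is first mapped to integer codes with np.unique:
import numpy as np

def most_common(a, n=1):
    """Return the n most frequent values of a 1-D array, most frequent first."""
    a = np.asarray(a)
    if a.dtype.kind in 'bui' and (a.size == 0 or a.min() >= 0):
        # Non-negative integers can be fed to bincount as-is
        items, codes = None, a
    else:
        # Floats, strings, negative ints...: map each value to a 0..k-1 code
        items, codes = np.unique(a, return_inverse=True)
    counts = np.bincount(codes)
    idx = np.argsort(counts)[::-1][:n]
    # In the bincount path the bin index is the value itself
    return idx.astype(a.dtype) if items is None else items[idx]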
>>> a = np.fromiter('abracadabra', dtype='S1')
>>> most_common(a, 2)
array([b'a', b'r'], dtype='|S1')
>>> a = np.random.randint(10, size=100)
>>> a
array([0, 0, 0, 9, 3, 9, 1, 2, 6, 3, 0, 4, 3, 2, 4, 7, 2, 8, 8, 2, 9, 7, 0,
       3, 5, 2, 5, 0, 4, 2, 4, 7, 8, 5, 4, 0, 1, 6, 1, 0, 2, 0, 5, 1, 3, 8,
       8, 6, 3, 5, 4, 3, 3, 5, 0, 7, 3, 0, 2, 5, 4, 2, 4, 2, 8, 1, 4, 4, 7,
       4, 4, 3, 7, 4, 0, 1, 0, 8, 8, 1, 1, 2, 1, 4, 2, 5, 1, 0, 7, 2, 0, 0,
       0, 8, 9, 9, 8, 1, 3, 8])
>>> most_common(a, 5)
array([0, 4, 2, 8, 3])
In this sample 0, 4 and 2 occur 17, 14 and 13 times, while 1, 3 and 8 are tied at 11 each, so which two of those fill the last slots depends on how np.argsort orders equal counts.
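If you need ties broken predictably, here is a hypothetical variant, most_common_stable. It is my own name for illustration, not a library function. It always goes through np.unique and uses a stable sort on the negated counts, so equal counts keep np.unique's ascending order and the smaller value wins:

```python
import numpy as np

def most_common_stable(a, n=1):
    """Hypothetical variant of most_common: ties go to the smaller value."""
    values, counts = np.unique(np.asarray(a), return_counts=True)
    # Stable sort on -counts: highest count first, and equal counts stay
    # in np.unique's sorted (ascending) order
    order = np.argsort(-counts, kind='stable')
    return values[order[:n]]

print(most_common_stable(np.array(list('abracadabra')), 2))  # ['a' 'b']
print(most_common_stable(np.array([1, 2, 1, 3, 3, 4]), 2))   # [1 3]
```

The trade-off is that this always pays for the sort inside np.unique, even for small non-negative integers where bincount alone would do.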