How to find the most common element in an ndarray

Question

I have a NumPy array of shape (11617, 37). The data is multi-class, and to establish a baseline I need to find which class (or classes) is the most common.

I have tried this formula and also this

import numpy as np

arr = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

axis = 0
# Map every value to an integer index, then count those indices along `axis`.
u, indices = np.unique(arr, return_inverse=True)
answer = u[np.argmax(np.apply_along_axis(np.bincount, axis,
                                         indices.reshape(arr.shape),
                                         None, np.max(indices) + 1), axis=axis)]

I need to find the most frequent combination of the 37 classes in my array

Expected output:

[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]

What were your findings when trying those two references? – gosuto Dec 27 '18 at 12:33
Can you add sample output as well? – J...S Dec 27 '18 at 12:36
One returned only zeroes, the other only ones. – Bok Dec 27 '18 at 12:38

Venkatachalam · Accepted Answer · 2019-05-28T09:26:05.400

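To find the most frequent combination of classes, count the unique rows (axis=0) and pick the row with the highest count:

import numpy as np

A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]])

# Each unique row together with the number of times it occurs.
unique_rows, counts = np.unique(A, return_counts=True, axis=0)
print(unique_rows[np.argmax(counts)])  # [1 0 0 0]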
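
Note that np.argmax only returns the first maximum. Since the question asks for the most common class "or classes", here is a minimal sketch that keeps every row tied for the highest count. The toy array is made up for illustration and is not the asker's data.

```python
import numpy as np

# Hypothetical toy data: two different rows occur twice each.
A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 0]])

unique_rows, counts = np.unique(A, axis=0, return_counts=True)

# Keep every unique row whose count equals the maximum count.
most_common_rows = unique_rows[counts == counts.max()]
print(most_common_rows)
```

With a single clear winner this gives a 2-D array containing one row, so the result always has the same number of dimensions.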

FYI, if the array in your question is your target variable, then it is an example of multi-label data.

This may help you understand the difference between multi-class and multi-label
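
Since the goal in the question is a baseline, the counts from np.unique can also give the score of a model that always predicts the most frequent row. This is only a sketch: it assumes exact-match accuracy (the whole row must be right) is the metric you care about, and the data is a made-up example.

```python
import numpy as np

# Hypothetical label matrix: one row per sample, one column per class.
A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

unique_rows, counts = np.unique(A, axis=0, return_counts=True)
majority_row = unique_rows[np.argmax(counts)]

# Fraction of samples whose full label row equals the most frequent row.
baseline_accuracy = counts.max() / A.shape[0]

# Cross-check by comparing every row against the majority row directly.
matches = np.all(A == majority_row, axis=1)
print(majority_row, baseline_accuracy, matches.mean())
```

For multi-label targets other metrics may suit better than exact match, so treat this number as one possible baseline rather than the baseline.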

score 1 · Answer 2 · answered Dec 27 '18 at 12:42

You could try np.unique with the return_counts parameter:

from operator import itemgetter

import numpy as np

A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

uniques, counts = np.unique(A, axis=0, return_counts=True)

idxmax, _ = max(zip(range(len(counts)), counts), key=itemgetter(1))
print(uniques[idxmax])

Output

[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]

score 1 · Answer 3 · answered Dec 27 '18 at 12:42

You can use collections.Counter.most_common if you first convert each inner list to a tuple. Lists are not hashable, so Counter cannot count them directly.

from collections import Counter

A = [[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]]

c = Counter(tuple(x) for x in A)
print(c.most_common()[0]) # ((0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0), 2)

This returns a tuple containing the most common list and the number of occurrences.
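
If you start from an ndarray rather than a list of lists, the same idea works after converting the rows. A small sketch with a made-up array (the name arr is just for illustration):

```python
from collections import Counter

import numpy as np

# Hypothetical small array standing in for the real data.
arr = np.array([[0, 1, 0],
                [0, 1, 0],
                [1, 0, 0]])

# ndarray rows are not hashable, so turn each row into a tuple of Python ints.
c = Counter(map(tuple, arr.tolist()))

# most_common(1) returns a list with one (row, count) pair.
row, count = c.most_common(1)[0]
print(np.array(row), count)
```

Passing 1 to most_common avoids sorting all combinations when only the top one is needed.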

score 1 · Answer 4 · answered Dec 27 '18 at 12:43

A really quick and easy solution:

A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

print(max(A, key=A.count))

Which prints:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

If you need to pay attention to runtime or want to optimize your code - this is not the way you want to go. However, if you just need a quick solution, it might help to keep this one-liner in mind.

(A.tolist() gets you a list from a np.ndarray if you need that first.)
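
For example, starting from an ndarray, a sketch could look like this (the array is made up for illustration). The runtime warning above comes from list.count scanning the whole list once for every row, so the work grows quadratically with the number of rows.

```python
import numpy as np

# Hypothetical small array standing in for the real data.
arr = np.array([[0, 1, 0],
                [0, 1, 0],
                [1, 0, 0]])

# An ndarray has no .count method, so convert to nested Python lists first.
rows = arr.tolist()

# rows.count(r) is called once per row and scans the whole list each time.
most_common = max(rows, key=rows.count)
print(most_common)
```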

score 0 · Answer 5 · answered Dec 27 '18 at 12:48

from collections import Counter
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
# Most common value within each row (one value per row, not the most common row).
most_common = [Counter(i).most_common(1).pop()[0] for i in A]
print(most_common)  # [0, 0, 0]
