How to sort unique np array elements by occurence?

Question

I would like to implement the following code:

a = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
sorted(a,key=a.count,reverse=True)
>>> [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]

For the case when a is a np.array

a = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5])

How to do it? np.array has np.unique() function that calculate occurence of each element, but I don't see how can I utilize it here.

In your example, both 3 and 4 occur three times. Is it important that the 3s occurs before the 4s in the output? I.e. are there special constraints that must be met in the case of ties? — Warren Weckesser, Jun 12 '19 at 04:32
Does your actual data contain just (relatively) small integers as in your example? — Warren Weckesser, Jun 12 '19 at 04:33

score 2 · Answer 1 · answered Jun 12 '19 at 03:35

You can use np.unique with its optional arguments return_counts and return_inverse -

u, ids, c = np.unique(a, return_counts=True, return_inverse=True)
out = a[c[ids].argsort()[::-1]]

Sample run -

In [90]: a = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5])

In [91]: u, ids, c = np.unique(a, return_counts=True, return_inverse=1)

In [92]: a[c[ids].argsort()[::-1]]
Out[92]: array([5, 5, 5, 5, 4, 4, 4, 3, 3, 3, 1, 1, 2])

user3483203 · Answer 2 · 2019-06-12T03:22:27.793

1

You're looking for return_counts which you can combine with argsort + repeat. This will not guarantee the ordering of elements that appear the same number of times (notice the 4 before the 3, same count, but not "stable").

u, c = np.unique(a, return_counts=True)
i = np.argsort(c)[::-1]
np.repeat(u[i], c[i])

array([5, 5, 5, 5, 4, 4, 4, 3, 3, 3, 1, 1, 2])

edited Jun 12 '19 at 03:22

answered Jun 12 '19 at 03:17

user3483203

50,081
9
65
94

score 1 · Accepted Answer · answered Jun 12 '19 at 05:08

To exactly mimic sorted/list behavior @Divakar's soln can be used with a small modification:

al = [1,2,3,2,1,3,2]
aa = np.array(al)

sorted(al, key=al.count, reverse=True)
# [2, 2, 2, 1, 3, 1, 3]

u, ids, c = np.unique(aa, return_counts=True, return_inverse=True)
aa[(-c[ids]).argsort(kind="stable")]
# array([2, 2, 2, 1, 3, 1, 3])

If aa is large,

from scipy import sparse
sparse.csc_matrix((aa, (c.max()-c[ids]), np.arange(len(ids)+1))).tocsr().data
# array([2, 2, 2, 1, 3, 1, 3], dtype=int64)

may be slightly faster. Not much, though, because in both cases we first call the expensive unique, unless data are none too large integers in which case faster alternatives (to which @WarrenWeckesser appears to allude in the comments) are available including the sparse matrix trick we just used; see for example Most efficient way to sort an array into bins specified by an index array?.

aaa = np.tile(aa,10000)
timeit(lambda:aaa[(-c[ids]).argsort(kind="stable")], number=10)
# 0.040545254945755005
timeit(lambda:sparse.csc_matrix((aaa, (c.max()-c[ids]), np.arange(len(ids)+1))).tocsr().data, number=10)
# 0.0118721229955554

How to sort unique np array elements by occurence?

3 Answers3