3

I would like to implement the following code:

a = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
sorted(a,key=a.count,reverse=True)
>>> [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]

For the case when a is a np.array

a = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5])

How to do it? np.array has np.unique() function that calculate occurence of each element, but I don't see how can I utilize it here.

user48115
  • 445
  • 1
  • 8
  • 18

3 Answers3

2

You can use np.unique with its optional arguments return_counts and return_inverse -

u, ids, c = np.unique(a, return_counts=True, return_inverse=True)
out = a[c[ids].argsort()[::-1]]

Sample run -

In [90]: a = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5])

In [91]: u, ids, c = np.unique(a, return_counts=True, return_inverse=1)

In [92]: a[c[ids].argsort()[::-1]]
Out[92]: array([5, 5, 5, 5, 4, 4, 4, 3, 3, 3, 1, 1, 2])
Divakar
  • 218,885
  • 19
  • 262
  • 358
1

You're looking for return_counts which you can combine with argsort + repeat. This will not guarantee the ordering of elements that appear the same number of times (notice the 4 before the 3, same count, but not "stable").


u, c = np.unique(a, return_counts=True)
i = np.argsort(c)[::-1]
np.repeat(u[i], c[i])

array([5, 5, 5, 5, 4, 4, 4, 3, 3, 3, 1, 1, 2])
user3483203
  • 50,081
  • 9
  • 65
  • 94
1

To exactly mimic sorted/list behavior @Divakar's soln can be used with a small modification:

al = [1,2,3,2,1,3,2]
aa = np.array(al)

sorted(al, key=al.count, reverse=True)
# [2, 2, 2, 1, 3, 1, 3]

u, ids, c = np.unique(aa, return_counts=True, return_inverse=True)
aa[(-c[ids]).argsort(kind="stable")]
# array([2, 2, 2, 1, 3, 1, 3])

If aa is large,

from scipy import sparse
sparse.csc_matrix((aa, (c.max()-c[ids]), np.arange(len(ids)+1))).tocsr().data
# array([2, 2, 2, 1, 3, 1, 3], dtype=int64)

may be slightly faster. Not much, though, because in both cases we first call the expensive unique, unless data are none too large integers in which case faster alternatives (to which @WarrenWeckesser appears to allude in the comments) are available including the sparse matrix trick we just used; see for example Most efficient way to sort an array into bins specified by an index array?.

aaa = np.tile(aa,10000)
timeit(lambda:aaa[(-c[ids]).argsort(kind="stable")], number=10)
# 0.040545254945755005
timeit(lambda:sparse.csc_matrix((aaa, (c.max()-c[ids]), np.arange(len(ids)+1))).tocsr().data, number=10)
# 0.0118721229955554
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99