Geting the k-smallest values of each column in sorted order using Numpy.argpartition

Question

Using np.argpartition, it does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus, the first k elements will be the k-smallest elements

>>> num = 3
>>> myBigArray=np.array([[1,3,2,5,7,0],[14,15,6,5,7,0],[17,8,9,5,7,0]])
>>> top = np.argpartition(myBigArray, num, axis=1)[:, :num]
>>> print top
[[5 0 2]
[3 5 2]
[5 3 4]]
>>> myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
[[0 1 2]
[5 0 6]
[0 5 7]]

This returns the k-smallest values of each column. Note that these may not be in sorted order.I use this method because To get the top-k elements in sorted order in this way takes O(n + k log k) time I want to get the k-smallest values of each column in sorted order, without increasing the time complexity. Any suggestions??

Is your question how to go from the smallest *k* to the largest (top?) *k*? In your example, you seem to already handle the case of the smallest ones. Or is the question how to go from rows to columns? — fuglede, Mar 18 '17 at 13:41
Nope, I want to get the k-smallest values of each column in sorted order. — Raja Hammad Farooq, Mar 18 '17 at 14:16

score 3 · Accepted Answer · answered Mar 18 '17 at 14:33

To use np.argpartition and maintain the sorted order, we need to use those range of elements as range(k) instead of feeding in just the scalar kth param -

idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:,None], idx]

score 2 · Answer 2 · answered Mar 18 '17 at 14:43

You can use the exact same trick that you used in the case of rows; combining with @Divakar's trick for sorting, this becomes

In [42]: num = 2

In [43]: myBigArray[np.argpartition(myBigArray, range(num), axis=0)[:num, :], np.arange(myBigArray.shape[1])[None, :]]
Out[43]: 
array([[ 1,  3,  2,  5,  7,  0],
       [14,  8,  6,  5,  7,  0]])

score 0 · Answer 3 · edited May 23 '17 at 12:09

0

A bit of indirect indexing does the trick. Pleaese note that I worked on rows since you started off on rows.

fdim = np.arange(3)[:, None]
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
myBigArray[fdim, tops]
# array([[0, 1, 2],
         [0, 5, 6],
         [0, 5, 7]])

A note on argpartition with range argument: I strongly suspect that it is not O(n + k log k); in any case it is typically several-fold slower than a manual argpartition + argsort see here

edited May 23 '17 at 12:09

Community

1
1

answered Mar 18 '17 at 13:47

Paul Panzer

51,835
3
54
99

Any solution using np.argpartition ?? – Raja Hammad Farooq Mar 18 '17 at 14:19
@RajaHammadFarooq It uses `argpartition`; it starts after your third line. I mean it is meant to include you first three lines. – Paul Panzer Mar 18 '17 at 14:47

Geting the k-smallest values of each column in sorted order using Numpy.argpartition

3 Answers3

Linked