0

Using np.argpartition, it does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus, the first k elements will be the k-smallest elements

>>> num = 3
>>> myBigArray=np.array([[1,3,2,5,7,0],[14,15,6,5,7,0],[17,8,9,5,7,0]])
>>> top = np.argpartition(myBigArray, num, axis=1)[:, :num]
>>> print top
[[5 0 2]
[3 5 2]
[5 3 4]]
>>> myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
[[0 1 2]
[5 0 6]
[0 5 7]]

This returns the k-smallest values of each column. Note that these may not be in sorted order.I use this method because To get the top-k elements in sorted order in this way takes O(n + k log k) time I want to get the k-smallest values of each column in sorted order, without increasing the time complexity. Any suggestions??

kmario23
  • 57,311
  • 13
  • 161
  • 150
Raja Hammad Farooq
  • 921
  • 1
  • 9
  • 17
  • Is your question how to go from the smallest *k* to the largest (top?) *k*? In your example, you seem to already handle the case of the smallest ones. Or is the question how to go from rows to columns? – fuglede Mar 18 '17 at 13:41
  • Nope, I want to get the k-smallest values of each column in sorted order. – Raja Hammad Farooq Mar 18 '17 at 14:16

3 Answers3

3

To use np.argpartition and maintain the sorted order, we need to use those range of elements as range(k) instead of feeding in just the scalar kth param -

idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:,None], idx]
Divakar
  • 218,885
  • 19
  • 262
  • 358
2

You can use the exact same trick that you used in the case of rows; combining with @Divakar's trick for sorting, this becomes

In [42]: num = 2

In [43]: myBigArray[np.argpartition(myBigArray, range(num), axis=0)[:num, :], np.arange(myBigArray.shape[1])[None, :]]
Out[43]: 
array([[ 1,  3,  2,  5,  7,  0],
       [14,  8,  6,  5,  7,  0]])
fuglede
  • 17,388
  • 2
  • 54
  • 99
0

A bit of indirect indexing does the trick. Pleaese note that I worked on rows since you started off on rows.

fdim = np.arange(3)[:, None]
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
myBigArray[fdim, tops]
# array([[0, 1, 2],
         [0, 5, 6],
         [0, 5, 7]])

A note on argpartition with range argument: I strongly suspect that it is not O(n + k log k); in any case it is typically several-fold slower than a manual argpartition + argsort see here

Community
  • 1
  • 1
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99