0

I am trying to use numpy.argpartition to get the n smallest values from an array. However, I cannot guarantee that there will be at least n values in the array. If there are fewer than n values, I just need the entire array.

Currently I am handling this with checking the array size, but I feel like I'm missing a native numpy method that will avoid this branching check.

if np.size(arr) < N: 
    return arr 
else:
    return arr[np.argpartition(arr, N)][:N]

Minimal reproducible example:

import numpy as np

#Find the 4 smallest values in the array
#Arrays can be arbitrarily sized, as it's the result of finding all elements in a larger array
# that meet a threshold
small_arr = np.array([3,1,4])
large_arr = np.array([3,1,4,5,0,2])

#For large_arr, I can use np.argpartition just fine:
large_idx = np.argpartition(large_arr, 4)
#large_idx now contains array([4, 5, 1, 0, 2, 3])

#small_arr results in an indexing error doing the same thing:
small_idx = np.argpartition(small_arr, 4)
#ValueError: kth(=4) out of bounds (3)

I've looked through the numpy docs for truncation, max length, and other similar terms, but nothing came up that is what I need.

blackbrandt
  • 2,010
  • 1
  • 15
  • 32
  • Could you clarify why you want to avoid the branching check? Are you looking for elegance, or do you have a measured performance concern? – Brian61354270 Jan 24 '23 at 17:38
  • 1
    Partially elegance, partially me trying to learn numpy better. If there isn't a built-in or native numpy solution, I'll just do the branching check. – blackbrandt Jan 24 '23 at 17:52

2 Answers2

1

One way (depending on your situation) is just to cap the arg to argparse when the array is shorter with min:

return arr[np.argpartition(arr, min(N, arr.size - 1)][:N]

Slicing tolerates higher values than the array length, it's just argpartition that needs the min() check.

This is less efficient than your branch version, since it has to do the argpartition even when you just want the whole array, but it's more concise. So it depends what your priorities are - personally I'd probably keep the branch or use a ternary:

return arr if arr.size < N else arr[np.argpartition(arr, N)][:N]
Peter DeGlopper
  • 36,326
  • 7
  • 90
  • 83
  • That's what I was starting to guess. Just wanted to make sure I wasn't committing a numpy crime and missing some built in method. – blackbrandt Jan 24 '23 at 17:55
  • 1
    @blackbrandt I could still be missing something, my own numpy knowledge is spotty. Doesn't look like it though. – Peter DeGlopper Jan 24 '23 at 18:03
0

You can try:

def nsmallest(array, n):
    return array[np.argsort(array)[:n]]

where n is the number of smallest values that you want.

Matt Pitkin
  • 3,989
  • 1
  • 18
  • 32
  • It's worth noting that this has a [greater time complexity](https://stackoverflow.com/q/6910641/11082165) than the OP's existing solution. – Brian61354270 Jan 24 '23 at 17:37
  • @Brian thanks for pointing that out, I already looked at using argsort and realized it sorts the entire array which would be a waste of time. – blackbrandt Jan 24 '23 at 17:53