Is there a way to get the top k values per row of a numpy array (Python)?

Question

Given a numpy array of the form below:

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]

is there a way to retain the top-3 values in each row and set the others to zero in Python (without an explicit loop)? The result for the example above would be

x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]

Code for one example

import numpy as np
arr = np.array([1.2,3.1,0.,9.2,5.5,3.2])
indexes=arr.argsort()[-3:][::-1]
a = list(range(6))
A=set(indexes); B=set(a)
zero_ind=(B.difference(A)) 
arr[list(zero_ind)]=0

The output:

array([0. , 0. , 0. , 9.2, 5.5, 3.2])

Above is my sample code (with many lines) for a 1-D numpy array. Looping through each row of a numpy array and performing this same computation repeatedly would be quite expensive. Is there a simpler way?

Does the following help? https://stackoverflow.com/questions/13070461/get-index-of-the-top-n-values-of-a-list-in-python — Kevin Liu, Dec 19 '19 at 03:08

score 4 · Answer 1 · answered Dec 19 '19 at 07:28

Here is a fully vectorized solution that needs nothing beyond numpy. It uses numpy's argpartition to find the k-th largest value in each row efficiently. See for instance this answer for other use cases.

import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # replace mask by 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
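
One property of this threshold approach is worth seeing on a small input: since the mask only removes values strictly below the k-th largest, values tied with the threshold are all kept, so a row can end up with more than k non-zero entries. The sketch below is a compact variant, not the answer's exact code. It reads the threshold directly with np.partition, and the name keep_at_least_kth is made up for illustration.

```python
import numpy as np

def keep_at_least_kth(x, k):
    # k-th largest value of each row: after partitioning at position -k,
    # that position holds the value it would have in a full sort
    kth_vals = np.partition(x, -k, axis=1)[:, -k]
    # zero out everything strictly below the per-row threshold
    return np.where(x < kth_vals[:, None], 0, x)

x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])
result = keep_at_least_kth(x, 3)
print(result)

# A row with ties at the threshold: all four 5s survive, not just three
tied = np.array([[5., 5., 5., 5., 1.]])
tied_result = keep_at_least_kth(tied, 3)
print(tied_result)
```

If exactly k entries per row are required even with ties, an index-based approach is the safer choice.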

score 1 · Answer 2 · answered Dec 19 '19 at 03:32

Use np.apply_along_axis to apply a function to 1-D slices along a given axis.

import numpy as np

def top_k_values(array):
    indexes = array.argsort()[-3:][::-1]
    A = set(indexes)
    B = set(list(range(array.shape[0])))
    array[list(B.difference(A))]=0
    return array

arr = np.array([[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]])
result = np.apply_along_axis(top_k_values, 1, arr)
print(result)

Output

[[4.  3.  0.  0.  8. ]
 [0.  3.1 0.  9.2 5.5]
 [0.  7.  4.4 0.  1.3]]
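
The function above hard-codes k = 3 and writes into the slice it receives. A minimal sketch of a variant that takes k as a parameter and leaves its input untouched is shown below. The name keep_top_k_1d is hypothetical, and the extra positional argument relies on np.apply_along_axis forwarding additional arguments to the function. Note that apply_along_axis still calls the function once per row, so this is a convenience rather than a speed-up over an explicit loop.

```python
import numpy as np

def keep_top_k_1d(row, k):
    # start from zeros and copy over only the k largest entries
    out = np.zeros_like(row)
    idx = np.argsort(row)[-k:]   # indices of the k largest values
    out[idx] = row[idx]
    return out

arr = np.array([[4., 3., 2., 1., 8.],
                [1.2, 3.1, 0., 9.2, 5.5],
                [0.2, 7.0, 4.4, 0.2, 1.3]])

# the trailing 3 is passed through to keep_top_k_1d as k
result = np.apply_along_axis(keep_top_k_1d, 1, arr, 3)
print(result)
```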

Daniel F · Answer 3 · 2019-12-19T08:49:47.020

import numpy as np

def top_k(arr, k, axis = 0):
    n = arr.shape[axis]
    # indices of top k values in axis (the last k slots after partitioning)
    top_k_idx = np.take(np.argpartition(arr, -k, axis = axis),
                        np.arange(n - k, n),
                        axis = axis)
    out = np.zeros_like(arr)                        # create zero array
    np.put_along_axis(out, top_k_idx,               # put idx values of arr in out
                      np.take_along_axis(arr, top_k_idx, axis = axis),
                      axis = axis)
    return out

This should work for arbitrary axis and k, but does not work in-place. If you want in-place it's a bit simpler:

def top_k(arr, k, axis = 0):
    # indices to remove (the first n - k slots after partitioning)
    remove_idx = np.take(np.argpartition(arr, -k, axis = axis),
                         np.arange(arr.shape[axis] - k),
                         axis = axis)
    np.put_along_axis(arr, remove_idx, 0, axis = axis)     # put 0 in indices
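
For the row-wise case in the question, the axis would be 1 rather than the default 0. Below is a self-contained sketch of the same index-based idea, written with a boolean mask. The name keep_top_k and the default axis=-1 are choices made for this illustration, not part of the answer. Unlike a threshold comparison, it keeps exactly k entries per slice even when values are tied.

```python
import numpy as np

def keep_top_k(arr, k, axis=-1):
    n = arr.shape[axis]
    # after argpartition, the last k positions along axis index the k largest values
    part_idx = np.argpartition(arr, -k, axis=axis)
    top_idx = np.take(part_idx, np.arange(n - k, n), axis=axis)
    # mark those positions in a boolean mask and zero everything else
    mask = np.zeros(arr.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=axis)
    return np.where(mask, arr, 0)

x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])
result = keep_top_k(x, 3, axis=1)
print(result)

# with ties, exactly k entries survive (which of the tied ones is not specified)
tied_result = keep_top_k(np.array([[5., 5., 5., 5., 1.]]), 3, axis=1)
```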

score 0 · Answer 4 · answered Dec 19 '19 at 03:15

Here is an alternative that uses a list comprehension to loop through your array and apply the keep_top_3 function to each row.

import numpy as np
import heapq

def keep_top_3(arr):
    smallest = heapq.nlargest(3, arr)[-1]  # find the top 3 and use the smallest as cut off
    arr[arr < smallest] = 0  # replace anything lower than the cut off with 0
    return arr

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
result = [keep_top_3(np.array(arr)) for arr in x]

I hope this helps :)
