0

Given a large numpy matrix/multi-dimensional array, what is the best and fastest way to get the indices of the x smallest elements?

Eric Hansen
  • 336
  • 2
  • 12
  • 2
    Does this answer your question? [Find the index of the k smallest values of a numpy array](https://stackoverflow.com/questions/34226400/find-the-index-of-the-k-smallest-values-of-a-numpy-array) – sascha Sep 09 '22 at 18:50
  • No, that question is about a 1-dimensional array. This question is about a multi-dimensional array. – Eric Hansen Sep 09 '22 at 18:52

1 Answers1

1
from typing import Tuple

import numpy as np


def get_indices_of_k_smallest_as_array(arr: np.ndarray, k: int) -> np.ndarray:
    idx = np.argpartition(arr.ravel(), k)
    return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()


def get_indices_of_k_smallest_as_tuple(arr: np.ndarray, k: int) -> Tuple:
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])

This answer gives the correct indices, but those indices aren't sorted based on size of the elements. That's just how the introselect algorithm works, which is used by np.argpartition under the hood, https://en.wikipedia.org/wiki/Introselect.

It would be nice if the return was also sorted based on the size of the elements, ex. index 0 of the return points to the smallest element, index 1 points to the 2nd smallest element, etc.

Here's how to do it with sorting. Keep in mind that sorting the results after np.argpartition is going to be much faster than sorting the entire multi-dimensional array.

def get_indices_of_k_smallest_as_array(arr: np.ndarray, k: int) -> np.ndarray:
    ravel_array = arr.ravel()
    indices_on_ravel = np.argpartition(ravel_array, k)
    sorted_indices_on_ravel = sorted(indices_on_ravel, key=lambda x: ravel_array[x])
    sorted_indices_on_original = np.array(np.unravel_index(sorted_indices_on_ravel, arr.shape))[:, range(k)].transpose().tolist()
    # for the fun of numpy indexing, you can do it this way too
    # indices_on_original = np.array(np.unravel_index(indices_on_ravel, arr.shape))[:, range(k)].transpose().tolist()
    # sorted_indices_on_original = sorted(indices_on_original, key=lambda x: arr[tuple(np.array(x).T)])
    return sorted_indices_on_original


def get_indices_of_k_smallest_as_tuple(arr: np.ndarray, k: int) -> Tuple:
    ravel_array = arr.ravel()
    indices_on_ravel = np.argpartition(ravel_array, k)
    sorted_indices_on_ravel = sorted(indices_on_ravel, key=lambda x: ravel_array[x])
    sorted_indices_on_original = tuple(
        np.array(np.unravel_index(sorted_indices_on_ravel, arr.shape))[:, range(min(k, 0), max(k, 0))]
    )
    return sorted_indices_on_original
Eric Hansen
  • 336
  • 2
  • 12