
Here is a problem that I thought would have been asked about before, but I can't seem to find anything along these lines.

The problem is simple: say I have an array of floats and I want to group it into subsets of equal values and find their indices.

For example, if I have data = [0, 1, 0, 1, 1, 2, 2, 3], then I would like to obtain subset_values = [0, 1, 2, 3] as the array of unique values and subset_idxs = [[0, 2], [1, 3, 4], [5, 6], [7]] as a list of index lists, where subset_idxs[i] holds the indices at which the value subset_values[i] appears in data. Note that the sublists can have different lengths.

Is there an efficient, Pythonic solution for this task using NumPy?


I have a slow, loop-based solution that also uses a helper function from https://stackoverflow.com/a/2566508/5661667:

import numpy as np

def find_nearest_idx(array, value):
    """
    Find the closest element in an array and return the corresponding index.
    """
    array = np.asarray(array)
    idx = (np.abs(array-value)).argmin()
    return idx

data = [0., 1., 0., 1., 1., 2., 2., 3.]

subset_values = []
subset_idxs = []
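for i, x in enumerate(data):
    if np.any(np.isclose(x, subset_values)):
        # x is close to an already recorded value: add its index to that group
        subset_idxs[find_nearest_idx(subset_values, x)].append(i)
    else:
        # x is a new value: scan data for a match (x always matches itself
        # unless it is NaN) and start a new group at the first hit
        for j, y in enumerate(data):
            if np.isclose(x, y):
                subset_values.append(x)
                subset_idxs.append([i])
                break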
print(subset_values)
# [0.0, 1.0, 2.0, 3.0]
print(subset_idxs)
# [[0, 2], [1, 3, 4], [5, 6], [7]]

For background: I want to use this function to remove redundancies in a physics problem.
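For comparison, the grouping described above can be sketched without an explicit Python loop. This is only a minimal sketch under stated assumptions: it groups by exact equality (via np.unique) rather than by the np.isclose tolerance, and the function name group_indices is hypothetical, chosen here for illustration.

```python
import numpy as np

def group_indices(data):
    """Group equal values and collect their indices (hypothetical helper).

    Assumes exact equality between floats; no tolerance is applied.
    """
    data = np.asarray(data)
    # Sorted unique values and how often each one occurs.
    subset_values, counts = np.unique(data, return_counts=True)
    # A stable sort keeps the original index order within each group.
    order = np.argsort(data, kind="stable")
    # Cut the sorted index array at the boundaries between groups.
    subset_idxs = np.split(order, np.cumsum(counts)[:-1])
    return subset_values, subset_idxs

values, idxs = group_indices([0., 1., 0., 1., 1., 2., 2., 3.])
print(values)
print([i.tolist() for i in idxs])
```

If the values only agree up to floating-point noise, one possible workaround (again an assumption, not something tested against the physics problem) is to round the data with np.round to a chosen number of decimals before grouping.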

yatu
Wolpertinger
