Here is a problem that I thought would have been asked about before, but I can't seem to find anything along these lines.
The problem is simple: say I have an array of floats and I want to group it into subsets of equal values and find their indices.
For example, if I have data = [0, 1, 0, 1, 1, 2, 2, 3], then I would like to obtain subset_values = [0, 1, 2, 3] as the array of unique values and subset_idxs = [[0, 2], [1, 3, 4], [5, 6], [7]] as the array where subset_idxs[i,:] represents the indices where the value subset_values[i] appears in data.
Is there an efficient pythonic solution for this task using numpy?
I have a slow and loopy solution, also using a function from https://stackoverflow.com/a/2566508/5661667:
import numpy as np

def find_nearest_idx(array, value):
    """
    Find the closest element in an array and return the corresponding index.
    """
    array = np.asarray(array)
    idx = (np.abs(array-value)).argmin()
    return idx

data = [0., 1., 0., 1., 1., 2., 2., 3.]
subset_values = []
subset_idxs = []
for i, x in enumerate(data):
    if np.any(np.isclose(x, subset_values)):
        subset_idxs[find_nearest_idx(subset_values, x)].append(i)
    else:
        for j, y in enumerate(data):
            if np.isclose(x, y):
                subset_values.append(x)
                subset_idxs.append([i])
                break

print(subset_values)
# [0.0, 1.0, 2.0, 3.0]
print(subset_idxs)
# [[0, 2], [1, 3, 4], [5, 6], [7]]
For background: I want to use this function to remove redundancies in a physics problem.
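For comparison, here is a minimal vectorized sketch of the same grouping. It is only a sketch under stated assumptions: the name group_indices and its decimals parameter are hypothetical, and it groups by exact equality via np.unique instead of the np.isclose tolerance used in the loop above. Rounding first (the optional decimals argument) only approximates a tolerance, since two nearly equal values that straddle a rounding boundary would still end up in different groups.

```python
import numpy as np

def group_indices(data, decimals=None):
    """Hypothetical helper: return the unique values of a 1-D array and,
    for each unique value, the indices at which it occurs."""
    arr = np.asarray(data, dtype=float)
    if decimals is not None:
        # Assumption: rounding is an acceptable stand-in for a tolerance.
        arr = np.round(arr, decimals)
    # inverse[k] is the position of arr[k] within the sorted unique values.
    values, inverse, counts = np.unique(
        arr, return_inverse=True, return_counts=True
    )
    # A stable sort keeps the original index order inside each group.
    order = np.argsort(inverse, kind="stable")
    # Cut the sorted index array at the group boundaries.
    idxs = np.split(order, np.cumsum(counts)[:-1])
    return values, idxs

data = [0., 1., 0., 1., 1., 2., 2., 3.]
subset_values, subset_idxs = group_indices(data)
print(subset_values.tolist())
# [0.0, 1.0, 2.0, 3.0]
print([idx.tolist() for idx in subset_idxs])
# [[0, 2], [1, 3, 4], [5, 6], [7]]
```

The groups come back as a list of integer arrays rather than a 2-D array, because the subsets generally have different lengths.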