Your output is going to be a list (or worse, an array of objects), since it is ragged in the general case. If you are OK with having each index corresponding to a repeated element point to the same underlying array, you can do something like the following.
The gist is to take a hint from np.unique, which is a sort-based operation. Instead of using np.unique, we can use np.argsort combined with some fancy index math to get the job done.
First, you get an array with [0, 1, 4], [2, 3] as the elements. This will be a 1D array of objects. Actually, if the split list is non-ragged, elements will recombine into a 2D array, but it doesn't matter because it will not affect the intended interface.
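import numpy as np
idx = np.argsort(arr, kind='stable')  # stable: equal values keep their index order
split_points = np.flatnonzero(np.diff(arr[idx])) + 1  # where a new value starts in sorted order
# dtype=object is required for ragged groups on NumPy 1.24 and later
elements = np.array(np.split(idx, split_points), dtype=object)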
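To make the first step concrete, here is a minimal sketch on a small input. The array [1, 1, 2, 2, 1] is an assumption, picked only because it reproduces the [0, 1, 4], [2, 3] groups mentioned above; kind='stable' and dtype=object are likewise choices made for this sketch.

```python
import numpy as np

# Assumed sample input; not taken from the original question.
arr = np.array([1, 1, 2, 2, 1])

# Stable sort so indices inside each group stay in ascending order.
idx = np.argsort(arr, kind='stable')                   # [0, 1, 4, 2, 3]
# Positions in the sorted array where the value changes.
split_points = np.flatnonzero(np.diff(arr[idx])) + 1   # [3]
# One index array per unique value.
groups = np.split(idx, split_points)
elements = np.array(groups, dtype=object)

print([g.tolist() for g in groups])  # [[0, 1, 4], [2, 3]]
```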
Next you can index elements to produce the full array of objects.
inverse_index = np.argsort(idx)  # inverse permutation: undoes the sort
counts = np.diff(np.r_[0, split_points, len(arr)])  # size of each group
result = np.repeat(elements, counts, axis=0)[inverse_index]  # one group per original position
result will be a 2D array if you have equal numbers of each unique element. You can choose to turn it into a list if you want.
Notice that the last part works because applying np.argsort to a permutation gives its inverse: the index that puts a sorted array back into its original unsorted order is the argsort of the argsort. So we've computed most of the ingredients of np.unique (inverse_index, counts) as intermediate results to make your specific application possible. To complete np.unique, note that split_points are positions in the sorted array, not in arr. The forward index is therefore idx[np.r_[0, split_points]] (the first occurrence of each value, provided the sort is stable), and the actual unique values are given by arr[idx[np.r_[0, split_points]]].
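As a rough check of the relationship to np.unique, the sketch below rebuilds its optional outputs from the same intermediates and compares them. The sample array and the names starts, first_index, values and labels are assumptions introduced here for illustration; the stable sort is what makes first_index match np.unique's first-occurrence index.

```python
import numpy as np

arr = np.array([1, 1, 2, 2, 1])  # assumed sample input

idx = np.argsort(arr, kind='stable')
split_points = np.flatnonzero(np.diff(arr[idx])) + 1
inverse_index = np.argsort(idx)
counts = np.diff(np.r_[0, split_points, len(arr)])

# argsort of the argsort undoes the sort.
roundtrip_ok = np.array_equal(arr[idx][inverse_index], arr)

starts = np.r_[0, split_points]   # group starts in the sorted array
first_index = idx[starts]         # first occurrence of each value in arr
values = arr[first_index]         # the unique values, in sorted order
# Group label of each sorted element, moved back to the original order.
labels = np.repeat(np.arange(len(counts)), counts)[inverse_index]

u, u_index, u_inverse, u_counts = np.unique(
    arr, return_index=True, return_inverse=True, return_counts=True
)
same = (
    np.array_equal(values, u)
    and np.array_equal(first_index, u_index)
    and np.array_equal(labels, np.ravel(u_inverse))
    and np.array_equal(counts, u_counts)
)
print(roundtrip_ok, same)
```

Note that inverse_index itself is the inverse permutation of idx; np.unique's inverse corresponds to labels in this sketch.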
You can shorten the code from 6 lines to about 3 without recomputing any of the necessary arrays more than once. At the same time, you can say goodbye to any semblance of legibility that was there before:
idx = np.argsort(arr, kind='stable')
split_points = np.flatnonzero(np.diff(arr[idx])) + 1
result = np.repeat(np.array(np.split(idx, split_points), dtype=object), np.diff(np.r_[0, split_points, len(arr)]), axis=0)[np.argsort(idx)]
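If legibility matters more than line count, the same steps can be wrapped in a function and checked against a brute-force version. This is only a sketch: the function names are hypothetical, and filling the object array slot by slot is a variation on the code above that always yields a list of index arrays, even when every group has the same size.

```python
import numpy as np

def equal_indices(arr):
    """Hypothetical wrapper: for each position, the indices of all equal elements."""
    arr = np.asarray(arr)
    idx = np.argsort(arr, kind='stable')
    split_points = np.flatnonzero(np.diff(arr[idx])) + 1
    groups = np.split(idx, split_points)
    counts = np.diff(np.r_[0, split_points, len(arr)])
    # Fill a 1D object array one slot at a time so equal-sized groups
    # are not merged into a 2D array.
    elements = np.empty(len(groups), dtype=object)
    for i, group in enumerate(groups):
        elements[i] = group
    # Repeated entries refer to the same underlying index array.
    return list(np.repeat(elements, counts)[np.argsort(idx)])

def equal_indices_brute_force(arr):
    """Quadratic reference version, used only for checking."""
    arr = np.asarray(arr)
    return [np.flatnonzero(arr == value) for value in arr]

rng = np.random.default_rng(0)
sample = rng.integers(0, 4, size=20)  # arbitrary test data
fast = equal_indices(sample)
slow = equal_indices_brute_force(sample)
matches = all(np.array_equal(f, s) for f, s in zip(fast, slow))
print(matches)
```

The brute-force version compares every element against the whole array, so it is only meant for small inputs.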