One approach is to loop through the array, keep a note of the best value you've seen for each y, then reconstruct the array at the end:
import numpy as np
from collections import defaultdict

def rows_by_unique_y(arr):
    # Map each y value to the smallest x seen so far.
    best_for_y = defaultdict(lambda: float('inf'))
    for row in arr:
        x, y = row[0], row[1]
        best_for_y[y] = min(x, best_for_y[y])
    # Dicts preserve insertion order (Python 3.7+), so rows come out
    # in the order each y was first seen.
    return np.array([[x, y] for y, x in best_for_y.items()])

arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9], [8, 10], [9, 10], [9, 11], [10, 10]])
print(rows_by_unique_y(arr))
No need to sort; just keep track of the minimums. This outputs:
[[ 6  5]
 [ 6  9]
 [ 8 10]
 [ 9 11]]
While this answer is asymptotically faster (O(n) versus O(n log n) for sorting), user3483203's answer is much better in practice, because it calls out to optimized C code rather than staying inside Python's surprisingly slow interpreter. However, if your arrays are huge (several gigabytes), the O(n log n) sorting cost will eventually lose to this linear scan.
At the same time, if your arrays are that large, you should probably be using a MapReduce framework like Spark instead. The algorithm above is easily parallelized: split the array into chunks, compute the per-y minimums of each chunk independently, then merge the partial results with the same min rule, as sketched below.
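For illustration, here is a minimal sketch of that chunk-and-merge idea using the standard library's multiprocessing module; the function names and the chunk count of 4 are my own, not from any particular framework:

import numpy as np
from collections import defaultdict
from multiprocessing import Pool

def partial_minima(chunk):
    # Same scan as above, restricted to one slice of the array.
    best = {}
    for x, y in chunk:
        best[y] = min(x, best.get(y, x))
    return best

def rows_by_unique_y_parallel(arr, n_chunks=4):
    # Each worker computes per-y minimums for its chunk independently...
    with Pool(n_chunks) as pool:
        partials = pool.map(partial_minima, np.array_split(arr, n_chunks))
    # ...then the partial results are merged with the same min rule.
    merged = defaultdict(lambda: float('inf'))
    for part in partials:
        for y, x in part.items():
            merged[y] = min(x, merged[y])
    return np.array([[x, y] for y, x in merged.items()])

if __name__ == '__main__':
    arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
                    [8, 10], [9, 10], [9, 11], [10, 10]])
    print(rows_by_unique_y_parallel(arr))

The merge step is associative, which is exactly what makes the scan MapReduce-friendly.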
If you don't need the minimum x values, then the following one-liner using np.unique works:

arr[np.unique(arr[:,1], return_index=True)[1]]
but this returns

array([[ 6,  5],
       [ 6,  9],
       [10, 10],
       [ 9, 11]])

if you switch the 8 and the 10, because return_index gives the index of the first occurrence of each y value, not the occurrence with the smallest x.
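If you do need the minimum x values, you can rescue the np.unique approach by sorting first, so that the first occurrence of each y is also the one with the smallest x. A minimal sketch of that idea, using the swapped-row input from the caveat above:

import numpy as np

arr = np.array([[6, 5], [6, 9], [7, 5], [7, 9],
                [10, 10], [9, 10], [9, 11], [8, 10]])

# Sort by y, breaking ties by x, so the first row per y has the minimum x.
srt = arr[np.lexsort((arr[:, 0], arr[:, 1]))]
print(srt[np.unique(srt[:, 1], return_index=True)[1]])
# [[ 6  5]
#  [ 6  9]
#  [ 8 10]
#  [ 9 11]]

Note this returns the rows ordered by y rather than by first appearance, and the sort brings back the O(n log n) cost discussed above.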