I need to find duplicates in a 2d numpy array. As a result i want a list of the same length as the input which points to the first occurrence of the corresponding value. For example the array [[1, 0, 0], [1, 0, 0], [2, 3, 4]] has two equal elements 0 and 1. The method should return [0, 0, 2] (see examples in code below). The following code is working but slow for large arrays.
import numpy as np
def duplicates(ar):
"""
Args:
ar (array_like): array
Returns:
list of int: int is pointing to first occurence of unique value
"""
# duplicates array:
dup = np.full(ar.shape[0], -1, dtype=int)
for i in range(ar.shape[0]):
if dup[i] != -1:
# i is already found to be a
continue
else:
dup[i] = i
for j in range(i + 1, ar.shape[0]):
if (ar[i] == ar[j]).all():
dup[j] = i
return dup
if __name__ == '__main__':
n = 100
# shortest extreme for n points
a1 = np.array([[0, 1, 2]] * n)
assert (duplicates(a1) == np.full(n, 0)).all(), True
# longest extreme for n points
a2 = np.linspace(0, 1, n * 3).reshape((n, 3))
assert (duplicates(a2) == np.arange(0, n)).all(), True
# test case
a3 = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
assert (duplicates(a3) == [0, 0, 2]).all(), True
Any idea how to speedup the process (e.g. avoid the second for loop) or alternative implementations? Cheers