I have an array of points, anywhere from tens of thousands up to 3 billion, some of which are duplicated. I'd like to deduplicate the points and generate an index array that maps each original point, in its original order, to its position in the deduplicated array.
For example:
x = [(0, 0),  # (x1, y1)
     (1, 0),  # (x2, y2)
     (1, 1),  # (x3, y3)
     (0, 0)]  # (x4, y4)
Deduplicating x, we have y:
y = list(set(x))
# set ordering is arbitrary; one possible result:
# y = [(1, 0),  # (x2, y2)
#      (0, 0),  # (x1, y1) and (x4, y4)
#      (1, 1)]  # (x3, y3)
And then we would have a resulting index array, z:
z = [1,  # (x1, y1)
     0,  # (x2, y2)
     2,  # (x3, y3)
     1]  # (x4, y4)
Is there a numpy-like way of obtaining z? Here's a brute-force implementation:
z = []
for each_point in x:
    # list.index is a linear scan, so the loop costs O(len(x) * len(y)) overall
    index = y.index(each_point)
    z.append(index)
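For comparison, here is a minimal sketch of what a numpy-style version might look like. It assumes the points fit in memory as an (n, 2) array and that a NumPy version supporting the `axis` argument of `np.unique` is available. Note that `np.unique` sorts the unique rows, so the order of y (and therefore the values in z) differs from the `set`-based example above, although `y[z]` still reproduces x.

```python
import numpy as np

# Same example points as above, stored as an (n, 2) integer array.
x = np.array([(0, 0), (1, 0), (1, 1), (0, 0)])

# axis=0 treats each row as one point; return_inverse=True also returns,
# for every original row, the index of its match in the unique array.
y, z = np.unique(x, axis=0, return_inverse=True)

# Some NumPy versions return the inverse with an extra dimension when
# axis is given, so flatten it to a plain 1-D index array.
z = z.reshape(-1)

# Indexing the unique points with z rebuilds the original sequence.
assert np.array_equal(y[z], x)
```

Because `np.unique` works by sorting, this is roughly O(n log n) in the number of points, and at the upper end of the sizes mentioned (billions of points) memory use would likely need separate consideration.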