I'd like to get the indices of the unique row vectors that two matrices have in common, using a hash of each row (hashing is efficient for matrices). The problem is that np.intersect1d gives me the common values, not their indices. np.in1d, on the other hand, does give indices, but not unique ones. I zipped the hashes into a dict to make it work, but that doesn't seem like the most efficient approach. I'm new to Python, so I'm trying to see if there is a better way to do this. Thanks for the help!
code:
import hashlib

import numpy as np

x = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])

# Hash each row so that whole rows can be compared as single values
xhash = [hashlib.sha1(row).digest() for row in x]
yhash = [hashlib.sha1(row).digest() for row in y]

# Sorted, unique hashes that occur in both x and y
z = np.intersect1d(xhash, yhash)

# Map each hash to a row index in x (for duplicate rows, the last index wins)
idx = list(range(len(xhash)))
d = dict(zip(xhash, idx))
unique_idx = [d[i] for i in z]  # is there a better way to get this, or a boolean array?
print(unique_idx)
uniques = np.array([x[i] for i in unique_idx])
print(uniques)
output:
[2, 3, 1]
[[4 5 6]
 [7 8 9]
 [1 2 3]]
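For comparison, here is a minimal sketch of one possible alternative, assuming NumPy 1.15 or later, where np.intersect1d accepts return_indices=True. Using hexdigest() and row.tobytes() instead of digest() is a choice made for this sketch, not part of the code above: raw digests can end in null bytes, which NumPy's fixed-width byte strings drop. The names common, x_idx, y_idx, shared_rows and mask are illustrative only.

```python
import hashlib

import numpy as np

x = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[4, 5, 6], [7, 8, 9], [1, 2, 3]])

# hexdigest() yields plain ASCII strings, so no digest bytes are lost when
# NumPy stores them in a fixed-width string array.
xhash = np.array([hashlib.sha1(row.tobytes()).hexdigest() for row in x])
yhash = np.array([hashlib.sha1(row.tobytes()).hexdigest() for row in y])

# return_indices=True also returns, for each common value, the index of its
# first occurrence in each of the two inputs.
common, x_idx, y_idx = np.intersect1d(xhash, yhash, return_indices=True)

# Rows of x that also appear in y, one per distinct row
shared_rows = x[x_idx]

# Boolean-array form of the same selection
mask = np.zeros(len(x), dtype=bool)
mask[x_idx] = True
```

Note that this picks the first occurrence of a duplicated row (index 0 for [1, 2, 3]), whereas the dict approach keeps the last one (index 1). The order of x_idx follows the sorted hashes, so it is not the row order of x.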
I'm having a similar issue with np.unique(), where it doesn't give me any indices.
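For what it's worth, np.unique does take a return_index argument, and NumPy 1.13 and later also accept axis=0 to treat whole rows as items, which may avoid the hashing step altogether. A minimal sketch under those assumptions (unique_rows and first_idx are illustrative names):

```python
import numpy as np

x = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 treats each row as one item; return_index=True also returns the
# index of the first occurrence of each distinct row in x.
unique_rows, first_idx = np.unique(x, axis=0, return_index=True)

print(first_idx)    # [0 2 3]
print(unique_rows)  # the three distinct rows, sorted lexicographically
```

There is also return_inverse=True, which gives the indices needed to rebuild x from unique_rows.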