I'm trying to select only the unique rows of a numpy.ndarray (a variable named cluster). When I define this variable explicitly, like this:
cluster=np.array([[0.157,-0.4778],[0.157,-0.4778],[0.157,-0.4778],[-0.06156924,-0.21786049],[-0.06156924,-0.21786049],[0.02,-0.35]])
it works as it should:
[[ 0.157 -0.4778 ]
[-0.06156924 -0.21786049]
[ 0.02 -0.35 ]]
Unfortunately, cluster is actually part of a bigger array (xtrans), so it can only be obtained by indexing into that array:
splitted_clusters=[0,1,4,5,10]
cluster=xtrans[splitted_clusters]
The functions are the same and the data types are the same.
But in the latter case it behaves strangely: sometimes it keeps identical rows and sometimes it drops them. As a result I get something like this:
[[ 0.157 -0.4778 ]
[ 0.157 -0.4778 ]
[-0.06156924 -0.21786049]
[ 0.02 -0.35 ]]
In my real example with a 44x2 array it adds 22 identical rows and misses 23 of them (the pattern is strange too: it adds the rows with indices 0, 1, 2, 4, 9, 11, 12, 18, etc.). The number of identical rows added varies, although it is supposed to add only ONE (the first) of these 44 rows.
As for the method of selecting unique rows, I first used the one from this thread, "Find unique rows in numpy.array":
b =np.ascontiguousarray(cluster).view(np.dtype((np.void, cluster.dtype.itemsize * cluster.shape[1])))
_, idx = np.unique(b, return_index=True)
unique_cl = cluster[idx]
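For context, here is a minimal sketch of what the view in that snippet does. The array a is hypothetical data, not the real xtrans. Each row is reinterpreted as one opaque block of raw bytes, so two rows count as equal only if every byte matches.

```python
import numpy as np

# Hypothetical 2-column float64 array; rows 0 and 1 are exact copies.
a = np.array([[0.157, -0.4778],
              [0.157, -0.4778],
              [0.02, -0.35]])

# Each row becomes a single item of itemsize * ncols = 16 raw bytes.
row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
b = np.ascontiguousarray(a).view(row_dtype)
print(b.shape, b.dtype.itemsize)  # one 16-byte item per row

# Exact copies share the same bytes; a tiny perturbation changes them.
same_bytes = a[0].tobytes() == a[1].tobytes()
perturbed_bytes = a[0].tobytes() == (a[1] + 1e-15).tobytes()
print(same_bytes, perturbed_bytes)
```

So with this approach, rows that differ by even one bit are treated as different rows.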
Then I tried my own code as a check:
unique_cl = np.array([0, 0])
for i in range(cluster.shape[0]):
    if i == 0:
        unique_cl = np.vstack([cluster[i, :]])
    elif cluster[i, :].tolist() not in unique_cl.tolist():
        unique_cl = np.vstack([unique_cl, cluster[i, :]])
The results are the same and I really have no idea why. I would be very grateful for any help/advice/suggestion/idea.
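The problem was in the floats: rows that print identically evidently differed in digits beyond the displayed precision. When I rounded the array values to 7 decimal places, everything worked as it should. Thanks to Eelco Hoogendoorn for this idea.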
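A small sketch of the rounding fix, using made-up data rather than the real xtrans. It uses np.unique with axis=0 in place of the void-view trick from the linked thread, which is my substitution and needs NumPy 1.13 or later. The 1e-12 offset is only an assumed stand-in for whatever noise the real values carry.

```python
import numpy as np

# Hypothetical data: rows 0 and 1 print the same at default precision
# but differ slightly, as can happen after floating-point arithmetic.
cluster = np.array([
    [0.157, -0.4778],
    [0.157 + 1e-12, -0.4778],
    [0.02, -0.35],
])

# Without rounding, the near-duplicate row is kept as a separate row.
n_raw = len(np.unique(cluster, axis=0))

# Rounding to 7 decimals first makes the two rows exactly equal.
rounded = np.round(cluster, 7)
unique_cl = np.unique(rounded, axis=0)
n_rounded = len(unique_cl)

print(n_raw, n_rounded)
```

Note that np.unique returns the rows sorted rather than in their original order, and that rounding is a heuristic: two values that sit on opposite sides of a rounding boundary can still end up different.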