This is my data:
a = (9,5,3)
b = (5,3,6)
c = (1,6,6)
d = (2,5,0)
e = (9,8,3)
f = (7,3,6)
g = (2,15,1)
data = [a,b,c,d,e,f,g]
I have 7 data points. From these I want to select three (top-k = 3), for example a, b, c or any other combination, such that the selected points are as far apart from each other as possible (the top-k most diverse points).
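from scipy.spatial import distance

# Example distance between two points (named dist_ab so it does not overwrite point d)
dist_ab = distance.euclidean(a, b)

k = 3
i = 1
distancelist = []
max_dist = []
while i < k:
    # Collect the distance between every pair of points
    for x in data:
        for y in data:
            dist = distance.euclidean(x, y)
            distancelist.append(dist)
    # Stuck here: how do I pick the maximum distances and put the points into max_dist?
    i = i + 1
print(max_dist)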
I am stuck here: how do I get the maximum distance values and append the corresponding points to max_dist?
Expected output:
[(9, 8, 3), (2, 15, 1), (5, 3, 6)]  # I chose these at random; I don't know the exact result
For example:
First subset: Total distance 18.987490074177131
# combination (a,b,c) or [(9,5,3),(5,3,6),(1,6,6)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[2]) + distance.euclidean(data[0], data[2])
Second subset: Total distance 20.000937912998413
# combination (a,b,d) or [(9,5,3),(5,3,6),(2,5,0)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[3]) + distance.euclidean(data[0], data[3])
The second subset is better than the first because its total distance is larger. I want the subset of size k = 3 whose total pairwise distance is the largest among all possible combinations.
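To make the goal concrete, here is a minimal brute-force sketch of what I am after, assuming it is acceptable to try every combination (fine for 7 points, but the number of subsets grows quickly with more data). It uses `math.dist` from the standard library in place of `distance.euclidean`, which gives the same values; the helper names `total_pairwise_distance` and `most_diverse_subset` are made up for illustration.

```python
from itertools import combinations
from math import dist  # Python 3.8+; same value as scipy's distance.euclidean

data = [(9, 5, 3), (5, 3, 6), (1, 6, 6), (2, 5, 0), (9, 8, 3), (7, 3, 6), (2, 15, 1)]

def total_pairwise_distance(points):
    """Sum of Euclidean distances over every pair of points in the subset."""
    return sum(dist(p, q) for p, q in combinations(points, 2))

def most_diverse_subset(points, k):
    """Hypothetical helper: try every k-subset, keep the one with the largest total."""
    return max(combinations(points, k), key=total_pairwise_distance)

best = most_diverse_subset(data, 3)
best_total = total_pairwise_distance(best)
print(best, best_total)
```

With this definition, `total_pairwise_distance` reproduces the totals of the two example subsets above, so it should match the objective I described. Whether there is a faster way than checking all combinations is part of what I am asking.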