2

This is my data:

a = (9,5,3)
b = (5,3,6)
c = (1,6,6)
d = (2,5,0)
e = (9,8,3)
f = (7,3,6)
g = (2,15,1)
data = [a,b,c,d,e,f,g]

I have 7 data points, In here I want to get the three data (top-k=3), it can be (a,b,c or other points) which has a maximum distance to other points/ top-k max diverse.

from scipy.spatial import distance
d = distance.euclidean(a,b)

k = 3
i = 1
distancelist = []
max_dist = []
while (i < k):
    for x in (data):
        for y in (data):
            dist = distance.euclidean(x,y)
            distancelist.append(dist)
            # stuck in here
            max_dist = #
    i = i+1
print(max_dist)

I stuck, how to get the maximum values of distance, and poping out to the max_dist

Expected output:

[(9, 8, 3),(2, 15, 1),(5, 3, 6)] #I just choose these as random, I don't know the exact result

For example:

First subset: Total distance 18.987490074177131

# combination (a,b,c) or [(9,5,3),(5,3,6),(1,6,6)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[2]) + distance.euclidean(data[0], data[2])

Second subset: Total distance 20.000937912998413

# combination (a,b,d) or [(9,5,3),(5,3,6),(2,5,0)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[3]) + distance.euclidean(data[0], data[3])

The second subset is better than the first subset because the second has a bigger value of total distance, I want to get the subset (top-k=3) which the max distance is a maximum of all combinations.

user46543
  • 1,033
  • 4
  • 13
  • 23

3 Answers3

1

How about this followings.

First, put all distance and points (x, y) into max_dixdance. Here, all pairs are generated by combinations, instead of double-for-loop.

from scipy.spatial import distance
from itertools import combinations
max_dixdance = []
# for x, y in combinations(data, 2):
#     dis = distance.euclidean(x, y)
#     max_dixdance.append((dis, (x, y)))

## modified version
for xyz in combinations(data, 3):
    # print(list(xyz)) # verify all combinations appeared

    # calculate a sum of all piarwise distance
    dis = 0
    for xy in combinations(xyz, 2):
        # print(list(xy)) # verify all pairs appeared
        dis += distance.euclidean(*xy)
    max_dixdance.append((dis, tuple(xyz)))

This code is almost (not exactly) equivalent to the followings:

## modified version - 2
for x, y, z in combinations(data, 3):
    xyz = (x, y, z)

    # calculate a sum of all piarwise distance
    dis = 0
    for x, y in combinations(xyz, 2):
        dis += distance.euclidean(x, y)
    max_dixdance.append((dis, xyz))

Then, sort the list using dis values and take top 3 elements.

max_dixdance.sort(key=lambda x: x[0], reverse=True) # from max to min
print(max_dixdance[:3])
dkato
  • 895
  • 10
  • 28
  • thank you, let me try, its cool. I just know it `combinations`. Anyway what does the meaning of `key=lambda x: x[0]` – user46543 Dec 01 '17 at 06:24
  • Its a sorting technique to specify that the 1st element of each tuple (dis in the list) is used as a sorting key. related post -> https://stackoverflow.com/questions/10695139/sort-a-list-of-tuples-by-2nd-item-integer-value – dkato Dec 01 '17 at 06:26
  • This answer helps me to get the max distance between pairs, but actually I want to take top-k (3 combinations of pairs) which has max diverse/ has maximum distance. – user46543 Dec 01 '17 at 06:32
  • Well...I'm wondering how to measure diversity of n points. Is it like a sum of distances within a set? Could you give me an example how much diversity (4,5,6) has. – dkato Dec 01 '17 at 06:41
  • Thanks, I got it. – dkato Dec 01 '17 at 10:03
  • I updated the code. *xy in the code is a way to unpack the xy to (x, y). – dkato Dec 01 '17 at 10:11
  • Thank you, let me try, but again, what does the meaning of this star? `distance.euclidean(*a)` is it a pointer or something? – user46543 Dec 01 '17 at 10:31
  • It is no a pointer, but Python's operator to unpack a tuple. Please refer to this for more detail. https://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean – dkato Dec 01 '17 at 10:45
1

Brute force without scipy using max with key function:

from itertools import combinations

def dist2(points):  # distance of 2 points
     return sum((a_ - b_)**2 for a_, b_ in zip(*points))**0.5

def dist3(points):  # sum of triangle sides for 3 points
    return sum(map(dist2, combinations(points, 2)))

>>> max(combinations(data, 3), key=dist3)
((2, 5, 0), (7, 3, 6), (2, 15, 1))
user2390182
  • 72,016
  • 6
  • 67
  • 89
0

This is my understanding of problem i.e get top 3 distances per point i.e

#`cdist` will give the distance from every point to one another. 
mat = scipy.spatial.distance.cdist(data,data, metric='euclidean')
#             0          1          2          3          4          5          6
#0   0.000000   5.385165   8.602325   7.615773   3.000000   4.123106  12.369317
#1   5.385165   0.000000   5.000000   7.000000   7.071068   2.000000  13.341664
#2   8.602325   5.000000   0.000000   6.164414   8.774964   6.708204  10.344080
#3   7.615773   7.000000   6.164414   0.000000   8.185353   8.062258  10.049876
#4   3.000000   7.071068   8.774964   8.185353   0.000000   6.164414  10.099505
#5   4.123106   2.000000   6.708204   8.062258   6.164414   0.000000  13.928388
#6  12.369317  13.341664  10.344080  10.049876  10.099505  13.928388   0.000000

#this is for mapping
di = dict(zip(np.arange(7),list('abcdefg')))

#Get top three distances indices using argsort
max3 = mat.argsort(1)[:,-3:]
#map the indices with the names

max3_with_names = np.array(np.vectorize(di.get)(max3)).tolist()
# [['d', 'c', 'g'],
#  ['d', 'e', 'g'],
#  ['a', 'e', 'g'],
#  ['f', 'e', 'g'],
#  ['d', 'c', 'g'],
#  ['c', 'd', 'g'],
#  ['a', 'b', 'f']]

list(zip(list('abcdefg'),max3_with_names))
# [('a', ['d', 'c', 'g']),# d,c,g is the 3 points maximum distances with respect to a.
#  ('b', ['d', 'e', 'g']),
#  ('c', ['a', 'e', 'g']),
#  ('d', ['f', 'e', 'g']),
#  ('e', ['d', 'c', 'g']),
#  ('f', ['c', 'd', 'g']),
#  ('g', ['a', 'b', 'f'])]
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
  • I got error when running this `di = dict(zip(np.arange(7),list('abcdefg')))`. it shows `TypeError: 'list' object is not callable` – user46543 Dec 01 '17 at 06:14
  • You might have assigned some value to list. you should not do that freee the list using `del list`, then try again. – Bharath M Shetty Dec 01 '17 at 06:15