Finding max sum distance

Question

This is my data:

a = (9,5,3)
b = (5,3,6)
c = (1,6,6)
d = (2,5,0)
e = (9,8,3)
f = (7,3,6)
g = (2,15,1)
data = [a,b,c,d,e,f,g]

I have 7 data points, In here I want to get the three data (top-k=3), it can be (a,b,c or other points) which has a maximum distance to other points/ top-k max diverse.

from scipy.spatial import distance
d = distance.euclidean(a,b)

k = 3
i = 1
distancelist = []
max_dist = []
while (i < k):
    for x in (data):
        for y in (data):
            dist = distance.euclidean(x,y)
            distancelist.append(dist)
            # stuck in here
            max_dist = #
    i = i+1
print(max_dist)

I stuck, how to get the maximum values of distance, and poping out to the max_dist

Expected output:

[(9, 8, 3),(2, 15, 1),(5, 3, 6)] #I just choose these as random, I don't know the exact result

For example:

First subset: Total distance 18.987490074177131

# combination (a,b,c) or [(9,5,3),(5,3,6),(1,6,6)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[2]) + distance.euclidean(data[0], data[2])

Second subset: Total distance 20.000937912998413

# combination (a,b,d) or [(9,5,3),(5,3,6),(2,5,0)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[3]) + distance.euclidean(data[0], data[3])

The second subset is better than the first subset because the second has a bigger value of total distance, I want to get the subset (top-k=3) which the max distance is a maximum of all combinations.

IIUC do you want to get the top 3 points with maximum distance for every point? — Bharath M Shetty, Dec 01 '17 at 05:54
Actually, my Goal is to get three points which have maximum diverse (maximum distance among those points), so top-k max diverse — user46543, Dec 01 '17 at 06:04
Without a reference to one point wouldn't that be useless to get maximum. — Bharath M Shetty, Dec 01 '17 at 06:06
In other word, I want to take subset which max diverse, but in here I just stuck in the front phase — user46543, Dec 01 '17 at 06:06
take an example where you know the answer and post it here so we better understand what is that you actually want so we better understand what you mean by max diverse in this context. — Bharath M Shetty, Dec 01 '17 at 06:45

dkato · Accepted Answer · 2017-12-01T10:16:47.277

1

How about this followings.

First, put all distance and points (x, y) into max_dixdance. Here, all pairs are generated by combinations, instead of double-for-loop.

from scipy.spatial import distance
from itertools import combinations
max_dixdance = []
# for x, y in combinations(data, 2):
#     dis = distance.euclidean(x, y)
#     max_dixdance.append((dis, (x, y)))

## modified version
for xyz in combinations(data, 3):
    # print(list(xyz)) # verify all combinations appeared

    # calculate a sum of all piarwise distance
    dis = 0
    for xy in combinations(xyz, 2):
        # print(list(xy)) # verify all pairs appeared
        dis += distance.euclidean(*xy)
    max_dixdance.append((dis, tuple(xyz)))

This code is almost (not exactly) equivalent to the followings:

## modified version - 2
for x, y, z in combinations(data, 3):
    xyz = (x, y, z)

    # calculate a sum of all piarwise distance
    dis = 0
    for x, y in combinations(xyz, 2):
        dis += distance.euclidean(x, y)
    max_dixdance.append((dis, xyz))

Then, sort the list using dis values and take top 3 elements.

max_dixdance.sort(key=lambda x: x[0], reverse=True) # from max to min
print(max_dixdance[:3])

edited Dec 01 '17 at 10:16

answered Dec 01 '17 at 06:16

dkato

895
10
28

thank you, let me try, its cool. I just know it `combinations`. Anyway what does the meaning of `key=lambda x: x[0]` – user46543 Dec 01 '17 at 06:24
Its a sorting technique to specify that the 1st element of each tuple (dis in the list) is used as a sorting key. related post -> https://stackoverflow.com/questions/10695139/sort-a-list-of-tuples-by-2nd-item-integer-value – dkato Dec 01 '17 at 06:26
This answer helps me to get the max distance between pairs, but actually I want to take top-k (3 combinations of pairs) which has max diverse/ has maximum distance. – user46543 Dec 01 '17 at 06:32
Well...I'm wondering how to measure diversity of n points. Is it like a sum of distances within a set? Could you give me an example how much diversity (4,5,6) has. – dkato Dec 01 '17 at 06:41
Thanks, I got it. – dkato Dec 01 '17 at 10:03
I updated the code. *xy in the code is a way to unpack the xy to (x, y). – dkato Dec 01 '17 at 10:11
Thank you, let me try, but again, what does the meaning of this star? `distance.euclidean(*a)` is it a pointer or something? – user46543 Dec 01 '17 at 10:31
It is no a pointer, but Python's operator to unpack a tuple. Please refer to this for more detail. https://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean – dkato Dec 01 '17 at 10:45

user2390182 · Answer 2 · 2017-12-01T06:28:52.593

1

Brute force without scipy using max with key function:

from itertools import combinations

def dist2(points):  # distance of 2 points
     return sum((a_ - b_)**2 for a_, b_ in zip(*points))**0.5

def dist3(points):  # sum of triangle sides for 3 points
    return sum(map(dist2, combinations(points, 2)))

>>> max(combinations(data, 3), key=dist3)
((2, 5, 0), (7, 3, 6), (2, 15, 1))

edited Dec 01 '17 at 06:28

answered Dec 01 '17 at 06:22

user2390182

72,016
6
67
89

Thank you, but is it possible when I want k is 4,5, or 6? – user46543 Dec 01 '17 at 06:30

score 0 · Answer 3 · answered Dec 01 '17 at 06:08

This is my understanding of problem i.e get top 3 distances per point i.e

#`cdist` will give the distance from every point to one another. 
mat = scipy.spatial.distance.cdist(data,data, metric='euclidean')
#             0          1          2          3          4          5          6
#0   0.000000   5.385165   8.602325   7.615773   3.000000   4.123106  12.369317
#1   5.385165   0.000000   5.000000   7.000000   7.071068   2.000000  13.341664
#2   8.602325   5.000000   0.000000   6.164414   8.774964   6.708204  10.344080
#3   7.615773   7.000000   6.164414   0.000000   8.185353   8.062258  10.049876
#4   3.000000   7.071068   8.774964   8.185353   0.000000   6.164414  10.099505
#5   4.123106   2.000000   6.708204   8.062258   6.164414   0.000000  13.928388
#6  12.369317  13.341664  10.344080  10.049876  10.099505  13.928388   0.000000

#this is for mapping
di = dict(zip(np.arange(7),list('abcdefg')))

#Get top three distances indices using argsort
max3 = mat.argsort(1)[:,-3:]
#map the indices with the names

max3_with_names = np.array(np.vectorize(di.get)(max3)).tolist()
# [['d', 'c', 'g'],
#  ['d', 'e', 'g'],
#  ['a', 'e', 'g'],
#  ['f', 'e', 'g'],
#  ['d', 'c', 'g'],
#  ['c', 'd', 'g'],
#  ['a', 'b', 'f']]

list(zip(list('abcdefg'),max3_with_names))
# [('a', ['d', 'c', 'g']),# d,c,g is the 3 points maximum distances with respect to a.
#  ('b', ['d', 'e', 'g']),
#  ('c', ['a', 'e', 'g']),
#  ('d', ['f', 'e', 'g']),
#  ('e', ['d', 'c', 'g']),
#  ('f', ['c', 'd', 'g']),
#  ('g', ['a', 'b', 'f'])]

I got error when running this `di = dict(zip(np.arange(7),list('abcdefg')))`. it shows `TypeError: 'list' object is not callable` — user46543, Dec 01 '17 at 06:14
You might have assigned some value to list. you should not do that freee the list using `del list`, then try again. — Bharath M Shetty, Dec 01 '17 at 06:15

Finding max sum distance

3 Answers3