Finding the two clusters with smallest distance between a list of clusters

Question

I have a list of solution clusters in the following format:

Input:

test = []

# solutions, centroid
test.append([[3,5],4])
test.append([[2,8],5])
test.append([[1,3],2])
test.append([[5,9],7])

Out: [[[3, 5], 4], [[2, 8], 5], [[1, 3], 2], [[5, 9], 7]]

How would I return the union of the 2 clusters with the smallest centroid distance out of all clusters?

Goal Output: [[2,8,3,5],4.5]

The order of the solutions in the union cluster is not relevant.

I have spent a considerable amount of time trying to come up with a solution with multiple loops to no avail.

@I'mahdi either is fine for me. Performance is also not an issue as of now, I need it for an algorithm I'm trying to implement. — Charles David Mupende, Jun 27 '22 at 15:25
Can you clarify what you mean by "smallest centroid distance out of all clusters"? Do you mean the distance between the centroids of the two clusters that were merged should be the smallest of all the possible pairs of centroids? (Have you seen [this](/q/43650931/)?) Also if you have spent a considerable amount of time trying to come up with a solution, you probably have a more focused question that simply "How would I do this?", and you should try to create a [mre] and [ask that specific question](//meta.stackoverflow.com/q/260648/843953). — Pranav Hosangadi, Jun 27 '22 at 15:29

score 1 · Accepted Answer · answered Jun 27 '22 at 15:32

1

You can do with two for-loop:

res = []
for t1 in test:
    for t2 in test:
        if t1 != t2:
            res.append([t1[0]+t2[0], (t1[1]+t2[1])/2])
  
# Or as one-line          
res = [[(t1[0]+ t2[0]), (t1[1]+t2[1])/2] for t1 in test for t2 in test if t1 != t2]

Output:

>>> sorted(res, key=lambda x: x[1])[0]
[[3, 5, 1, 3], 3.0]

>>> sorted(res, key=lambda x: x[1])
[[[3, 5, 1, 3], 3.0],
 [[1, 3, 3, 5], 3.0],
 [[2, 8, 1, 3], 3.5],
 [[1, 3, 2, 8], 3.5],
 [[3, 5, 2, 8], 4.5],
 [[2, 8, 3, 5], 4.5],
 [[1, 3, 5, 9], 4.5],
 [[5, 9, 1, 3], 4.5],
 [[3, 5, 5, 9], 5.5],
 [[5, 9, 3, 5], 5.5],
 [[2, 8, 5, 9], 6.0],
 [[5, 9, 2, 8], 6.0]]

answered Jun 27 '22 at 15:32

I'mahdi

23,382
5
22
30

``` test = [] test.append([[3,5],4]) test.append([[2,8],5]) test.append([[1,3],2]) test.append([[5,9],7]) test res = [] distances = [] for t1 in test: for t2 in test: if t1 != t2: print("Comparing",t1,"with",t2) distance = abs(t1[1] - t2[1]) print("Distance",distance) distances.append([distance,t1,t2]) result = min(distances) result[0] = result[1][0] + result[2][0] result[1] = (result[1][1] + result[2][1])/2 del result[-1] print(result) ``` – Charles David Mupende Jun 27 '22 at 15:52

score 0 · Answer 2 · answered Jun 27 '22 at 15:36

You can calculate all distances (essentially all combinations 2 by 2) and sort them by smallest distance, them join the lists (the code below assumes test as you defined in your question):

from itertools import combinations

best_pair = sorted([(abs(el1[1] - el2[1]), el1, el2) for el1, el2 in combinations(test, 2)])[0]
result = [best_pair[1][0] + best_pair[2][0], abs(best_pair[1][1] + best_pair[2][1])/2]

There are optimisations that can be made if you have too many clusters, as the algorithm above is quadratic in time and memory.

Finding the two clusters with smallest distance between a list of clusters

2 Answers2