I have two arrays, A and B, which contain instance IDs from DATA. The values in both arrays are used as indices into another array called Distance.
I need a fast way to:
- find all point combinations between A and B,
- look up the distance for each combination in Distance.
For example:
DATA = [0,1,...100]
A = [0,1,2]
B = [6,7,8]
Distance = [100x100] # contains the pairwise distance of all instances from DATA
# need a function to combine A and B
points_combination=[[0,6],[0,7],[0,8],[1,6],[1,7],[1,8],[2,6],[2,7],[2,8]]
# need a function that looks up each pair of points_combination in Distance, so that I get this result
distance_points=[0.346, 0.270, 0.314, 0.339, 0.241, 0.283, 0.304, 0.294, 0.254]
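To make the two steps concrete, here is a minimal sketch of what I mean, assuming Distance is a 2-D NumPy array indexed by instance ID. The name `combine_and_lookup` is made up for illustration, and the random matrix is only a placeholder for my real Distance, so the values will not match the numbers above.

```python
import numpy as np

def combine_and_lookup(A, B, Distance):
    """Return every (a, b) pair and the matching distances, in the same order."""
    A = np.asarray(A)
    B = np.asarray(B)
    # Pair every element of A with every element of B: shape (len(A) * len(B), 2)
    points_combination = np.stack(
        [np.repeat(A, len(B)), np.tile(B, len(A))], axis=1
    )
    # np.ix_ selects rows A and columns B as a (len(A), len(B)) block;
    # ravel() flattens it row by row, which matches the pair order above
    distance_points = Distance[np.ix_(A, B)].ravel()
    return points_combination, distance_points

# Placeholder data: a random symmetric matrix standing in for the real Distance
rng = np.random.default_rng(0)
M = rng.random((100, 100))
Distance = (M + M.T) / 2

pairs, dists = combine_and_lookup([0, 1, 2], [6, 7, 8], Distance)
print(pairs.tolist())
print(dists)
```

If only the distances are needed, the `points_combination` array does not have to be built at all; the `np.ix_` lookup alone returns them.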
I have already tried to solve this myself, but my solution is very slow on large data.
Here's the code I tried:
import numpy as np

def function(pair_distances, k, clusters):
    list_distance = []
    cluster_qty = k
    for cluster_id in range(cluster_qty):
        all_clusters = clusters[:]  # Copy of the cluster list; each cluster is a list of instance IDs
        in_cluster = all_clusters.pop(cluster_id)  # IDs of the instances inside this cluster
        not_in_cluster = all_clusters  # IDs of the instances outside this cluster, still grouped per cluster
        # combine the two ID arrays into index pairs that refer to the distance array
        list_dist_id = np.array(np.meshgrid(in_cluster, np.concatenate(not_in_cluster))).T.reshape(-1, 2)
        temp_dist = 9999999
        for instance in range(len(list_dist_id)):
            # look up the distance for this pair in pair_distances and keep the smallest one
            temp_dist = min(temp_dist, (pair_distances[list_dist_id[instance][0], list_dist_id[instance][1]]))
        list_distance.append(temp_dist)
    return list_distance
Note that the nested loop is where most of the time is spent. This is my first time asking on this forum, so please let me know if you need more information.
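For reference, this is a rough sketch of the direction I suspect would help: replacing the inner Python loop with a single NumPy lookup per cluster. It assumes that pair_distances is a 2-D NumPy array, that there are at least two clusters, that no cluster is empty, and that k equals len(clusters), so the k argument is dropped. The name `min_distance_to_other_clusters` is only illustrative, and I have not timed this on my real data.

```python
import numpy as np

def min_distance_to_other_clusters(pair_distances, clusters):
    """For each cluster, the smallest distance from any member to any non-member."""
    pair_distances = np.asarray(pair_distances)
    list_distance = []
    for cluster_id, in_cluster in enumerate(clusters):
        # IDs of every instance that belongs to some other cluster
        others = [c for i, c in enumerate(clusters) if i != cluster_id]
        not_in_cluster = np.concatenate(others)
        # Sub-block of distances: rows are members, columns are non-members
        block = pair_distances[np.ix_(in_cluster, not_in_cluster)]
        # One vectorized min() replaces the per-pair Python loop
        list_distance.append(float(block.min()))
    return list_distance

# Tiny hand-made example with 4 instances in 3 clusters
pair_distances = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
])
clusters = [[0, 1], [2], [3]]
result = min_distance_to_other_clusters(pair_distances, clusters)
print(result)  # [2.0, 2.0, 3.0]
```

The outer loop over clusters is still there, but it runs only once per cluster rather than once per pair, and no array of index pairs is built.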