I have a very large sparse matrix (a few million rows, 500 columns).
I have already computed a 5000 × 5000 distance matrix.
I need to use `scipy.cluster.hierarchy.linkage` to cluster the data based on this matrix.
I know that `linkage` accepts a custom metric function, but recomputing the distances is very time-consuming.
How can I tell SciPy to use the distances from my precomputed matrix?
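This is a minimal sketch, assuming NumPy and SciPy are available. Besides raw observations, `linkage` also accepts a 1-D condensed distance matrix: the upper triangle of the square matrix, laid out row by row, in the same form `scipy.spatial.distance.pdist` returns. `scipy.spatial.distance.squareform` converts a square matrix into that form.

The small random `dist` below is a hypothetical stand-in for the real 5000 × 5000 matrix. `method="average"` and the `fcluster(..., t=2)` cut are illustrative choices, not requirements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical stand-in for the precomputed distance matrix:
# a small random symmetric matrix with a zero diagonal.
rng = np.random.default_rng(0)
n = 6
a = rng.random((n, n))
dist = (a + a.T) / 2.0
np.fill_diagonal(dist, 0.0)

# linkage treats any 2-D array as observation vectors, so convert the
# square matrix to the 1-D condensed form (upper triangle, row by row).
# checks=False skips the symmetry/zero-diagonal validation.
condensed = squareform(dist, checks=False)

# 'average' is an assumed choice; pick the method that suits the data.
Z = linkage(condensed, method="average")

# Example cut of the tree into at most 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(Z.shape)  # (5, 4)
print(labels)
```

Passing the square matrix directly does not give the intended result. `linkage` interprets each row as a feature vector and computes new distances between rows. Recent SciPy versions emit a `ClusterWarning` when the input looks like an uncondensed distance matrix.

The `'centroid'`, `'median'` and `'ward'` methods are only correctly defined for Euclidean distances. With an arbitrary precomputed metric, `'single'`, `'complete'` or `'average'` is the safer choice. For n = 5000, the condensed vector holds about 12.5 million float64 values (roughly 100 MB).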
I tried:

```python
from scipy.cluster.hierarchy import linkage

dist = my_dist(X)  # precomputed distances, numpy array with ndim == 2
linkage(X, metric=lambda x, y: dist[x, y])
```
but the `x` and `y` passed to the function are the row values, not the row indices.
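If a metric callback is still preferred, one workaround is to pass each row's index as a one-column observation. The callback then receives indices rather than feature vectors and can look the distance up in the precomputed matrix. This is sketched here under the assumption that `dist` fits in memory. `lookup`, `idx` and the random `dist` are hypothetical names used only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical small precomputed distance matrix standing in for my_dist(X).
rng = np.random.default_rng(1)
n = 6
a = rng.random((n, n))
dist = (a + a.T) / 2.0
np.fill_diagonal(dist, 0.0)

# Each "observation" is just its own row index, so pdist hands the
# callback length-1 arrays containing indices instead of feature vectors.
idx = np.arange(n, dtype=float).reshape(-1, 1)

def lookup(u, v):
    # u and v are length-1 arrays holding row indices into dist.
    return dist[int(u[0]), int(v[0])]

Z_lookup = linkage(idx, method="average", metric=lookup)

# Same tree as passing the condensed matrix directly.
Z_direct = linkage(squareform(dist, checks=False), method="average")
print(np.allclose(Z_lookup, Z_direct))  # True
```

`pdist` invokes the Python callback once per pair, which is about 12.5 million calls for 5000 rows. This makes the approach far slower than handing `linkage` the condensed matrix directly. It mainly shows why the original lambda received values instead of indices.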