I've computed a minimum spanning tree (MST) from a distance matrix using NetworkX. I now want to build a dendrogram from it.
I tried going through the adjacency matrix, obtained with NetworkX's to_pandas_adjacency (T is my MST):
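import matplotlib.pyplot as plt
import networkx as nx
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
# Adjacency matrix of the MST; non-edges default to 0
df = nx.to_pandas_adjacency(T)
# Condensed form, see https://stackoverflow.com/questions/18952587/use-distance-matrix-in-scipy-cluster-hierarchy-linkage
dist_array = squareform(df)
plt.figure(figsize=(10, 10))
mergings = linkage(dist_array, method='complete', metric='euclidean')
# distances is the original distance matrix (a DataFrame) the MST was built from
dendrogram(mergings, labels=distances.index, leaf_rotation=90, leaf_font_size=14)
plt.show()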
Since the adjacency matrix is filled with 0s for non-edges, I guess linkage computes Euclidean distances and ends up with a three-cluster dendrogram (where all of a cluster's points are at distance 0), whereas I expect to get the same linkage as in my original MST!
I tried passing inf or a large value as the nonedge argument of to_pandas_adjacency, but I end up with an invalid matrix...
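A possible explanation for the invalid matrix, offered as an assumption rather than a confirmed diagnosis: the nonedge value also fills the diagonal, while squareform expects a zero diagonal, and linkage rejects non-finite values such as inf. The sketch below uses a small hypothetical tree (`T_demo`) and a finite filler (`big`), both invented for illustration, resets the diagonal to zero, and switches to method='single', since single linkage is the variant that follows MST edges.

```python
import networkx as nx
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical stand-in for the MST: a path a-b-c-d with weights 1, 2, 3
T_demo = nx.Graph()
T_demo.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 3.0)])

# Finite filler larger than every real edge weight (assumed to be sufficient)
big = 10 * max(w for _, _, w in T_demo.edges(data="weight"))

df_demo = nx.to_pandas_adjacency(T_demo, nonedge=big)
arr = df_demo.to_numpy(copy=True)
np.fill_diagonal(arr, 0.0)  # squareform requires zeros on the diagonal

condensed = squareform(arr)  # symmetric with zero diagonal, so the checks pass
Z_single = linkage(condensed, method="single")
merge_heights = Z_single[:, 2]  # heights at which clusters are merged
```

With the filler larger than every tree edge, the merge heights should be the MST edge weights in ascending order. Whether this matches the original study's procedure is not something the sketch can confirm.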
Can anyone help? My best guess is that I'm not understanding or using linkage the way I should...
Edit: I know that doing it the other way around (DT first, then building the MST) would probably be easier, but I need to reproduce the order of operations to recreate the results of an original study...
Edit 2: Since SciPy's linkage function computes Euclidean distances between each pair of points (nodes here), I guess (without any certainty) that I need to convert my adjacency matrix into an array similar to what linkage outputs, i.e. a weighted edge list with the sub-cluster size as the fourth column.
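The conversion guessed at in Edit 2 could be sketched as follows: sort the MST edges by weight and merge their endpoints with a union-find structure, writing one row [cluster id, cluster id, weight, merged size] per edge in the layout that linkage returns. The helper name `mst_to_linkage` and the example tree `T_example` are hypothetical, and the tree is assumed to be connected with a weight on every edge. Merging along sorted MST edges corresponds to single linkage, not to the 'complete' method used above.

```python
import networkx as nx
import numpy as np

def mst_to_linkage(tree, weight="weight"):
    """Hypothetical helper: build a SciPy-style linkage matrix from a spanning tree."""
    nodes = list(tree.nodes)
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    parent = list(range(n))      # union-find forest over leaf indices
    cluster_id = list(range(n))  # SciPy cluster id currently held by each root
    size = [1] * n               # number of leaves under each root

    def find(i):
        # Root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    Z = np.zeros((n - 1, 4))
    edges = sorted(tree.edges(data=weight), key=lambda e: e[2])
    for step, (u, v, w) in enumerate(edges):
        ru, rv = find(index[u]), find(index[v])
        a, b = sorted((cluster_id[ru], cluster_id[rv]))
        Z[step] = [a, b, w, size[ru] + size[rv]]
        parent[rv] = ru               # merge the two clusters
        size[ru] += size[rv]
        cluster_id[ru] = n + step     # new clusters are numbered n, n+1, ...
    return Z, nodes

# Hypothetical example tree: a path a-b-c-d with weights 1, 2, 3
T_example = nx.Graph()
T_example.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 3.0)])
Z_example, labels_example = mst_to_linkage(T_example)
```

The resulting array could then be passed to dendrogram(Z_example, labels=labels_example) in place of the output of linkage. The labels follow the node order of the tree, which may differ from the order of distances.index.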