1

I've computed a minimum spanning tree from a distance matrix, using NetworkX. I want now to build a dendrogram from it.

My MST : ![MST](http://www.bedside-ai.com/mst.png)

I've tried using the adjacency matrix (using NetworkX's to_pandas_adjacency)

(T is my MST)

df = nx.to_pandas_adjacency(T)

from scipy.spatial.distance import squareform
dist_array = squareform(df) #https://stackoverflow.com/questions/18952587/use-distance-matrix-in-scipy-cluster-hierarchy-linkage

plt.figure(figsize=(10,10)) 
mergings = linkage(dist_array, method='complete', metric='euclidean')
dendrogram(mergings, labels = distances.index, leaf_rotation=90, leaf_font_size=14)
plt.show()

Now, as the adjacency matrix is filled with 0's for non-edges, I guess linkage compute Euclidean distance and end up with a 3 clusters dendrogram (where all the cluster's points are at 0 distance), while I'm expecting to get the same linkage as in my original MST !

enter image description here

I tried using inf or large value for the nonedge default value to to_pandas_adjacency, but end up with invalid matrix...

Help anyone ? My best guess is that I'm not understanding and using linkage as I should...

Edit I know, doing it the other way around (DT and then build the MST) might probably be easier, but I need to reproduce the order of operations in order to recreate the results of an original study...

Edit 2 Since the scipy's linkage function compute Euclidean distance between each point (or node here), I guess (but without any certainty) I need to find a way to convert my adjacency matrix to an array similar to what's linkage function output, ie weighted edge list, but with sub clusters size as fourth column.

DrGorilla.eth
  • 220
  • 2
  • 11
  • Not sure if this is related to your problem, but your graph contains a cycle, so it is not a tree. – Nils Werner Aug 09 '19 at 09:47
  • I'm using built-in netwokx's MST function (T=nx.minimum_spanning_tree(G)), but I notice weird drawing troubles (nodes are, I think, just drawn on top of edges, but without being connected to it), but those aren't cycles in adjacency matrix though (42's only edge is to 40). – DrGorilla.eth Aug 09 '19 at 09:51
  • Yeah, that is expected as there is no edge avoidance in `networkx`s node layout routines. – Paul Brodersen Aug 09 '19 at 13:00
  • Can you link to the edge list of your MST? – Paul Brodersen Aug 09 '19 at 13:01
  • @Paul Brodersen Sure ! Here is the weighted edges list : https://www.bedside-ai.com/edges_list And the MST as a (binary) pickle : https://www.bedside-ai.com/mst_graph.pkl – DrGorilla.eth Aug 09 '19 at 13:13
  • A problem may be that your `dist_array` is just the adjacency matrix, not a distance matrix. I would assume that you would need to compute the path length between all combinations of nodes to populate a distance matrix (`nx.shortest_path_length`). – Paul Brodersen Aug 09 '19 at 13:24
  • I think I get what you mean, but I'm not quite sure : I do have my original distance matrix (from which my graph and MST are computed), so I don't need to compute it again, but, as clustering is actually my main end goal, how can I be sure that I don't end up with different clusters (following Kruskal's for the MST (I know, even there, I can have multiple MST for a single graph) and minimization of Euclidean distances for linkage/DT - there is a glimpse in my code, should be ```method="single"```) ? Thanks for the help;) – DrGorilla.eth Aug 09 '19 at 13:51

1 Answers1

2

I have a similar question and am trying to find a solution. It's the first time I've posted an answer, so please let me know if there is any issue.

I suggest you use linkage and dendrogram directly from scipy.cluster.hierarchy rather than using the networkx package.

You first convert the distance matrix to a condensed distance matrix by scipy.spatial.distance.squareform and then you use scipy.cluster.hierarchy.linkage to obtain the clustering.

There are different distance functions you can use in linkage

Finally you plot the clustering using dendrogram.

The results shall be consistent with the minimum spanning tree in networkx.

  • i used the floyd_warshall_numpy function in networkx to generate my distance matrix. i was actually interested in generating a dendrogram that match an ontology (DAG), so i took a small square from the total result of the FWN function. – rictuar Dec 15 '22 at 03:34