
seaborn's clustermap generates a ClusterGrid object. The ordering of "sibling" clusters appears arbitrary, and I cannot find any way to control it. What I am looking for is to have them sorted based on the data, e.g. the cluster with the highest average value would come first. Preferably, both rows and columns should be sorted in this way.

This is somewhat related to another question, "Reordering the high-level clusters from seaborn clustermap results", but that one only asks how to apply a custom order based on the label names.

dannedanne
  • By looking at a few different dendrograms, it seems that the default behavior is not random, but to sort smaller clusters to the left/top. – dannedanne Aug 12 '20 at 12:52
  • So, this is probably what I would need: https://github.com/mwaskom/seaborn/issues/1844 If anyone could help me with a workaround that uses scipy.cluster.hierarchy.linkage and feeds the result into clustermap, I would be very grateful! – dannedanne Aug 13 '20 at 11:24
  • Ok, so I wrote some code to do what I want. For each tree (rows and columns) I calculated a linkage matrix. The format of the linkage matrix is explained in this answer: https://stackoverflow.com/a/40983611/2724383 Each row in the matrix defines an internal node of the tree. The first two columns define the two children of the node, and the third column gives the distance at which they are merged, i.e. the height of the node. I calculated the average value for each node and ordered the first two fields according to the average values of the children. Perhaps I will write up an example later and answer my own question. – dannedanne Aug 14 '20 at 08:14
  • I also tried the "optimal_ordering" flag, but it did not do what I want. That algorithm reorders the tree so that neighboring rows/columns are as similar as possible (it minimizes the distance between successive leaves). I find it more intuitive to sort the child nodes so that the child with the highest average value always ends up on the same side. – dannedanne Aug 14 '20 at 08:22
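
The approach described in the comments above could be sketched roughly as follows. This is not the asker's actual code, only a minimal illustration: the helper name `sort_linkage_by_mean` and the toy data are made up for this example, and it assumes the data contain no NaNs. It relies on scipy drawing the first child of each linkage row before the second when no other sorting option is set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list


def sort_linkage_by_mean(Z, leaf_means):
    """Hypothetical helper: return a copy of linkage matrix Z in which,
    for every internal node, the child with the higher average value is
    listed first (so it is drawn to the left/top)."""
    Z = np.array(Z, dtype=float, copy=True)
    n = Z.shape[0] + 1  # number of leaves
    # Per-node running sum of leaf means and leaf counts; leaves come first,
    # internal node i of Z gets index n + i (scipy's convention).
    sums = np.concatenate([np.asarray(leaf_means, dtype=float), np.zeros(n - 1)])
    counts = np.concatenate([np.ones(n), np.zeros(n - 1)])
    for i in range(n - 1):
        a, b = int(Z[i, 0]), int(Z[i, 1])
        sums[n + i] = sums[a] + sums[b]
        counts[n + i] = counts[a] + counts[b]
        # Swap the two children if the second one has the higher average.
        if sums[a] / counts[a] < sums[b] / counts[b]:
            Z[i, 0], Z[i, 1] = b, a
    return Z


# Toy data (an assumption for illustration): the last four rows are much larger.
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 5))
data[4:] += 10.0

# method="average" with Euclidean distance matches clustermap's defaults.
Z_rows = sort_linkage_by_mean(linkage(data, method="average"), data.mean(axis=1))
Z_cols = sort_linkage_by_mean(linkage(data.T, method="average"), data.mean(axis=0))

row_order = leaves_list(Z_rows)  # leaf order after the re-sorting
print(row_order)
```

The resulting matrices can then be passed to seaborn through the `row_linkage` and `col_linkage` parameters, e.g. `sns.clustermap(data, row_linkage=Z_rows, col_linkage=Z_cols)`, so that seaborn uses them instead of computing its own. Only the first two columns are swapped; distances and cluster sizes are untouched, so the tree itself is unchanged and only the left/right order of siblings differs.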

0 Answers