
I want to pass my own precomputed distance matrix to seaborn's clustermap and use it for the row linkage.

There are already some posts on this, like

Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

But they all point to scipy.cluster.hierarchy.linkage, which takes the clustering metric and method as arguments:

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)

The input y may be either a 1d condensed distance matrix or a 2d array of observation vectors

What I don't get is this:

My distance matrix is already based on a certain metric and method, so why would I want to recalculate it in scipy.cluster.hierarchy.linkage?

Is there an option where it purely uses my distances and creates the linkages?
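The "1d condensed distance matrix" mentioned in the docstring is simply the upper triangle of the square matrix, read row by row. A minimal sketch with a made-up 4x4 matrix (the values are hypothetical) shows how scipy.spatial.distance.squareform converts between the square and condensed forms:

```python
import numpy as np
import scipy.spatial.distance as ssd

# Hypothetical precomputed distances between 4 objects:
# symmetric, with zeros on the diagonal.
distMatrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Square -> condensed: the upper triangle read row by row,
# i.e. d(0,1), d(0,2), d(0,3), d(1,2), d(1,3), d(2,3).
distArray = ssd.squareform(distMatrix)
print(distArray)  # [1. 4. 5. 3. 6. 2.]

# Condensed -> square: squareform on a 1-d array inverts the conversion.
restored = ssd.squareform(distArray)
print(np.array_equal(restored, distMatrix))  # True
```

For n objects the condensed array has n * (n - 1) / 2 entries, one per unordered pair.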

Mario L
You say `linkage` "takes the clustering metric and method as arguments". Take another look at the docstring; linkage also accepts the precomputed distances, but they must be represented as a "condensed" distance matrix (which is just a 1-d array containing the nonredundant data from a distance matrix). If you pass the condensed distance matrix to linkage, the metric argument is ignored. Then look again at the first question you linked, which answers your question. – Warren Weckesser Aug 01 '19 at 14:27
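A quick way to check the comment's point, using a hypothetical 4x4 matrix: with condensed input the metric argument makes no difference, whereas passing the square matrix itself makes linkage treat each row as an observation vector and compute new Euclidean distances between the rows.

```python
import warnings

import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster import hierarchy

# Hypothetical precomputed distances between 4 objects.
distMatrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])
distArray = ssd.squareform(distMatrix)

# Condensed (1-d) input: the distances are used as given,
# so changing the metric argument does not change the result.
z_euclid = hierarchy.linkage(distArray, method="single", metric="euclidean")
z_city = hierarchy.linkage(distArray, method="single", metric="cityblock")
print(np.array_equal(z_euclid, z_city))  # True
print(z_euclid[:, 2].tolist())  # merge heights from distMatrix: [1.0, 2.0, 3.0]

# Square (2-d) input: each row is treated as an observation vector and
# Euclidean distances between rows are recomputed. SciPy may warn that the
# input looks like an uncondensed distance matrix; the warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    z_rows = hierarchy.linkage(distMatrix, method="single")
print(z_rows[0, 2])  # 2.0: the first merge no longer happens at distance 1.0
```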



For posterity, here is a complete example of how to do this, since @WarrenWeckesser in the comments and @SibbsGambling in the linked answer leave out some details.

Suppose distMatrix is your matrix of distances (they don't have to be Euclidean), where the entry in row i and column j is the distance between the ith and jth objects. Then:

# import packages
from scipy.cluster import hierarchy
import scipy.spatial.distance as ssd
import seaborn as sns

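# convert the square matrix to the condensed 1-d form that linkage expects
distArray = ssd.squareform(distMatrix)

# build the linkage directly from the precomputed distances
# (metric is ignored for condensed input; method defaults to 'single')
distLinkage = hierarchy.linkage(distArray)

# draw the original square matrix, reusing the same linkage for rows and columns
sns.clustermap(distMatrix, row_linkage=distLinkage, col_linkage=distLinkage)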
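The snippet above assumes distMatrix is a NumPy array that is exactly symmetric with an exactly zero diagonal: squareform checks this by default and raises a ValueError otherwise. As a hypothetical illustration (random data and correlation distance are assumptions, not part of the original answer), here is one way to build a non-Euclidean matrix and clean up floating-point round-off before converting it:

```python
import numpy as np
import scipy.spatial.distance as ssd

# Hypothetical data: 6 objects, each described by 20 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 20))

# Correlation distance between objects (rows): 1 - Pearson r, in [0, 2].
distMatrix = 1.0 - np.corrcoef(data)

# Remove round-off asymmetry and force an exact zero diagonal so that
# squareform's default validity checks pass.
distMatrix = (distMatrix + distMatrix.T) / 2.0
np.fill_diagonal(distMatrix, 0.0)

distArray = ssd.squareform(distMatrix)
print(distArray.shape)  # (15,): one entry per pair, 6 * 5 / 2
```

Averaging the matrix with its transpose is a blunt fix; if your matrix is asymmetric by more than round-off, that points to a problem in how the distances were computed.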

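Note that when creating the clustermap, you still pass the original square matrix as the data. If you want to use a different clustering method, include it when defining distLinkage, e.g. hierarchy.linkage(distArray, method='ward'). Keep in mind that the SciPy documentation describes 'ward', 'centroid' and 'median' as correctly defined only for Euclidean distances, so method='ward' only makes sense if distMatrix holds Euclidean distances.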
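A minimal sketch of swapping the method on a made-up 4x4 matrix: the choice only changes how cluster-to-cluster distances are updated, which shows up in the merge heights (column 2 of the linkage) and can change the leaf order that clustermap uses to reorder rows and columns.

```python
import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster import hierarchy

# Hypothetical precomputed distances between 4 objects.
distMatrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])
distArray = ssd.squareform(distMatrix)

# 'single', 'complete', 'average' and 'weighted' make no Euclidean assumption.
single = hierarchy.linkage(distArray, method="single")  # the default
average = hierarchy.linkage(distArray, method="average")

# Column 2 holds the merge heights; here only the final merge differs.
print(single[:, 2].tolist())   # [1.0, 2.0, 3.0]
print(average[:, 2].tolist())  # [1.0, 2.0, 4.5]

# Left-to-right leaf order of the dendrogram built from this linkage.
order = hierarchy.leaves_list(average)
print(order)
```

The last average-linkage height is the mean of the four cross-pair distances, (4 + 5 + 3 + 6) / 4 = 4.5, while single linkage takes their minimum, 3.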

Jānis Lazovskis