I'm using the example dendrogram from this post in my work but would also like to keep track of which row / column is from which piece of data.

I've edited the code to store the name of each piece of data in a list called names, as shown below, and would like to print those names at the bottom and to the right of the distance matrix visualization. I tried passing labels=names in the call to dendrogram, but this didn't help.

Does anyone know how to add labels to this?

import numpy as np
import pylab
import scipy.cluster.hierarchy as sch

# Generate random features and distance matrix.
# (scipy.rand and scipy.zeros were NumPy aliases that recent SciPy no longer provides.)
x = np.random.rand(40)
D = np.zeros([40, 40])
for i in range(40):
    for j in range(40):
        D[i, j] = abs(x[i] - x[j])

### new code
names = [ ] 
for i in range(40):
    names.append( 'str%i'%( i ) ) 
    print(names[-1])
### end new code

# Compute and plot first dendrogram.
fig = pylab.figure(figsize=(8,8))
ax1 = fig.add_axes([0.09,0.1,0.2,0.6])
Y = sch.linkage(D, method='centroid')
Z1 = sch.dendrogram(Y, orientation='right')
ax1.set_xticks([])
ax1.set_yticks([])

# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3,0.71,0.6,0.2])
Y = sch.linkage(D, method='single')
Z2 = sch.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])

# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1,:]
D = D[:,idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
#axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
#pylab.colorbar(im, cax=axcolor)
fig.show()
fig.savefig('dendrogram.png')
drjrm3

1 Answer

The Python package heatmapcluster (available on PyPI) that I wrote accepts (in fact, requires) labels.

Here's a simplified version of your script using heatmapcluster:

import numpy as np
import matplotlib.pyplot as plt
from heatmapcluster import heatmapcluster


# Generate random features and distance matrix.
x = np.random.rand(40)
D = np.abs(np.subtract.outer(x, x))

names = ['str%i' % i for i in range(len(x))]

h = heatmapcluster(D, names, names,
                   num_row_clusters=3, num_col_clusters=3,
                   label_fontsize=8,
                   xlabel_rotation=-75,
                   cmap=plt.cm.coolwarm,
                   show_colorbar=True,
                   top_dendrogram=True)

plt.show()

And here is the plot it generates: [image: plot]

(Note that, for a symmetric array like D, there is really no point in clustering both axes. By symmetry, they will generate the same dendrogram.)
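
For readers who would rather stay with plain SciPy and matplotlib, here is a minimal sketch of one way to get the labels onto the matrix itself. It assumes a reasonably recent SciPy, where dendrogram accepts ax and no_labels. The names in the question's script never appear because every axis has its ticks cleared with set_xticks([]) / set_yticks([]). The sketch reorders names by Z['leaves'] and sets them as tick labels on the matrix axes instead. Three things here are choices made for this sketch, not part of the original script: the single linkage is reused for both axes (per the symmetry note above), the square matrix is converted with squareform before calling linkage, and the output file is named dendrogram_labeled.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform

# Same kind of data as in the question: 40 random features.
rng = np.random.default_rng(0)
x = rng.random(40)
D = np.abs(np.subtract.outer(x, x))
names = ['str%i' % i for i in range(len(x))]

# Assumption of this sketch: treat D as a distance matrix, so pass linkage
# the condensed form. One linkage is reused for both axes since D is symmetric.
Y = sch.linkage(squareform(D), method='single')

fig = plt.figure(figsize=(8, 8))

# Left dendrogram (root on the left in recent SciPy); labels go on the matrix.
ax1 = fig.add_axes([0.09, 0.1, 0.2, 0.6])
Z1 = sch.dendrogram(Y, orientation='left', ax=ax1, no_labels=True)
ax1.set_xticks([])

# Top dendrogram.
ax2 = fig.add_axes([0.3, 0.71, 0.6, 0.2])
Z2 = sch.dendrogram(Y, ax=ax2, no_labels=True)
ax2.set_yticks([])

# Reorder the matrix and the names by the dendrogram leaf order.
idx1, idx2 = Z1['leaves'], Z2['leaves']
D_sorted = D[np.ix_(idx1, idx2)]
row_names = [names[i] for i in idx1]
col_names = [names[i] for i in idx2]

axmatrix = fig.add_axes([0.3, 0.1, 0.6, 0.6])
im = axmatrix.matshow(D_sorted, aspect='auto', origin='lower',
                      cmap=plt.cm.YlGnBu)

# Put the names at the bottom and on the right of the matrix.
axmatrix.set_xticks(range(len(col_names)))
axmatrix.set_xticklabels(col_names, rotation=90, fontsize=6)
axmatrix.xaxis.tick_bottom()
axmatrix.set_yticks(range(len(row_names)))
axmatrix.set_yticklabels(row_names, fontsize=6)
axmatrix.yaxis.tick_right()

fig.savefig('dendrogram_labeled.png')
```

With 40 leaves the labels are tight, so the font size and rotation above are only starting points.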

Warren Weckesser