I'm working with Python 2.7.9.
I use scipy.cluster.hierarchy.dendrogram to show my clustering result. The problem is that I have about 200 data points, so I cannot read their labels clearly.

...
z=linkage(dist, method='complete')
R=dendrogram(z, labels=mylabels)

[Image: the resulting dendrogram]

1. I know that R["ivl"] holds the labels corresponding to the leaf nodes, but it is not easy to match a label to a data point in such a dense figure.

2. I thought of extracting part of the data, for example the green links on the left. At that scale the labels would be clearly readable, and I think it would be a very flexible way to analyze the data. But I do not know how to do that.
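One possible way to sketch the idea in point 2 is to cut the tree into flat clusters with `fcluster`, pick one of them, and re-run the linkage on just those observations. Everything below is an assumption for illustration: `points` is synthetic stand-in data, and the cut height of 70% of the maximum merge distance is chosen because it matches the default `color_threshold` of `dendrogram`.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Synthetic stand-in for the real data: 200 random 2-D points (assumption).
rng = np.random.RandomState(0)
points = rng.rand(200, 2)
mylabels = ["item_%d" % i for i in range(len(points))]

dist = pdist(points)                     # condensed distance matrix
z = linkage(dist, method='complete')

# Cut the tree at 70% of the largest merge distance.
threshold = 0.7 * z[:, 2].max()
flat = fcluster(z, t=threshold, criterion='distance')

# Pick one flat cluster to zoom into (here: the largest one).
largest_id = np.bincount(flat).argmax()
members = np.where(flat == largest_id)[0]

# Re-cluster only those observations, using the matching block of distances.
sub_square = squareform(dist)[np.ix_(members, members)]
sub_z = linkage(squareform(sub_square), method='complete')
sub_labels = [mylabels[i] for i in members]

# no_plot=True only computes the layout; drop it to draw with matplotlib.
sub_R = dendrogram(sub_z, labels=sub_labels, no_plot=True)
```

With far fewer leaves, the labels in the sub-dendrogram have room to be read. Note that re-running the linkage on a subset is a separate clustering of that subset, not a literal crop of the original figure.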

3. I tried leaf_label_func. My goal is: when a data point belongs to a given class (cups, for example), show only part of its name/label. For example, if a model is named "cups_b1", just show "b1". That way I can at least see where one category of my data sits at a time.

def llf(id):
    # Show the part after "_" for "cups" models; hide every other label.
    if id < nmodels:
        mylabel = labels[id]
        if "cups" in mylabel:
            return mylabel[mylabel.find("_") + 1:]
    # Always return a string: returning None makes the output figure look strange.
    return ""

R = dendrogram(z, leaf_label_func=llf, leaf_rotation=90)

But even with this, I cannot make out the labels.

[Image: the dendrogram with filtered labels]

dudu

1 Answer

There isn't really a great method to visually extract small details out of dendrograms. A couple solutions come to mind.

Work with the cluster data outside of the graph.

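from collections import defaultdict

# Group leaf indices by colour, reusing R from the dendrogram() call above.
# Caveat: R['color_list'] holds one colour per link (n - 1 entries), while
# R['leaves'] holds one index per leaf (n entries), so this pairing is approximate.
clusterdict = defaultdict(list)
for ind, clust in zip(R['leaves'], R['color_list']):
    clusterdict[clust].append(ind)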

Now you can explore each cluster individually.

In [50]:
clusterdict['g']

Out[50]:
[73, 8, 30, 14, 0, 67, 91, 60, 81, 61, 83, 22]
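A variation on the same idea, not what the snippet above does: `fcluster` assigns a flat cluster id to every observation directly, so no colours need to be matched to leaves. This is a minimal sketch on synthetic data (`points` is an assumption), cutting at 70% of the maximum merge distance, which is the default `color_threshold` of `dendrogram`, so the groups should roughly correspond to the coloured branches.

```python
from collections import defaultdict

from numpy.random import RandomState
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the real data: 50 random 2-D points (assumption).
rng = RandomState(0)
points = rng.rand(50, 2)
z = linkage(pdist(points), method='complete')

# One flat cluster id per observation, in the original observation order.
flat = fcluster(z, t=0.7 * z[:, 2].max(), criterion='distance')

# Map each cluster id to the list of observation indices it contains.
clusters = defaultdict(list)
for obs_index, cluster_id in enumerate(flat):
    clusters[int(cluster_id)].append(obs_index)
```

Each `clusters[k]` is then a list of indices into the original data, so the corresponding labels can be looked up and inspected without reading them off the figure.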

Another option would be to print the dendrograms on both axes (x,y) as shown by the code here. Then if you absolutely must see the labels on the graph you can print half the labels on the x axis and half on the y axis.

Kevin