
I have the following dendrogram made with SciPy:

# create the dendrogram
# (data_for_cluster and vals_to_keep are defined earlier in my notebook)
import numpy as np
from scipy.cluster import hierarchy as hc
from scipy.stats import spearmanr as sp
import matplotlib.pyplot as plt
%matplotlib inline

# Spearman correlation matrix, converted to a condensed distance vector (1 - correlation)
corr = np.round(sp(data_for_cluster).correlation, 4)
corr_condensed = hc.distance.squareform(1 - corr)
z = hc.linkage(corr_condensed, method='average')
fig = plt.figure(figsize=(20, 35))
dendrogram = hc.dendrogram(z, labels=vals_to_keep, orientation='left',
                           leaf_font_size=14)
plt.show()

Which gives the following picture:

[image: the resulting dendrogram]

I'm not quite sure how to interpret the colors in the dendrogram. The documentation gives the following description of the color_threshold parameter:

For brevity, let t be the color_threshold. Colors all the descendent links below a cluster node k the same color if k is the first node below the cut threshold t. All links connecting nodes with distances greater than or equal to the threshold are colored blue. If t is less than or equal to zero, all nodes are colored blue. If color_threshold is None or ‘default’, corresponding with MATLAB(TM) behavior, the threshold is set to 0.7*max(Z[:,2]).

However, what confuses me is why some clusters that appear to be very close together are not given different colors, whereas other clusters that are further apart are.

I would think that, all else being equal, the closer together clusters are, the more likely they are to get different colors to represent cluster membership, but this doesn't seem to be the case.

Jonathan Bechtel

1 Answer


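I believe that, as mentioned, the coloring in the dendrogram is based on the threshold 0.7*max(Z[:,2]). Each cluster whose top-level merge falls below that threshold gets its own color, and every link inside that cluster is drawn in the same color: in your plot the first such cluster is green, the second is black, and so on for the others. Links that join clusters at or above the threshold are all drawn in the single above-threshold color (blue, per the documentation you quoted). So the color does not say how close two neighboring clusters look in the plot; it only says which below-threshold cluster a link belongs to. Two groups that look close but merge at or above the threshold get different colors, while everything that merges below the threshold inside one cluster keeps the same color.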
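
To check this rule on your own data, here is a minimal sketch. It uses made-up two-blob data rather than your correlation matrix (the names points, t, link_heights and link_colors are just placeholders of mine), passes no_plot=True so that only the color assignments are computed, and sets above_threshold_color='gray' explicitly so the result does not depend on the default palette of your SciPy version.

```python
import numpy as np
from scipy.cluster import hierarchy as hc

# Synthetic stand-in data (an assumption, not the data from the question):
# two well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 2)),
    rng.normal(10.0, 1.0, size=(10, 2)),
])

z = hc.linkage(points, method='average')

# The documented default threshold, written out explicitly.
t = 0.7 * z[:, 2].max()

# no_plot=True returns the plotting info without drawing anything.
d = hc.dendrogram(z, no_plot=True, color_threshold=t,
                  above_threshold_color='gray')

# Each entry of 'dcoord' describes one link; its second value is the
# height (distance) at which that link merges two clusters.
link_heights = [dc[1] for dc in d['dcoord']]
link_colors = d['color_list']

for h, c in zip(link_heights, link_colors):
    side = 'at/above' if h >= t else 'below'
    print(f"merge height {h:6.3f} ({side} threshold) -> color {c}")

# Flat clusters obtained by cutting the tree at the same height.
labels = hc.fcluster(z, t=t, criterion='distance')
print("flat clusters at t:", len(set(labels)))
```

Every link printed as "at/above" should come out gray, and each colored group should correspond to one flat cluster from fcluster at the same cut height. If you want more (or fewer) distinct colors in your own plot, passing a smaller (or larger) color_threshold to hc.dendrogram moves the cut accordingly.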

Reshma