I have the following dendrogram made with SciPy:
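# Create the dendrogram
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy as hc
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr as sp
%matplotlib inline  # Jupyter magic; remove outside a notebook

# Spearman correlation between columns, converted to a distance (1 - corr)
corr = np.round(sp(data_for_cluster).correlation, 4)
corr_condensed = squareform(1 - corr)

# Average-linkage clustering on the condensed distance matrix
z = hc.linkage(corr_condensed, method='average')

fig = plt.figure(figsize=(20, 35))
dendrogram = hc.dendrogram(z, labels=vals_to_keep, orientation='left',
                           leaf_font_size=14)
plt.show()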
Which gives the following picture:
I'm not quite sure how to interpret the colors in the dendrogram. The documentation gives the following description of the color_threshold parameter:
For brevity, let t be the color_threshold. Colors all the descendent links below a cluster node k the same color if k is the first node below the cut threshold t. All links connecting nodes with distances greater than or equal to the threshold are colored blue. If t is less than or equal to zero, all nodes are colored blue. If color_threshold is None or ‘default’, corresponding with MATLAB(TM) behavior, the threshold is set to 0.7*max(Z[:,2]).
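As a minimal sketch of the default described in that passage, the snippet below computes 0.7*max(Z[:,2]) by hand and compares the resulting link colors with what dendrogram produces when no threshold is passed. The `points` array is made-up toy data (two tight groups on a line), not the data from the question, and `no_plot=True` is used only so the colors can be inspected without drawing anything.

```python
import numpy as np
from scipy.cluster import hierarchy as hc

# Hypothetical toy data: six 1-D points forming two well-separated groups
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.3]])
z = hc.linkage(points, method='average')

# Default threshold as stated in the quoted documentation
default_t = 0.7 * z[:, 2].max()

# Compute the dendrogram layout without plotting, once with the default
# threshold and once with the same value passed explicitly
d_default = hc.dendrogram(z, no_plot=True)
d_explicit = hc.dendrogram(z, no_plot=True, color_threshold=default_t)

# One color per link (n - 1 links for n leaves)
color_list = d_default['color_list']
same_colors = color_list == d_explicit['color_list']

# Number of merges at or above the threshold; these links share the
# single "above threshold" color
n_above = int((z[:, 2] >= default_t).sum())

print(default_t, color_list, same_colors, n_above)
```

If this reading is right, the coloring depends only on where each merge height falls relative to one global cut, not on how close neighboring clusters look on the plot.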
However, what confuses me is why some clusters that appear to be very close together are not given different colors, whereas other clusters that are further apart are.
I would think that, all else being equal, the closer clusters are together, the more likely they are to be given a different color to represent cluster membership, but this doesn't seem to be the case.