I'm attempting to mimick the plot found in this question: plotting results of hierarchical clustering ontop of a matrix of data in python
And I've used the code found there for a couple of occasions with no problem. But in applying it to some new data, I'm finding that there's an inverted pattern in the linkage of the dendrogram.
As you'll see in the upper parts of the dark blue links, these clusters are starting above the branch point. Perhaps this is a typical behavior for certain cases, but it seemed counter-intuitive for a dendrogram to branch at a lower diversity than the daughter branches. What's being compared here are various sets of protein abundances across a set of features. One minus the correlation between the features is what's actually being fed into the scipy linkage function. Code for the plot as a whole is below:
#Making correlation matrix with dendrograms
corr = 1 - df_log2.corr()
corr_condensed = hc.distance.squareform(corr) # convert to condensed
# Compute and plot first dendrogram.
fig = pylab.figure(figsize=(30,30))
ax1 = fig.add_axes([-0.1,0.1,0.35,0.6])
Y = hc.linkage(corr_condensed, method='centroid')
Z1 = hc.dendrogram(Y, orientation='left')
ax1.set_xticks([])
ax1.set_yticks([])
# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3,0.75,0.6,0.35])
Y = hc.linkage(corr_condensed, method='centroid')
Z2 = hc.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])
# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx = list(Z1['leaves'])
featdict = {0: 'IndA 3K', 1: 'IndA 5.4K', 2: 'IndA 12.2K', 3: 'IndA 24K', 4: 'IndA 78.4K', 5: 'IndA 110K', 6: 'IndA 195.5K',
7: 'IndB 3K', 8: 'IndB 5.4K', 9: 'IndB 12.2K', 10: 'IndB 24K', 11: 'IndB 78.4K', 12: 'IndB 110K', 13 :'IndB 195.5K',
14: 'IndC 3K', 15: 'IndC 5.4K', 16: 'IndC 12.2K', 17: 'IndC 24K', 18: 'IndC 78.4K', 19: 'IndC 110K', 20: 'IndC 195.5K',
21: 'UnA 3K', 22: 'UnA 5.4K', 23: 'UnA 12.2K', 24: 'UnA 24K', 25: 'UnA 78.4K', 26: 'UnA 110K', 27: 'UnA 195.5K',
28: 'UnB 3K', 29: 'UnB 5.4K', 30: 'UnB 12.2K', 31: 'UnB 24K', 32: 'UnB 78.4K', 33: 'UnB 110K', 34: 'UnB 195.5K',
35: 'UnC 3K', 36: 'UnC 5.4K', 37: 'UnC 12.2K', 38: 'UnC 24K', 39: 'UnC 78.4K', 40: 'UnC 110K', 41: 'UnC 195.5K'}
corr.index = list(range(0,42))
corr.columns = list(range(0,42))
#corr = corr[idx1][:]
#corr = corr[:][idx2]
carray = corr.values
corrind = carray[idx,:][:,idx]
corrflip = 1 - corrind
im = axmatrix.matshow(corrflip, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
idx = [featdict[x] for x in idx]
axmatrix.set_xticklabels(['']+idx, rotation = 90)
axmatrix.set_yticklabels(['']+idx)
axmatrix.xaxis.set_major_locator(ticker.MultipleLocator(1))
axmatrix.yaxis.set_major_locator(ticker.MultipleLocator(1))
# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
pylab.colorbar(im, cax=axcolor)
fig.suptitle('Pearson Correlation \nMatrix with Centroid \nLinkage Dendrogram', x=0.13, y=0.9, fontsize=40)
fig.savefig('PearsonCorr_matrixwithdendro_normlog2.png', dpi=500, bbox_inches='tight')