I'm trying to create dendrograms from two different distance matrices and compare them. I used the code here as a starting point. Because I'm using two different matrices with the same clustering method, I need to plot both matrices together for a comparative analysis. Is it possible to split each square/node diagonally into two halves, so that each half shows a value from a different distance matrix?

This image shows the result I'm aiming for: [image]

Here is my code:

from sklearn import preprocessing
from sklearn.neighbors import DistanceMetric 
import pandas as pd
import numpy as np
from ete3 import Tree
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import cosine_distances
import scipy
import pylab
import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as sd 
import random
#g[n] is a one dimensional array containing datapoints
g1 = random.sample(range(30), 5)
g2 = random.sample(range(30), 5)
g3 = random.sample(range(30), 5)
g4 = random.sample(range(30), 5)
g5 = random.sample(range(30), 5)
g1 = np.array(g1)
g2 = np.array(g2)
g3 = np.array(g3)
g4 = np.array(g4)
g5 = np.array(g5)
X = (g1,g2,g3,g4,g5)
#Comparing between euclidean and cosine###########################################
distanceC = cosine_distances(X)
dist = DistanceMetric.get_metric('euclidean')
distanceE = dist.pairwise(X)
##################################################################################

#Plots############################################################################

# Compute and plot first dendrogram.
fig = pylab.figure(figsize=(8,8))
ax1 = fig.add_axes([0.09,0.1,0.2,0.6])
Y = sch.average(sd.squareform(distanceC))
Z1 = sch.dendrogram(Y, orientation='right')
ax1.set_xticks([])
ax1.set_yticks([])

# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3,0.71,0.6,0.2])
Y = sch.average(sd.squareform(distanceE))
Z2 = sch.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])

# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
# Reorder the cosine distance matrix to follow the leaf order of both dendrograms.
distance = distanceC[idx1,:]
distance = distance[:,idx2]
im = axmatrix.matshow(distance, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
axmatrix.set_xticks([])
axmatrix.set_yticks([])

# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
pylab.colorbar(im, cax=axcolor)
fig.show()
fig.savefig('dendrogram.png')
##################################################################################
Siddharth
  • I have removed the second question. I understand that the code example here is kind of "broken", but the code generating the lists g1, g2...g5 has lots of file I/O and processing operations that are not really relevant. I have tried to substitute it with a random list generator, which should do the job. – Siddharth May 31 '17 at 17:35

1 Answer

There is no built-in method to draw an image consisting of triangles, cutting the pixels in half.

So one would need to build a custom heatmap. This could be done using a PolyCollection of triangles. In the solution below, a function creates the points of a triangle around the origin, rotates them if needed, and applies an offset. Looping over the array creates one triangle for each element. Finally, all those triangles are collected into a PolyCollection.

You may then decide to use a normal imshow or matshow plot for one of the arrays and the custom triangle matrix on top of it.

import matplotlib.pyplot as plt
import matplotlib.collections as collections
import numpy as np

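def triatpos(pos=(0,0), rot=0):
    # Triangle covering half of a unit square centred on the origin;
    # the first vertex is repeated to close the polygon.
    r = np.array([[-1,-1],[1,-1],[1,1],[-1,-1]])*.5
    # Rotate counterclockwise by `rot` degrees, then shift to `pos`.
    rm = [[np.cos(np.deg2rad(rot)), -np.sin(np.deg2rad(rot))],
          [np.sin(np.deg2rad(rot)), np.cos(np.deg2rad(rot))]]
    r = np.dot(rm, r.T).T
    r[:,0] += pos[0]
    r[:,1] += pos[1]
    return r

def triamatrix(a, ax, rot=0, cmap=plt.cm.viridis, **kwargs):
    # One triangle per array element, centred on (column, row).
    segs = []
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            segs.append(triatpos((j,i), rot=rot) )
    col = collections.PolyCollection(segs, cmap=cmap, **kwargs)
    # Colour each triangle by the corresponding array value.
    col.set_array(a.flatten())
    ax.add_collection(col)
    return col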


A,B = np.meshgrid(range(5), range(4))
B*=4

fig, ax=plt.subplots()
im1 = ax.imshow(A)
im2 = triamatrix(B, ax, rot=90, cmap="Reds")

fig.colorbar(im1, ax=ax, )
fig.colorbar(im2, ax=ax, )

plt.show()

Triangle heatmap
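
To see which half of a cell a given `rot` value covers, it may help to look at where the triangle's right-angle corner ends up after the rotation. The helper below is a small sketch for illustration only (`right_angle_corner` is a hypothetical name, not part of the solution above); it applies the same rotation matrix as `triatpos` to the corner `(0.5, -0.5)` of the unrotated triangle.

```python
import numpy as np


def right_angle_corner(rot):
    """Corner of the unit cell (centred on the origin) that holds the
    triangle's right angle after rotating by `rot` degrees counterclockwise."""
    t = np.deg2rad(rot)
    rotation = np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t), np.cos(t)]])
    # The unrotated triangle (-.5,-.5), (.5,-.5), (.5,.5) has its right angle at (.5,-.5).
    return np.round(rotation @ np.array([0.5, -0.5]), 2)


for rot in (0, 90, 180, 270):
    print(rot, right_angle_corner(rot))
```

Two triangles tile a cell when their right-angle corners are diagonally opposite, that is, when their `rot` values differ by 180 (0 with 180, or 90 with 270). These positions are in data coordinates; `imshow` inverts the y-axis by default, so on such an axes the halves appear flipped vertically.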

Of course, it would be equally possible to use two of those triangle matrices:

im1 = triamatrix(A, ax, rot=0, cmap="Blues")
im2 = triamatrix(B, ax, rot=180, cmap="Reds")
ax.set_xlim(-.5,A.shape[1]-.5)
ax.set_ylim(-.5,A.shape[0]-.5)

which would also require setting the axis limits manually, because adding a collection to an axes does not adjust the limits to fit it.
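
Putting the pieces together with the dendrogram layout from the question could look roughly like the sketch below. It makes several assumptions: the names `half_cell`, `triangle_matrix` and `split_heatmap` are made up for this example, SciPy's `pdist` stands in for the scikit-learn distance functions used in the question, the rows of the input are nonzero (otherwise the cosine distance is undefined), and which metric goes into which half of a cell is an arbitrary choice. Both matrices are reordered with the same row and column order so that the two halves of a cell refer to the same pair of items.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection
from scipy.cluster import hierarchy as sch
from scipy.spatial import distance as sd


def half_cell(pos, rot):
    """Vertices of a half-cell triangle centred on `pos`, rotated by `rot` degrees."""
    corners = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5]])
    t = np.deg2rad(rot)
    rotation = np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t), np.cos(t)]])
    return corners @ rotation.T + np.asarray(pos, dtype=float)


def triangle_matrix(a, ax, rot, cmap):
    """Draw one triangle per entry of `a`; entry (i, j) is centred on (x=j, y=i)."""
    rows, cols = a.shape
    verts = [half_cell((j, i), rot) for i in range(rows) for j in range(cols)]
    col = PolyCollection(verts, cmap=cmap)
    col.set_array(a.ravel())  # same row-major order as `verts`
    ax.add_collection(col)
    return col


def split_heatmap(points):
    """Two dendrograms plus a heatmap whose cells are split between two metrics."""
    cond_cos = sd.pdist(points, metric="cosine")     # condensed distance vectors
    cond_euc = sd.pdist(points, metric="euclidean")

    fig = plt.figure(figsize=(8, 8))

    # Left dendrogram (cosine distances) determines the row order.
    ax_rows = fig.add_axes([0.09, 0.1, 0.2, 0.6])
    row_order = sch.dendrogram(sch.average(cond_cos),
                               orientation="left", ax=ax_rows)["leaves"]
    # Top dendrogram (euclidean distances) determines the column order.
    ax_cols = fig.add_axes([0.3, 0.71, 0.6, 0.2])
    col_order = sch.dendrogram(sch.average(cond_euc), ax=ax_cols)["leaves"]

    # Apply the same reordering to both matrices.
    order = np.ix_(row_order, col_order)
    dist_cos = sd.squareform(cond_cos)[order]
    dist_euc = sd.squareform(cond_euc)[order]

    ax_matrix = fig.add_axes([0.3, 0.1, 0.6, 0.6])
    cos_half = triangle_matrix(dist_cos, ax_matrix, rot=0, cmap="Blues")
    euc_half = triangle_matrix(dist_euc, ax_matrix, rot=180, cmap="Reds")

    # Collections do not update the limits, so set them by hand.
    n = len(row_order)
    ax_matrix.set_xlim(-0.5, n - 0.5)
    ax_matrix.set_ylim(-0.5, n - 0.5)

    for ax in (ax_rows, ax_cols, ax_matrix):
        ax.set_xticks([])
        ax.set_yticks([])

    # One colorbar per metric, stacked to the right of the matrix.
    fig.colorbar(cos_half, cax=fig.add_axes([0.91, 0.1, 0.02, 0.28]), label="cosine")
    fig.colorbar(euc_half, cax=fig.add_axes([0.91, 0.42, 0.02, 0.28]), label="euclidean")
    return fig, ax_matrix, cos_half, euc_half
```

Calling `split_heatmap` with a 2D array of data points (one row per item) and then `plt.show()` or `fig.savefig(...)` should give a figure similar to the target image. Since the y-axis of the matrix axes is not inverted here, row 0 sits at the bottom, matching the `origin='lower'` setting in the question's `matshow` call.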

ImportanceOfBeingErnest
  • Thanks! This is exactly what I'm looking for. I'm having slight trouble integrating the method with dendrograms now, i.e. the leaves are not aligned with the corresponding distance in the matrix. – Siddharth May 31 '17 at 20:00
  • Sorry, I don't have scikit-learn available. Can you set the ticks visible for all 3 plots and provide an image from which one can see what's going wrong? – ImportanceOfBeingErnest May 31 '17 at 20:06
  • I'm sorry for the late reply; I tried to play a little bit more with the code. I guess the problem is that there are two different kinds of placement methods involved (add_axes for dendrograms and add_subplot for the distance matrix). The resulting plot I'm getting is strange: http://imgur.com/a/AwJfi – Siddharth May 31 '17 at 21:52
  • I see the problem. But the first solution using a matshow or imshow plot for one of the matrices should work, right? I updated the answer to include the setting of the limits for the second solution. – ImportanceOfBeingErnest May 31 '17 at 22:05
  • Yes! The first solution works, and now the second solution works too. I can probably fine-tune the code to my specification, but otherwise it is more than perfect. Thank you! – Siddharth May 31 '17 at 22:55