
I'm trying to add a bar-plot (stacked or otherwise) for each row in a seaborn clustermap.

Let's say that I have a dataframe like this:

import pandas as pd
import numpy as np
import random

df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)), columns=["heatMap_1","heatMap_2","heatMap_3","heatMap_4","heatMap_5", "barPlot_1","barPlot_2","barPlot_3"])

df['index'] = [ random.randint(1,10000000)  for k in df.index]
df.set_index('index', inplace=True)
df.head()
       heatMap_1    heatMap_2   heatMap_3   heatMap_4   heatMap_5   barPlot_1   barPlot_2   barPlot_3
index                               
4552288 9   3   54  37  23  42  94  31
6915023 7   47  59  92  70  96  39  59
2988122 91  29  59  79  68  64  55  5
5060540 68  80  25  95  80  58  72  57
2901025 86  63  36  8   33  17  79  86

I can use the first 5 columns (in this example, those with the prefix heatMap_) to create a seaborn clustermap like this:

import seaborn as sns
sns.clustermap(df.iloc[:, 0:5])

and a stacked bar plot for the last three columns (in this example, those with the prefix barPlot_) like this: df.iloc[:, 5:8].plot(kind='bar', stacked=True)

but I'm a bit confused about how to merge the two plot types. I understand that clustermap creates its own figure, and I'm not sure whether I can extract just the heatmap from the clustermap and then use it with subfigures (discussed here: Adding seaborn clustermap to figure with other plots). This creates a weird output. Edit: Using this:

import pandas as pd
import numpy as np
import random
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import matplotlib.gridspec


df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)), columns=["heatMap_1","heatMap_2","heatMap_3","heatMap_4","heatMap_5", "barPlot_1","barPlot_2","barPlot_3"])
df['index'] = [ random.randint(1,10000000)  for k in df.index]
df.set_index('index', inplace=True)
g = sns.clustermap(df.iloc[:,0:5], )
g.gs.update(left=0.05, right=0.45)
gs2 = matplotlib.gridspec.GridSpec(1,1, left=0.6)
ax2 = g.fig.add_subplot(gs2[0])
df.iloc[:,5:8].plot(kind='barh', stacked=True, ax=ax2)

creates this: [image]

The two plots do not line up: the dendrograms shift the heatmap, so its rows no longer match the bars.

Another option is to perform the clustering manually, create a matplotlib heatmap, and then add the associated subfigures such as bar plots (discussed here: How to get flat clustering corresponding to color clusters in the dendrogram created by scipy).

Is there a way I can use clustermap as a subplot along with other plots?

This is the result I'm looking for [1]: [image]

  • The prerequisite of applying [this solution](https://stackoverflow.com/questions/51811972/adding-seaborn-clustermap-to-figure-with-other-plots) is, as mentioned over there, "... but as long as all other content you want to have in the final figure can be created inside axes". This seems to be the case here. So it should work. What problem did you encounter? – ImportanceOfBeingErnest Feb 20 '19 at 14:28
  • @ImportanceOfBeingErnest, I updated my question. Using the aforementioned method creates a merged plot, but it is not "sharing the y axis" to put it crudely. An ideal situation would be to have an extensible method which can add subfigures based on output from seaborn clustermap without fine tweaking. – Siddharth Feb 20 '19 at 14:40
  • The upper clustergraph doesn't appear in your desired output. If you remove it, the two plots should be fitting again? – ImportanceOfBeingErnest Feb 20 '19 at 15:00
  • @ImportanceOfBeingErnest, setting `col_cluster=False` and `g.cax.set_visible(False)` still results in an output similar to above. Image: https://imgur.com/yvhTh9X – Siddharth Feb 20 '19 at 15:06
  • I see. Obviously the ideal solution would be for seaborn to take a gridspec as input. Everything else is more or less suboptimal. I might have a closer look later on. – ImportanceOfBeingErnest Feb 20 '19 at 15:14

1 Answer

While not a proper answer, I decided to break it down and do everything manually. Taking inspiration from the answer here, I clustered and reordered the heatmap separately:

import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as dist


def heatMapCluter(df):
    """Return df with rows and columns reordered by hierarchical clustering."""
    row_method = "ward"
    column_method = "ward"
    row_metric = "euclidean"
    column_metric = "euclidean"

    if column_method:
        # Cluster the columns. linkage() expects the condensed distance
        # vector from pdist(), not the square matrix from squareform().
        d2 = dist.pdist(df.transpose(), metric=column_metric)
        Y2 = sch.linkage(d2, method=column_method)
        Z2 = sch.dendrogram(Y2, no_plot=True)
        idx2 = Z2["leaves"]  # column order along the dendrogram
        df = df.iloc[:, idx2]

    if row_method:
        # Cluster the rows in the same way.
        d1 = dist.pdist(df, metric=row_metric)
        Y1 = sch.linkage(d1, method=row_method)
        Z1 = sch.dendrogram(Y1, orientation="right", no_plot=True)
        idx1 = Z1["leaves"]  # row order along the dendrogram
        df = df.iloc[idx1, :]

    return df

Rearranged the original dataframe:

clusteredHeatmap = heatMapCluter(df.iloc[:, 0:5].copy())
# Extract the "barplot" rows and merge them
clusteredDataframe = df.reindex(list(clusteredHeatmap.index.values))
clusteredDataframe = clusteredDataframe.reindex(
    list(clusteredHeatmap.columns.values)
    + list(df.iloc[:, 5:8].columns.values),
    axis=1,
)
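
As a side note, the row reordering above can be sketched more compactly with scipy's `leaves_list`, which returns the dendrogram leaf order directly. This is only an illustration on made-up toy data (the `toy` frame and its column names are assumptions, not the data from the question), and it reorders by position with `iloc` rather than by label, so it does not rely on the index labels being unique:

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

# Toy data (hypothetical): odd rows are shifted by 100, giving two clear clusters.
rng = np.random.default_rng(0)
values = rng.integers(0, 10, size=(6, 4)).astype(float)
values[1::2] += 100
toy = pd.DataFrame(
    values, columns=["heatMap_1", "heatMap_2", "barPlot_1", "barPlot_2"]
)

heat_cols = ["heatMap_1", "heatMap_2"]

# Ward linkage on the heatmap columns only; leaves_list gives the row order
# in which the dendrogram would draw its leaves.
row_order = sch.leaves_list(sch.linkage(toy[heat_cols].to_numpy(), method="ward"))

# Positional reordering keeps each row intact, including the bar-plot columns.
clustered = toy.iloc[row_order]
print(clustered)
```

The rows of each cluster end up next to each other, and the bar-plot columns travel with their rows, so the heatmap and the bars can be drawn from the same frame.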

and then used the gridspec to plot both "subfigures" (clustermap and barplot):

# Now let's plot this - first the heatmap and then the barplot.
# Since it is a two-part plot that shares the same y axis, it is
# better to use gridspec.
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(12, 12))
gs = GridSpec(3, 3)
gs.update(wspace=0.015, hspace=0.05)
ax_main = plt.subplot(gs[0:3, :2])
ax_yDist = plt.subplot(gs[0:3, 2], sharey=ax_main)
im = ax_main.imshow(
    clusteredDataframe.iloc[:, 0:5],
    cmap="Greens",
    interpolation="nearest",
    aspect="auto",
)
clusteredDataframe.iloc[:, 5:8].plot(
    kind="barh", stacked=True, ax=ax_yDist, sharey=True
)

ax_yDist.spines["right"].set_color("none")
ax_yDist.spines["top"].set_color("none")
ax_yDist.spines["left"].set_visible(False)
ax_yDist.xaxis.set_ticks_position("bottom")


ax_yDist.set_xlim([0, 100])
ax_yDist.set_yticks([])
ax_yDist.xaxis.grid(False)
ax_yDist.yaxis.grid(False)

Jupyter notebook: https://gist.github.com/siddharthst/2a8b7028d18935860062ac7379b9279f

Image: [image]
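
For comparison, here is a rough sketch of staying with `sns.clustermap` and attaching the bars to it, instead of clustering by hand. It is untested against the notebook above and makes several assumptions: the data is a smaller random frame, the axes coordinates (0.68, 0.27) are arbitrary, and the bars are drawn with matplotlib's `barh` at the cell centres 0.5, 1.5, ... so that they line up with the heatmap rows. The idea is to read the row order from `g.dendrogram_row.reordered_ind` and to give the new axes the same vertical position and y limits as `g.ax_heatmap`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Random example data (an assumption, shaped like the frame in the question).
rng = np.random.default_rng(0)
columns = [f"heatMap_{i}" for i in range(1, 6)] + [f"barPlot_{i}" for i in range(1, 4)]
df = pd.DataFrame(rng.integers(0, 100, size=(30, 8)), columns=columns)

g = sns.clustermap(df.iloc[:, 0:5], col_cluster=False, figsize=(10, 8))
g.cax.set_visible(False)
g.gs.update(left=0.05, right=0.55)  # free the right part of the figure

# Row order chosen by the clustering (positions into df).
order = g.dendrogram_row.reordered_ind
bars = df.iloc[order, 5:8]

# New axes with the same vertical extent as the heatmap axes.
fig = g.ax_heatmap.figure
pos = g.ax_heatmap.get_position()
ax_bar = fig.add_axes([0.68, pos.y0, 0.27, pos.height])

# Stacked horizontal bars centred on the heatmap cells (0.5, 1.5, ...).
y = np.arange(len(bars)) + 0.5
left = np.zeros(len(bars))
for name in bars.columns:
    widths = bars[name].to_numpy()
    ax_bar.barh(y, widths, left=left, height=0.8, label=name)
    left = left + widths

ax_bar.set_ylim(g.ax_heatmap.get_ylim())  # same inverted y range as the heatmap
ax_bar.set_yticks([])
ax_bar.legend()
```

Calling `fig.savefig(...)` afterwards writes the combined figure. Because the bar axes copy the heatmap's position and y limits, the row dendrogram no longer causes a shift, but the horizontal placement still has to be tuned by hand.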

1 - http://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/
