
I created a heatmap based on a Spearman correlation matrix using seaborn's clustermap. I want to color the dendrogram so that it looks like this (figure: colored dendrogram), but on the heatmap.

I created a dict of colors as follows and got an error:

def assign_tree_colour(name,val_dict,coding_names_df):
    ret = None
    if val_dict.get(name, '') == 'Group 1':
        ret = "(0,0.9,0.4)"   #green
    elif val_dict.get(name, '') == 'Group 2':
        ret = "(0.6,0.1,0)"   #red
    elif val_dict.get(name, '') == 'Group 3':
        ret = "(0.3,0.8,1)"   #light blue
    elif val_dict.get(name, '') == 'Group 4':
        ret = "(0.4,0.1,1)"   #purple
    elif val_dict.get(name, '') == 'Group 5':
        ret = "(1,0.9,0.1)"   #yellow
    elif val_dict.get(name, '') == 'Group 6':
        ret = "(0,0,0)"   #black
    else:
        ret = "(0,0,0)"         #black
    return ret

def fix_string(str):
    return str.replace('"', '')

external_data3 = [list(z) for z in coding_names_df.values]
external_data3 = {fix_string(z[0]): z[3] for z in external_data3}

tree_label = list(df.index)
tree_label = [fix_string(x) for x in tree_label]
tree_labels = { j : tree_label[j] for j in range(0, len(tree_label) ) }

tree_colour = [assign_tree_colour(label, external_data3, coding_names_df) for label in tree_labels]
tree_colors = { i : tree_colour[i] for i in range(0, len(tree_colour) ) }


sns.set(color_codes=True)
sns.set(font_scale=1)
g = sns.clustermap(df, cmap="bwr",
                   vmin=-1, vmax=1,
                   yticklabels=1, xticklabels=1,
                   cbar_kws={"ticks":[-1,-0.5,0,0.5,1]},
                   figsize=(13,13),
                   row_colors=row_colors,
                   col_colors=col_colors,
                   method='average',
                   metric='correlation',
                   tree_kws=dict(colors=tree_colors))
g.ax_heatmap.set_xlabel('Genus')
g.ax_heatmap.set_ylabel('Genus')
for label in Group.unique():
    g.ax_col_dendrogram.bar(0, 0, color=lut[label],
                            label=label, linewidth=0)
g.ax_col_dendrogram.legend(loc=9, ncol=7, bbox_to_anchor=(0.26, 0., 0.5, 1.5))
ax=g.ax_heatmap



  File "<ipython-input-64-4bc6be89afe3>", line 11, in <module>
    tree_kws=dict(colors=tree_colors))
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1391, in clustermap
    tree_kws=tree_kws, **kwargs)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1208, in plot
    tree_kws=tree_kws)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1054, in plot_dendrograms
    tree_kws=tree_kws
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 776, in dendrogram
    return plotter.plot(ax=ax, tree_kws=tree_kws)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 692, in plot
    **tree_kws)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\collections.py", line 1316, in __init__
    colors = mcolors.to_rgba_array(colors)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 294, in to_rgba_array
    result[i] = to_rgba(cc, alpha)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 177, in to_rgba
    rgba = _to_rgba_no_colorcycle(c, alpha)
  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 240, in _to_rgba_no_colorcycle
    raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
ValueError: Invalid RGBA argument: 0

Any help on this would be greatly appreciated! Thanks!

Rotem Bartuv

2 Answers


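According to the sns.clustermap documentation, the dendrogram lines are drawn with a matplotlib LineCollection whose parameters can be set through tree_kws (which takes a dict). Its colors key expects a list of colors, for example RGB tuples such as (0.5, 0.5, 1). The tuple itself must be passed: a string that only looks like a tuple, such as "(0,0.9,0.4)", is not a valid matplotlib color, and neither is a dict.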
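A minimal sketch of the accepted format, assuming the group-to-colour mapping from the question (the names `GROUP_COLOURS`, `colours_for` and `is_valid_colour_list` are hypothetical). It checks candidate inputs with matplotlib's `to_rgba_array`, the same conversion that fails in the traceback. When matplotlib iterates a dict it gets the integer keys, which is where `Invalid RGBA argument: 0` comes from.

```python
from matplotlib.colors import to_rgba_array

# Hypothetical mapping from group name to an RGB tuple with values in 0-1.
GROUP_COLOURS = {
    "Group 1": (0, 0.9, 0.4),   # green
    "Group 2": (0.6, 0.1, 0),   # red
    "Group 3": (0.3, 0.8, 1),   # light blue
    "Group 4": (0.4, 0.1, 1),   # purple
    "Group 5": (1, 0.9, 0.1),   # yellow
}
DEFAULT_COLOUR = (0, 0, 0)      # black for Group 6 and unknown names


def colours_for(groups):
    """Return a plain list of RGB tuples (not strings, not a dict)."""
    return [GROUP_COLOURS.get(g, DEFAULT_COLOUR) for g in groups]


def is_valid_colour_list(colours):
    """True if matplotlib can convert `colours` to an RGBA array."""
    try:
        to_rgba_array(colours)
        return True
    except ValueError:
        return False


colours = colours_for(["Group 1", "Group 2", "Group 3", "Group 6", "other"])
print(is_valid_colour_list(colours))                    # list of tuples
print(is_valid_colour_list(["(0,0.9,0.4)"]))            # tuple-looking string
print(is_valid_colour_list(dict(enumerate(colours))))   # dict, as in the question
```

The first check should pass and the other two should fail. Note that a per-leaf list like this still does not line up with the dendrogram's lines, which are drawn one per merge rather than one per data point.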

Did you notice that clustermap supports nested lists or data frames for hierarchical color bars between the dendrograms and the correlation matrix? They could be useful if the dendrograms get too crowded.
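A sketch of a two-level `row_colors` DataFrame built with pandas; the sample names, levels and palettes are made up for illustration. Per the seaborn documentation, each column becomes one color bar labelled by the column name, and the index is used to match colors to the rows of the data.

```python
import pandas as pd

# Hypothetical sample annotations: two nested levels of grouping.
samples = pd.DataFrame(
    {"phylum": ["A", "A", "A", "B", "B"],
     "group": ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3"]},
    index=["g1", "g2", "g3", "g4", "g5"],
)

# One colour lookup table per level (named matplotlib colours).
phylum_lut = {"A": "tab:orange", "B": "tab:gray"}
group_lut = {"Group 1": "green", "Group 2": "red", "Group 3": "skyblue"}

# Each column becomes one colour bar; the index lets seaborn align
# the colours with the rows of the data frame passed to clustermap.
row_colors = pd.DataFrame({
    "phylum": samples["phylum"].map(phylum_lut),
    "group": samples["group"].map(group_lut),
})
print(row_colors)
```

For a square correlation matrix whose index and columns are the same sample names, the same frame could presumably be passed as both `row_colors` and `col_colors`.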

I hope this helps!

Edit

The list of RGB tuples is the sequence of line colors for the LineCollection: it is consumed in order as each line (one per merge) is drawn, and the same list is used for both dendrograms. (The order appears to start from the rightmost branch of the column dendrogram.) To associate a certain label with a data point, you need to figure out the order in which the dendrogram lines are drawn.
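One way to get that drawing order without guessing, under an assumption: seaborn appears to build each dendrogram's LineCollection from the `icoord`/`dcoord` lists returned by `scipy.cluster.hierarchy.dendrogram`, in order. If so, SciPy's `color_list`, which has one entry per merge in that same order, can be converted to RGBA tuples and passed as `colors`. The helper name `link_colours` and the threshold value are illustrative.

```python
import numpy as np
from matplotlib.colors import to_rgba
from scipy.cluster.hierarchy import dendrogram, linkage


def link_colours(Z, threshold):
    """One RGBA tuple per merge, in the order SciPy's dendrogram lists them.

    Merges below `threshold` get one colour per flat cluster; merges above
    it get SciPy's above-threshold colour.
    """
    info = dendrogram(Z, no_plot=True, color_threshold=threshold)
    return [to_rgba(c) for c in info["color_list"]]


# Toy data: two well-separated blobs of three points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (3, 2)), rng.normal(5, 0.1, (3, 2))])
Z = linkage(X, method="average")
colours = link_colours(Z, threshold=1.0)
print(len(colours))  # one colour per merge: n_leaves - 1
```

To keep both dendrograms consistent with the colours, one option is to compute `Z` yourself and pass it as `row_linkage=Z, col_linkage=Z` together with `tree_kws={'colors': link_colours(Z, t)}`; this is a sketch and has not been checked against the asker's data.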

Edit II

Here's a minimal example for coloring the tree based on sns.clustermap examples:

import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd


iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']}) 
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[colmap[s] for s in species]})
plt.savefig('clustermap.png')

(Figure: clustermap.png)

As you can see, the lines of the tree are drawn starting from the upper right corner of the image, so the color order is not tied to the order of the data points shown in the clustermap. On the other hand, the color bars (controlled by the row_colors and col_colors arguments) can be used for that purpose.

kampmani
  • I created a dict of colors and got a matplotlib error: ValueError: Invalid RGBA argument: '(241, 196, 15, 1)' – Rotem Bartuv May 31 '20 at 09:39
  • The implementation of `LineCollection` expects RGB and RGBA values in the range 0 to 1: divide each RGB value by 255 to scale it to that range. – kampmani Jun 01 '20 at 13:07
  • I changed (241, 196, 15, 1) to (0.95, 0.77, 0.06, 1.0) (dividing by 255) and got this error: ValueError: Invalid RGBA argument: 0. – Rotem Bartuv Jun 02 '20 at 09:31
  • I edited my answer with an example. As the latest error states, `tree_colors` contains a `0` element. – kampmani Jun 02 '20 at 12:32
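A small sketch of the scaling described in the comments: only the three RGB channels are divided by 255, while alpha is already in 0-1. The helper name `rgb255_to_rgba` is hypothetical. The same yellow can also be given as the hex string `#f1c40f`, which matplotlib accepts directly. Either way the color must be a real tuple or a color string, not the string form of a tuple.

```python
import numpy as np
from matplotlib.colors import to_rgba


def rgb255_to_rgba(rgb, alpha=1.0):
    """Scale 0-255 RGB channels to 0-1; alpha is already in 0-1."""
    r, g, b = rgb
    return (r / 255, g / 255, b / 255, alpha)


yellow = rgb255_to_rgba((241, 196, 15))
# The same colour as a hex string, which matplotlib also accepts.
print(np.allclose(yellow, to_rgba("#f1c40f")))  # True
```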

Building on the answer above, here is an example that colors the three main branches differently by brute force: the first 49 lines in red, the next 35 in green, the next 63 in blue, and the remaining two lines in black:

import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd


iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']}) 
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[(1,0,0,1)]*49+[(0,1,0,1)]*35+[(0,0,1,1)]*63+[(0,0,0,1)]*2})
plt.savefig('clustermap.png')

(Figure: brute force coloring)

For the general case, the number of lines to color can be derived from the dendrogram's linkage matrix (see the SciPy documentation for the linkage format):

# The number of leaves is always the number of merges + 1 
# (if we have 2 leaves we do 1 merge)

n_leaves = len(g.dendrogram_row.linkage)+1

# The last merge on the array is naturally the one that joins
# the last two broad clusters together

n0_ndx = len(g.dendrogram_row.linkage) - 1

# At index [n0_ndx] of the linkage array, positions [0] and [1],
# we have the "indexes" of the two clusters that were merged.
# However, in order to find the actual index of these two
# clusters in the linkage array, we must subtract from this 
# position (cluster/element number) the total number of leaves, 
# because the cluster number listed here starts at 0 with the
# individual elements given to the function; and these elements
# are not themselves part of the linkage array.
# So linkage[0] has cluster number equal to n_leaves; and conversely,
# to calculate the index of a cluster in the linkage array,
# we must subtract the value of n_leaves from the cluster number.

n1_ndx = int(g.dendrogram_row.linkage[n0_ndx][0])-n_leaves
n2_ndx = int(g.dendrogram_row.linkage[n0_ndx][1])-n_leaves

# Similarly we can find the array index of clusters further down

n21_ndx = int(g.dendrogram_row.linkage[n2_ndx][0])-n_leaves
n22_ndx = int(g.dendrogram_row.linkage[n2_ndx][1])-n_leaves

# And finally, having identified the array index of the clusters
# that we are interested in coloring, we can determine the number
# of members in each cluster, which is stored in position [3]
# of each row of the array. A cluster with k members contains
# k - 1 merges (drawn lines), hence the "- 1" below.

n1 = int(g.dendrogram_row.linkage[n1_ndx][3])-1
n21 = int(g.dendrogram_row.linkage[n21_ndx][3])-1
n22 = int(g.dendrogram_row.linkage[n22_ndx][3])-1

# So we can finally color, with RGBA tuples, a number of lines
# equal to the number of merges in each cluster of interest.
g = sns.clustermap(iris, row_colors=df_colors,
                   tree_kws={'colors': [(1, 0, 0, 1)]*n1 + [(0, 1, 0, 1)]*n21
                             + [(0, 0, 1, 1)]*n22
                             + [(0, 0, 0, 1)]*(n_leaves - 1 - n1 - n21 - n22)})
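The index arithmetic above can be wrapped in two small helpers. This is a sketch assuming a SciPy-format linkage matrix `Z` (for example `g.dendrogram_row.linkage`); the function names `children` and `n_links` are hypothetical.

```python
import numpy as np


def children(Z, cluster_id):
    """Return the two cluster ids merged to form `cluster_id`.

    Ids below n_leaves are original observations; id n_leaves + i is the
    cluster created in row i of the linkage matrix.
    """
    n_leaves = len(Z) + 1
    row = Z[cluster_id - n_leaves]
    return int(row[0]), int(row[1])


def n_links(Z, cluster_id):
    """Number of dendrogram lines (merges) inside a cluster: members - 1."""
    n_leaves = len(Z) + 1
    if cluster_id < n_leaves:  # a single observation has no merges
        return 0
    return int(Z[cluster_id - n_leaves][3]) - 1


# Toy linkage for 4 leaves: {0,1} -> 4, {2,3} -> 5, {4,5} -> 6 (the root).
Z = np.array([[0, 1, 0.5, 2],
              [2, 3, 0.7, 2],
              [4, 5, 2.0, 4]], dtype=float)
root = 2 * len(Z)  # the root's id is n_leaves + len(Z) - 1 = 2 * len(Z)
left, right = children(Z, root)
print(left, right, n_links(Z, left), n_links(Z, right))  # 4 5 1 1
```

Unlike the inline version, `n_links` returns 0 when a child is a single observation, where the plain subtraction of `n_leaves` would produce a negative row index.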

However, I have not figured out a way to color the top dendrogram differently.
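One possible workaround, not part of the answer above: seaborn appears to add each dendrogram to its axes as a single `LineCollection`, so after `clustermap` returns, the top one could be recolored through `g.ax_col_dendrogram`. This is a sketch under that assumption; the helper `recolour_dendrogram` is hypothetical, and the demo below exercises it on a plain matplotlib axes standing in for the seaborn one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the demo
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def recolour_dendrogram(ax, colours):
    """Set one colour per merge on the first LineCollection drawn on `ax`."""
    lines = next(c for c in ax.collections if isinstance(c, LineCollection))
    lines.set_color(colours)
    return lines


# Stand-in for g.ax_col_dendrogram: an axes holding one LineCollection
# with three U-shaped merges, all drawn in a single grey.
fig, ax = plt.subplots()
segments = [[(0, 0), (0, 1), (1, 1), (1, 0)],
            [(2, 0), (2, 1), (3, 1), (3, 0)],
            [(0.5, 1), (0.5, 2), (2.5, 2), (2.5, 1)]]
ax.add_collection(LineCollection(segments, color=(0.2, 0.2, 0.2)))
lines = recolour_dendrogram(ax, [(1, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 1)])
print(lines.get_edgecolor().shape)  # (3, 4)
```

With a clustermap this would be called as, e.g., `recolour_dendrogram(g.ax_col_dendrogram, col_colours)`, where `col_colours` has one entry per column merge (number of columns minus one), leaving the row dendrogram with the colors from `tree_kws`.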

Roberto
dannedanne