
I would like some help with the graphical output of cluster maps in seaborn.

In my data, missing values have been replaced with 0.

I would like cells whose value is zero to be drawn in white, with a palette used for the remaining values.

Is there a way to specify this through cmap?

import pandas as pd
from random import randint
import seaborn as sns
import matplotlib.pyplot as plt


# Three groups of rows (low, medium, high values); zeros stand in for missing data.
# range replaces the Python 2-only xrange.
df = pd.DataFrame({'A': [randint(1, 10) for x in range(10)]+[randint(30, 50) for x in range(5)]+[randint(70, 100) for x in range(5)],
         'B': [randint(0, 2) for x in range(10)]+[randint(30, 50) for x in range(5)]+[randint(70, 100) for x in range(5)],
         'C': [randint(0, 10) for x in range(10)]+[randint(30, 50) for x in range(5)]+[randint(60, 100) for x in range(5)],
         'D': [randint(0, 40) for x in range(10)]+[randint(30, 50) for x in range(5)]+[randint(60, 100) for x in range(5)]})

cmap = sns.cubehelix_palette(as_cmap=True, start=.5, rot=-.75, light=.9)

sns.clustermap(df, figsize=(13, 13), cmap=cmap)
plt.show()

Actual cluster: [image]

Result with white for values=0: [image]

Trenton McKinney
manz
2 Answers


clustermap has the kwarg mask. From the docs:

mask : boolean array or DataFrame, optional

If passed, data will not be shown in cells where mask is True. Cells with missing values are automatically masked. Only used for visualizing, not for calculating.

So, for your example, you can pass a boolean DataFrame that is True wherever the value is zero: mask=(df==0)

sns.clustermap(df, figsize=(13, 13), cmap=cmap, mask=(df==0))
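For anyone who wants something to paste and run, here is a minimal sketch of the mask approach. The small DataFrame, the seed, the positions of the zeros, and the output filename are all made up for illustration; only the `mask=(df == 0)` idea and the cubehelix palette come from the question and answer. Since the mask is "only used for visualizing", the zeros still take part in the clustering itself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative data (assumption): values 1..99, so no zeros occur by chance
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(8, 4)), columns=list("ABCD"))

# Plant three zeros to play the role of the "missing" values
df.iloc[0, 1] = 0
df.iloc[3, 2] = 0
df.iloc[5, 3] = 0

# Boolean DataFrame, True where a cell should be hidden
mask = df == 0

cmap = sns.cubehelix_palette(as_cmap=True, start=.5, rot=-.75, light=.9)

# The mask hides cells in the heatmap; the clustering still uses the zeros
g = sns.clustermap(df, figsize=(6, 6), cmap=cmap, mask=mask)
g.savefig("clustermap_masked.png")
```

The masked cells are simply not drawn, so what you see in their place is the axes background.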


tmdavison
    Yep, that's what I would suggest. Note that masked cells show the background of the axes, so you'll want to set the axes style to one of the ones with a white background rather than the default to get exactly what you're looking for. – mwaskom Jan 14 '16 at 16:04
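Following up on the comment about the axes background, here is one way this could look. It is a sketch, not the only option: the tiny DataFrame and the filename are invented for the example, and setting the facecolor on `g.ax_heatmap` directly is an extra step I am adding on top of the style change the comment suggests.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import pandas as pd
import seaborn as sns

# Use a seaborn style with a white axes background, as the comment suggests
sns.set_style("white")

# Illustrative data (assumption) with one zero per column
df = pd.DataFrame({"A": [0, 3, 8, 40],
                   "B": [2, 0, 9, 45],
                   "C": [1, 4, 0, 50]})

g = sns.clustermap(df, mask=(df == 0), figsize=(5, 5))

# Belt and braces: force the heatmap axes background to white,
# so masked cells appear white regardless of the active style
g.ax_heatmap.set_facecolor("white")
g.savefig("clustermap_white_background.png")
```

Changing the facecolor to another colour (for example a light grey) would be a simple way to make the masked cells stand out from low values in a pale palette.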

The mask approach from the other answer didn't work for me, but replacing the zeros with NaN did.

import numpy as np

df.replace(0, np.nan, inplace=True)

# or
df = df.replace(0, np.nan)
Trenton McKinney
Preethi
    In `seaborn 0.12.1` this results in `ValueError: The condensed distance matrix must contain only finite values.` I'm voting to delete this answer because it's not valid for `clustermap`. – Trenton McKinney Nov 23 '22 at 21:52
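To illustrate where that error comes from, here is a small reproduction that calls SciPy's `linkage` directly on made-up data. That `clustermap` goes through this function is my assumption based on the error message; the array values are purely illustrative. NaN rows produce NaN distances, and the clustering step rejects them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Illustrative observations (assumption): three rows, one containing a zero
data = np.array([[1.0, 2.0],
                 [0.0, 4.0],
                 [5.0, 6.0]])

# Same data with the zero swapped for NaN, as in the answer above
data_nan = np.where(data == 0, np.nan, data)

try:
    linkage(data_nan, method="average")
    failed = False
except ValueError as err:
    # NaN distances cannot be clustered
    failed = True
    print(err)

# With the zeros left in place, the clustering goes through
Z = linkage(data, method="average")
```

So for `clustermap`, leaving the zeros in the data and hiding them with `mask=(df == 0)` looks like the safer route than converting them to NaN.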