Label groups in a heat map

Question

I have an array whose (i, j)th entry is the number of genes common to areas i and j that are differentially expressed in i with respect to j.

Labeling every xtick and ytick would make the graph too crowded. As in this question and this question, I want to group the labels on my x-axis.

The xticklabels of the heat map in the following image from Hawrylycz et al. (2012) are a good example of what I want. The xticklabels refer to more general regions. For example, all the columns under frontal lobe correspond to structures in the brain within the frontal lobe.

I am not trying to replicate the yticklabels, or bar graph inset.

My approach

For each box in the heat map I have an ontology. I am choosing to plot structures in only a few regions, for example only the "frontal lobe" and "parietal lobe."

Using the ontology I can discover the start and end index of the group of columns for each structure. How do I use those indices to draw a group label?

http://matplotlib.org/api/ticker_api.html#tick-locating has details on nigh-total control. However, `ax.set_xticks` and `ax.set_xticklabels` might be enough. — cphlewis, Mar 09 '15 at 04:26
[this](http://stanford.edu/~mwaskom/software/seaborn/examples/structured_heatmap.html) might be useful — cphlewis, Apr 10 '15 at 06:58
Yes! I also found seaborn while trying to answer my question. — mac389, Apr 10 '15 at 09:13
I was also thinking that, if there isn't a class for doing standard plots like the one above, it would be a useful exercise to make one. — cphlewis, Apr 10 '15 at 09:23

cphlewis · Accepted Answer · 2015-03-23T17:01:32.120

Like so:

import pandas as pd
from numpy.random import randint
from numpy import reshape
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FixedFormatter
alph = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
lalph = list(alph.lower())
alph = list(alph)

# random_integers is deprecated in NumPy; randint excludes the upper
# bound, so 101 keeps the original inclusive 0..100 range
df = pd.DataFrame(randint(0, 101, (26, 26)), columns=alph,
                  index=lalph)

# Two lines just to make a plaid image in imshow 
differ = reshape([sum(df[col2]-df[col]) for col2 in df for col in df], (26,26))
differ = pd.DataFrame(differ, columns=alph,index=alph)

# pick the labels you want
ticks = [2, 14, 18, 19, 22] # C=2 because A=0 because Python is 0-indexed
ticklabels = [alph[x] for x in ticks]

fig = plt.figure(figsize=(3,5))
ax = fig.add_subplot(111)
ax.imshow(differ)
ax.autoscale(False)

# display only the chosen ticks and ticklabels
ax.xaxis.set_major_locator(FixedLocator(ticks))
ax.xaxis.set_major_formatter(FixedFormatter(ticklabels))

plt.show()

You'll have a list of strings naming genes, not a string being used as a list of letters, but the imshow axis indexes are still the indexes of the underlying numpy array.
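
If the ontology gives you the first and last column index of each group, one possible way to get a single label per group is to put a major tick at the centre of each span and a long, unlabeled minor tick at each boundary. This is only a sketch: the `groups` dictionary, its region names, and its index ranges are made up for illustration, and the random array stands in for your data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical groups: name -> (first column index, last column index), inclusive
groups = {"frontal lobe": (0, 9), "parietal lobe": (10, 17), "temporal lobe": (18, 25)}

# Stand-in data; replace with your own 2-D array
data = np.random.default_rng(0).integers(0, 101, size=(26, 26))

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(data)
ax.autoscale(False)

# imshow draws column i between i - 0.5 and i + 0.5, so the centre of a
# span is the mean of its first and last index
centers = [(start + end) / 2 for start, end in groups.values()]
# boundaries between neighbouring groups sit half a cell past each last index
edges = [end + 0.5 for _, end in list(groups.values())[:-1]]

# one label per group, centred under its columns, with no tick mark
ax.set_xticks(centers)
ax.set_xticklabels(list(groups))
ax.tick_params(axis="x", which="major", length=0)

# long unlabeled minor ticks act as separators between groups
ax.set_xticks(edges, minor=True)
ax.tick_params(axis="x", which="minor", length=12)

# call plt.show() or fig.savefig(...) to view the result
```

The minor ticks are a cheap substitute for brackets; they only mark where one group ends and the next begins.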

How would you draw a hairpin or bar to denote the span of 'C', 'O', etc? — mac389, Mar 23 '15 at 13:12
The bars would be a barplot on a second subplot that shares an x-axis with the main subplot, with vertical lines outlining them that run all the way to the edge of the main subplot. Ganged subplots might be the easiest way. Those white outlines in the main subplot are Rectangles with a white edgecolor and no facecolor. Is this a standard kind of plot for a standard kind of data? — cphlewis, Mar 23 '15 at 17:05
This is a standard type of plot for gene expression data. R makes these plots. I prefer Python because I analyze multiple types of data and I find Python code easier to read. — mac389, Mar 23 '15 at 17:41
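
A rough sketch of the outlines mentioned in the comments, under some assumptions: the `spans` dictionary and its index ranges are hypothetical, the data is random, and instead of a second subplot sharing the x-axis it draws a plain line under each group, which is a simpler substitute and not the ganged-subplot layout described above.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle

# Hypothetical spans: name -> (first index, last index), inclusive
spans = {"frontal lobe": (0, 9), "parietal lobe": (10, 17)}

# Stand-in data; replace with your own square array
data = np.random.default_rng(0).integers(0, 101, size=(18, 18))

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(data)
ax.autoscale(False)
ax.set_xticks([])  # the group bars replace the per-column ticks

# x in data coordinates, y as a fraction of the axes height
below_axis = ax.get_xaxis_transform()

outlines = []
for name, (start, end) in spans.items():
    width = end - start + 1
    # white outline around the within-group block on the diagonal
    rect = Rectangle((start - 0.5, start - 0.5), width, width,
                     edgecolor="white", facecolor="none", linewidth=1.5)
    ax.add_patch(rect)
    outlines.append(rect)
    # bar just below the axes covering the group's columns
    ax.plot([start - 0.5, end + 0.5], [-0.03, -0.03], transform=below_axis,
            color="black", linewidth=2, clip_on=False)
    # group name centred under the bar
    ax.text((start + end) / 2, -0.06, name, transform=below_axis,
            ha="center", va="top")

# call plt.show() or fig.savefig(...) to view the result
```

Because the bars use a blended transform, they stay attached to the right columns if the axes are resized.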
