4

I have the following heatmap with the dendrogram like this.

The complete data are here.

The problem is that the dendrogram on the left is squished. How can I unsquish (expand) it without altering the column size of the heatmap?

enter image description here

It is generated with this code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust  <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

    # Read data
    dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


    rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
    dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")] 
    dat.bcd <- dat.bcd

    # Clustering and distance function
    hclustfunc <- function(x) hclust(x, method="complete")
    distfunc <- function(x) dist(x,method="maximum")


    # Select based on FC, as long as any of them >= anylim

    anylim <- 2.0
    dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


    # Clustering functions
    height <- clust.height; 

    # Define output file name
    heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


    # Compute distance and clusteirn function
    d.bcd <- distfunc(dat.bcd)
    fit.bcd <- hclustfunc(d.bcd)


    # Cluster by height
    #cutree and rect.huclust has to be used in tandem
    clusters <- cutree(fit.bcd, h=height) 
    nofclust.height <-  length(unique(as.vector(clusters)));

    myorder <- colnames(dat.bcd); 
    if (length(type.order)>0) {
     myorder <- type.order
    }

    # Define colors
    #hmcols <- rev(brewer.pal(11,"Spectral"));
    hmcols <- rev(redgreen(2750));
    selcol <- colorRampPalette(brewer.pal(12,"Set3"))
    selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
    sdcol= selcol(5);
    clustcol.height = selcol2(nofclust.height);

    # Plot heatmap
    pdf(file=heatout,width=20,height=50); # for FC.lim >=2
    heatmap.2(as.matrix(dat.bcd[,myorder]),Colv=FALSE,density.info="none",lhei=c(0.1,4),dendrogram="row",scale="row",RowSideColors=clustcol.height[clusters],col=hmcols,trace="none", margin=c(30,row.margins), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=0.3);
    dev.off();


}
#--------------------------------------------------
# ENd of functions 
#-------------------------------------------------- 

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);
TWL
  • 2,290
  • 1
  • 17
  • 21
pdubois
  • 7,640
  • 21
  • 70
  • 99
  • What exactly do you mean by _squished_? – jbaums Feb 24 '14 at 09:21
  • Majority of the dendogram, we cannot see clearly the height of the tree. – pdubois Feb 24 '14 at 09:24
  • 2
    Isn't that just because most of the nodes are much more closely related to each other than the bottom-most nodes? To 'unsquish' the 'squished' parts of the dendrogram would seem to require further expanding the already unsquished regions. Otherwise heights are no longer relative. Perhaps you could scale the heights (e.g. `sqrt`), to pull the large values closer to the majority of values, but this might result in a misleading tree. Alternatively, make the whole plot wider. – jbaums Feb 24 '14 at 09:48

3 Answers3

9

In your case the data has long tail, which is expected for gene expression data (lognormal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
                   header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As you can see from the quantile distribution that the genes with the highest expression extend the range from 1.5 to above 300.

quantile(mat)

#     0%     25%     50%     75%    100% 
#  0.000   0.769   1.079   1.544 346.230 

When the hierarchical clustering is performed on unscaled data the resulting dendrogram may show bias towards the values with the highest expression, as seen in your example. This merits either a logarithmic or z-score transformation, among many (reference). Your dataset contains values == 0, which is a problem for log-transformation since log(0) is undefined.

Z-score transformation (reference) is implemented within heatmap.2, but it's important to note that the function computes the distance matrix and runs clustering algorithm before scaling the data. Hence the option scale='row' doesn't influence the clustering results, see my earlier post (differences in heatmap/clustering defaults in R) for more details.

I would propose that you scale your data before running heatmap.2:

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat))) 

quantile(z)

#         0%        25%        50%        75%       100% 
# -2.1843994 -0.6646909 -0.2239677  0.3440102  2.2640027 

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5) 

# require(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
heatmap.2(z, trace='none', dendrogram='row', Colv=F, scale='none', 
             hclust=hclustfunc, distfun=distfunc, col=greenred(256), symbreak=T,
             margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
             lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
             RowSideColors=as.character(clusters))
dev.off()

Also, see the additional posts here and here, which explain how to set the layout of the heatmap via lmat, lwid and lhei parameters.

The resulting heatmap is shown below (row and column labels are omitted):

enter image description here

Community
  • 1
  • 1
TWL
  • 2,290
  • 1
  • 17
  • 21
  • 1
    Thanks a million. One last question, the scale function here `z <- t(scale(t(mat)))` is identical with Z-score transformation right? – pdubois Feb 25 '14 at 06:12
  • 2
    @pdubois, you're welcome. Yes, `scale` applies z-score transformation. See this post on [creating z-scores in R](http://stackoverflow.com/questions/6148050/creating-z-scores). Note its usage on input matrix: **row-based** transformation `t(scale(t(mat)))`, or **column-based** transformation `scale(mat)`. Follow this post on [the use of z-scores](http://stats.stackexchange.com/questions/36076/is-a-heat-map-of-gene-expression-more-informative-if-z-scores-are-used-instead-o) in visualising the changes in gene expression. – TWL Feb 25 '14 at 11:02
1

As far as I can tell, you may have some outliers in your data set (the objects at the very bottom). Try the following:

  1. remove outliers from the data set
  2. logscale your distances, to give less emphasis to extreme values
Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
1

Zscore transformation is easy with 'fheatmap' package by using 'scale' parameter. Check out 'fheatmap' package. Height of the dendograms can be expanded by increasing the with of the canvas(pdf). http://cran.r-project.org/web/packages/fheatmap/index.html

advert pub
  • 11
  • 1